storage: fallocate at least one chunk to avoid appender hangs #20131
base: dev
Conversation
In the `segment_appender::do_append` method we call `do_next_adaptive_fallocation` repeatedly until there is enough space in the file for the incoming operation. If `segment_fallocation_step` is set to 0, this results in an infinite loop. The code comments hint at another case where the fallocation step can be 0: when the disk is full. However, `storage_resources::calc_falloc_step` also clamps the minimum to the append chunk size:

```cpp
// At the minimum, falloc one chunk's worth of space.
if (step < min_falloc_step) {
    // If we have less than the minimum step, don't both falloc'ing at all.
    step = _append_chunk_size;
}
```
A potential alternative is to return the fallocated byte count and, if it is 0, avoid looping at redpanda/src/v/storage/segment_appender.cc Line 164 in f007c20:
```cpp
vassert(
  step != 0, "falloc step must be non-zero for appender to make progress");
```
Should this be a post-condition on `get_falloc_step`?
```cpp
// Don't fallocate. This happens if we're low on disk, or if
// the user has configured a 0 max falloc step.
```
Ahh yeah, this was probably present prior to the adaptive falloc step work.
```cpp
return step;
```

```cpp
// Allocate at least one chunk's worth of space.
return std::max(step, _append_chunk_size);
```
Should we look to see if there are any clusters out there with a configured chunk size that would be too large? I see the max in configuration.cc is 32 MB, which seems ridiculous. But even smaller values like 1 MB, much larger than what I would expect in practice (like 32 KB), could make this dynamic fallocation optimization fail for cases where we have huge numbers of partitions.
Backports Required
Release Notes