Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: fallocate at least one chunk to avoid appender hangs #20131

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

nvartolomei
Copy link
Contributor

@nvartolomei nvartolomei commented Jun 25, 2024

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

In the `segment_appender::do_append` method we call
`do_next_adaptive_fallocation` repeatedly until there is enough space in
the file for the incoming operation. If `segment_fallocation_step` is
set to 0 then this would result in an infinite loop.

The code comments hint that there is another case when the fallocation
step is 0, when the disk is full. However,
`storage_resources::calc_falloc_step` clamps the minimum to append chunk
size too:

```
// At the minimum, falloc one chunk's worth of space.
if (step < min_falloc_step) {
    // If we have less than the minimum step, don't both falloc'ing at all.
    step = _append_chunk_size;
}
```
@nvartolomei nvartolomei marked this pull request as ready for review June 25, 2024 12:12
@nvartolomei nvartolomei changed the title RFD: storage: test 0 falloc step storage: test 0 falloc step Jun 25, 2024
@nvartolomei nvartolomei changed the title storage: test 0 falloc step storage: fallocate at least one chunk to avoid appender hangs Jun 25, 2024
@nvartolomei
Copy link
Contributor Author

a potential alternative is to return the fallocated byte count and if 0 avoid looping at

Comment on lines +334 to +335
vassert(
step != 0, "falloc step must be non-zero for appender to make progress");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be a post-condition on get_falloc_step?

Comment on lines -335 to -336
// Don't fallocate. This happens if we're low on disk, or if
// the user has configured a 0 max falloc step.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh yeh this was probably present prior to adaptive falloc step work

return step;

// Allocate at least one chunk's worth of space.
return std::max(step, _append_chunk_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we look to see if there are any clusters out there that have a configured chunk size would be too large? i see the max in the configuration.cc is 32mb, but that seems rediculous. but even smaller values like 1mb, that are a lot larger than what i ewould expect in practice (like 32kb) could make this dynamic fallocation optimization fail for cases where we have huge numbers of partitions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants