-
Notifications
You must be signed in to change notification settings - Fork 557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lw heartbeats during inflight appends #19974
Conversation
/dt |
/dt |
Failure unrelated: #19954 |
src/v/raft/consensus.h
Outdated
@@ -449,7 +449,7 @@ class consensus { | |||
}; | |||
|
|||
// precondition: is_elected_leader() must be true. | |||
suppress_heartbeats_guard suppress_heartbeats(vnode); | |||
inflight_appends_guard append_inflight(vnode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would be nice to have a verb in the function name, something like "track_inflight_append", with the current name there is an impression that we are appending something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
Simulates stuck append entries which supress heartbeats resulting in a leader step down.
Now that there is only one variant, code internal to the heatbeat manager is easy to follow without the _v2 suffix.
src/v/raft/consensus.h
Outdated
@@ -449,7 +453,7 @@ class consensus { | |||
}; | |||
|
|||
// precondition: is_elected_leader() must be true. | |||
suppress_heartbeats_guard suppress_heartbeats(vnode); | |||
track_inflight_appends_guard append_inflight(vnode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hah, what I meant was to add "track" to the function name, not the class name (so that function name has a verb in it) :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lmao 🤦 , let me fix it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more stamp please :-)
With the upcoming change, we don't necessarily suppress hearbeats if appends are in flight, so the current naming schema may confuse the readers. No logical changes.
If the inflight appends are stuck, we donot want the leader to mistakenly step down with majority heartbeat loss. Instead this commit switches to using lw heartbeats in that window.
Failure unrelated #20316 |
/backport v24.1.x |
Failed to create a backport PR to v24.1.x branch. I tried:
|
@@ -60,7 +60,6 @@ enum class feature : std::uint64_t { | |||
force_partition_reconfiguration = 1ULL << 26U, | |||
raft_append_entries_serde = 1ULL << 28U, | |||
delete_records = 1ULL << 29U, | |||
lightweight_heartbeats = 1ULL << 30U, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bharathv is this really "dead" in the sense that this value could be contained in a controller log somewhere and need to be replayed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a retired flag as it is enabled by default in any cluster installing this version. More details here
https://github.com/redpanda-data/redpanda/blob/dev/src/v/features/feature_table.h#L88
If there are appends in flight and are stuck (eg: disk pressure on the follower), that may result in a spurious leadership step down due to heartbeat loss as the hbs are suppressed for the duration of the append. This commit switches to using lw heartbeats when there are inflight appends to avoid this scenario.
Backports Required
Release Notes
Bug Fixes