
Transfer leadership before stepping down after reconfiguration #19966

Merged
mmaslankaprv merged 3 commits into redpanda-data:dev from transfer-step-down on Jun 28, 2024

Conversation

mmaslankaprv
Member

@mmaslankaprv mmaslankaprv commented Jun 24, 2024

When a node that is currently a raft group leader is not part of the new configuration, it must step down and become a follower. When stepping down, the leader stops sending heartbeats to the followers, allowing them to trigger an election. The election starts only after an election timeout elapses on one of the followers, which makes the whole process slow; for that entire time clients can neither write to nor read from the raft group, as it is leaderless. To address this, a new step-down method was introduced. The new implementation, used for reconfiguration, asks one of the followers to time out immediately and trigger a leader election. This speeds up the whole process and makes it much less disruptive, as the step down is now comparable to a leadership transfer.
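
For illustration only, here is a minimal C++ sketch of the difference between the old passive step down and the new transfer-style step down; the type and function names (raft_group_sketch, send_timeout_now, ...) are hypothetical and are not the actual consensus.cc API:

#include <optional>
#include <vector>

using node_id = int;

struct raft_group_sketch {
    std::optional<node_id> leader_id;
    std::vector<node_id> followers;

    void stop_heartbeats() { /* stop the heartbeat dispatch loop */ }
    void send_timeout_now(node_id) { /* ask a follower to start an election immediately */ }

    // Old behaviour: passively relinquish leadership. Followers only notice
    // once their election timeout expires, so the group stays leaderless
    // (no reads or writes) for up to a full election timeout.
    void step_down_passive() {
        stop_heartbeats();
        leader_id = std::nullopt;
    }

    // New behaviour used for reconfiguration: additionally ask one follower
    // to time out right away, as a leadership transfer does, so a new
    // election starts immediately.
    void step_down_with_transfer() {
        stop_heartbeats();
        leader_id = std::nullopt;
        if (!followers.empty()) {
            send_timeout_now(followers.front());
        }
    }
};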

Time to elect a leader before the fix, after a leadership transfer:

INFO  2024-06-24 11:19:34,717 [shard  0:main] raft_test - basic_raft_fixture_test.cc:782 - leadership_transfer - new leader reported after: 74 ms

after a reconfiguration (before the fix):

INFO  2024-06-24 11:19:36,607 [shard  0:main] raft_test - basic_raft_fixture_test.cc:815 - reconfiguration - new leader reported after: 1690 ms

after a reconfiguration (with the fix):

INFO  2024-06-24 12:14:45,170 [shard  0:main] raft_test - basic_raft_fixture_test.cc:817 - reconfiguration - new leader reported after: 66 ms

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • Made leadership changes related to reconfiguration faster and less disruptive

bharathv
bharathv previously approved these changes Jun 24, 2024
bharathv
bharathv previously approved these changes Jun 25, 2024
@dotnwat
Member

dotnwat commented Jun 25, 2024

When a node that is currently a raft group leader is not part of the new configuration, it must step down and become a follower.

does this make sense? if a node is removed from a raft configuration, why would it be a follower?

@bharathv
Contributor

When a node that is currently a raft group leader is not part of the new configuration, it must step down and become a follower.

does this make sense? if a node is removed from a raft configuration, why would it be a follower?

Michal was probably referring to the implementation detail there. The "becoming a follower" (see do_step_down("reason")) part of the implementation relinquishes leadership, letting a new leader take charge. This is the terminal state for that replica, as it can neither request votes (it is no longer part of the configuration) nor receive any heartbeats, since the rest of the quorum has already forgotten about it. It will be GC-ed by the controller.
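
A toy sketch of that terminal state; removed_replica and its members are purely illustrative names, not Redpanda's API:

#include <optional>

struct removed_replica {
    bool in_configuration = false;   // the new quorum no longer lists this node
    std::optional<int> leader_id;    // cleared when stepping down

    // A node outside the configuration can neither start elections...
    bool can_request_votes() const { return in_configuration; }

    // ...nor expect heartbeats, since the quorum has forgotten about it.
    bool expects_heartbeats() const { return in_configuration; }

    // It stays in this dormant state until the controller GC-s it.
};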

Added a missing trigger of the Raft leadership notification after stepping down when a leader node is no longer part of the raft group configuration.

Signed-off-by: Michał Maślanka <[email protected]>
@mmaslankaprv
Member Author

/ci-repeat 1

Comment on lines +3031 to +3034
if (_leader_id) {
    _leader_id = std::nullopt;
    trigger_leadership_notification();
}
Contributor


should this be part of do_step_down?

Member Author


in some cases (when processing requests) we do not trigger the notification with no leader, but immediately update the leader with the new leader's node id
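
For illustration, a small hypothetical sketch of the two paths being discussed; the surrounding struct is invented, only the snippet quoted above is real code:

#include <functional>
#include <optional>

struct consensus_sketch {
    std::optional<int> _leader_id;
    std::function<void()> trigger_leadership_notification = [] {};

    // Step down with no known successor (e.g. the node was removed from the
    // configuration): report "no leader" so clients stop routing requests here.
    void step_down_without_successor() {
        if (_leader_id) {
            _leader_id = std::nullopt;
            trigger_leadership_notification();
        }
    }

    // A request already names the new leader: skip the intermediate
    // "no leader" notification and publish the new id directly.
    void observe_new_leader(int new_leader) {
        _leader_id = new_leader;
        trigger_leadership_notification();
    }
};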

src/v/raft/tests/basic_raft_fixture_test.cc
co_await stop_node(vn.id());
}

auto tolerance = 0.15;
Contributor


nit: I have a feeling this could be flaky in a noisy debug environment. In the end, what we care about is that this interval is much smaller than the election timeout; maybe we can test that directly.

Member Author


I was worried about that, which is why I expressed the expected value relative to the leadership transfer that is executed right before the reconfiguration. I was thinking that in a debug environment the leadership transfer would also be slower, so the test will self-adapt to the environment.
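
A rough sketch of that self-adapting check; the helper names and the numbers are made up for illustration, and the real assertion lives in basic_raft_fixture_test.cc:

#include <cassert>
#include <chrono>

using namespace std::chrono_literals;

// Hypothetical stand-ins for the fixture's measurement helpers.
std::chrono::milliseconds measure_leadership_transfer() { return 74ms; }
std::chrono::milliseconds measure_reconfiguration_election() { return 66ms; }

int main() {
    // Baseline taken in the same environment, so a slow debug build also
    // slows down the expectation and the check self-adapts.
    auto transfer = measure_leadership_transfer();
    auto tolerance = 0.15;

    auto reconfig = measure_reconfiguration_election();
    assert(static_cast<double>(reconfig.count())
           <= static_cast<double>(transfer.count()) * (1.0 + tolerance));
}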

When a node that is currently a raft group leader is not part of the new configuration, it must step down and become a follower. When stepping down, the leader stops sending heartbeats to the followers, allowing them to trigger an election. The election starts only after an election timeout elapses on one of the followers, which makes the whole process slow; for that entire time clients can neither write to nor read from the raft group, as it is leaderless. To address this, a new step-down method was introduced. The new implementation, used for reconfiguration, asks one of the followers to time out immediately and trigger a leader election. This speeds up the whole process and makes it much less disruptive, as the step down is now comparable to a leadership transfer.

Signed-off-by: Michał Maślanka <[email protected]>
Added a test validating that a leader election caused by removing the leader from the replica set takes a comparable amount of time to a leadership transfer.

Signed-off-by: Michał Maślanka <[email protected]>
@mmaslankaprv
Member Author

/ci-repeat 1

@mmaslankaprv
Member Author

ci failure: #19012

@mmaslankaprv mmaslankaprv merged commit 1aed2fd into redpanda-data:dev Jun 28, 2024
15 of 18 checks passed
@mmaslankaprv mmaslankaprv deleted the transfer-step-down branch June 28, 2024 14:54
@vbotbuildovich
Collaborator

/backport v24.1.x

@vbotbuildovich
Collaborator

/backport v23.3.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v24.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-19966-v24.1.x-655 remotes/upstream/v24.1.x
git cherry-pick -x 8f4db30afde5beed6c5ca8f2ac979c296972061c 9f7a5085d1d1161d7384acc76e2064f47f898a6a 9c581091ea38afcfa3461d2f1d1524b054ec8269

Workflow run logs.

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-19966-v23.3.x-253 remotes/upstream/v23.3.x
git cherry-pick -x 8f4db30afde5beed6c5ca8f2ac979c296972061c 9f7a5085d1d1161d7384acc76e2064f47f898a6a 9c581091ea38afcfa3461d2f1d1524b054ec8269

Workflow run logs.
