osd: fix activate failure when block device moves (backport #14374) #14377
+4
−1
Block devices can move between reboots. In corner cases, an OSD's block device might move to a lower-indexed device name while the previous device name no longer exists. For example, an OSD on /dev/sde might move to /dev/sdd on reboot if the original /dev/sdd died. There would be no /dev/sde after that.
Users report that NVMe drives commonly change names, even when there are no disk failures.
For these cases, ensure the activate script succeeds when the previous disk is no longer present on the node but the OSD is still available on a different disk.
Resolves #13564
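
For reviewers, the intended behavior can be illustrated with a minimal sketch. This is not the patched activate script; it only models the decision it has to make. The function name `resolve_osd_device` and the `devices_by_path` mapping (device path to the OSD UUID found on it) are hypothetical and assumed to come from some device scan on the node.

```python
from typing import Mapping, Optional


def resolve_osd_device(
    recorded_device: str,
    osd_uuid: str,
    devices_by_path: Mapping[str, str],
) -> Optional[str]:
    """Return the device path that currently holds the OSD, or None.

    devices_by_path is a hypothetical input: it maps every block device
    present on the node to the OSD UUID found on that device.
    """
    # Fast path: the recorded device still exists and still holds this OSD.
    if devices_by_path.get(recorded_device) == osd_uuid:
        return recorded_device

    # The recorded device is missing or holds a different OSD, so look for
    # the OSD on any other device instead of failing immediately.
    for path in sorted(devices_by_path):
        if devices_by_path[path] == osd_uuid:
            return path

    # The OSD is not on any present device; activation cannot proceed.
    return None


# The original /dev/sdd died, so the OSD once on /dev/sde now shows up as
# /dev/sdd and there is no /dev/sde on the node.
present = {"/dev/sdc": "uuid-c", "/dev/sdd": "uuid-e"}
print(resolve_osd_device("/dev/sde", "uuid-e", present))  # prints /dev/sdd
```

The key point is that a missing recorded device is not treated as fatal on its own; failure is reserved for the case where the OSD cannot be found on any device.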
I tested this manually by editing one of my OSDs to use `/dev/vdf` in my environment with no `/dev/vdf` present. Upon upgrade to the patched version, I see that `ceph-volume` still fails with the same error when the disk is not present, but the `activate` script is able to move ahead and complete successfully.

I don't believe we have a good way of guaranteeing this code path gets tested in unit or CI tests, so the manual testing will have to do for now.
Because this was reported by a user upgrading to 1.13, we plan to backport to 1.14 and 1.13.
This is an automatic backport of pull request #14374 done by [Mergify](https://mergify.com).