Environment
- CubeFS version: v3.3.1 with Authnode patches
- Deployment mode (docker, standalone, or cluster): multiple bare-metal nodes
- Dependent components: All
- OS kernel version (Ubuntu or CentOS): Debian 12
- CPU/Memory: at least 6 cores and 12 GiB RAM per node
- Others:
Current Behavior
After using the cluster for a while, with datanodes coming and going, I have a bunch of unavailable replicas and other issues:
wings:~/ $ cfs-cli datapartition check [11:53:07]
[Inactive Data nodes]:
ID ZONE ADDRESS USED TOTAL STATUS REPORT TIME
[Corrupt data partitions](no leader):
ID VOLUME REPLICAS STATUS MEMBERS
[Partition lack replicas]:
ID VOLUME REPLICAS STATUS MEMBERS
[Bad data partitions(decommission not completed)]:
PATH PARTITION ID
[Partition has unavailable replica]:
DP_ID VOLUME REPLICAS DP_STATUS MEMBERS UNAVAILABLE_REPLICAS
615 pbs.per 3 Read only [10.0.20.16:17310, 10.0.20.14:17310, 10.0.20.15:17310] [10.0.20.16:17310]
616 pbs.per 3 Read only [10.0.20.16:17310, 10.0.20.14:17310, 10.0.20.15:17310] [10.0.20.16:17310]
617 pbs.per 3 Read only [10.0.20.16:17310, 10.0.20.14:17310, 10.0.20.15:17310] [10.0.20.16:17310]
618 pbs.per 3 Read only [10.0.20.16:17310, 10.0.20.14:17310, 10.0.20.15:17310] [10.0.20.16:17310]
619 pbs.per 3 Read only [10.0.20.16:17310, 10.0.20.15:17310, 10.0.20.14:17310] [10.0.20.16:17310]
620 pbs.per 3 Read only [10.0.20.16:17310, 10.0.20.15:17310, 10.0.20.14:17310] [10.0.20.16:17310]
621 pbs.per 3 Read only [10.0.20.16:17310, 10.0.20.14:17310, 10.0.20.15:17310] [10.0.20.16:17310]
622 pbs.per 3 Read only [10.0.20.15:17310, 10.0.20.16:17310, 10.0.20.14:17310] [10.0.20.16:17310]
623 pbs.per 3 Read only [10.0.20.15:17310, 10.0.20.16:17310, 10.0.20.14:17310] [10.0.20.16:17310]
624 pbs.per 3 Read only [10.0.20.14:17310, 10.0.20.15:17310, 10.0.20.16:17310] [10.0.20.16:17310]
625 media 3 Read only [10.0.20.15:17310, 10.0.20.14:17310, 10.0.20.16:17310] [10.0.20.16:17310]
626 media 3 Read only [10.0.20.15:17310, 10.0.20.14:17310, 10.0.20.16:17310] [10.0.20.16:17310]
627 media 3 Read only [10.0.20.14:17310, 10.0.20.16:17310, 10.0.20.15:17310] [10.0.20.16:17310]
628 media 3 Read only [10.0.20.16:17310, 10.0.20.14:17310, 10.0.20.15:17310] [10.0.20.16:17310]
629 media 3 Read only [10.0.20.15:17310, 10.0.20.14:17310, 10.0.20.16:17310] [10.0.20.16:17310]
630 media 3 Read only [10.0.20.14:17310, 10.0.20.15:17310, 10.0.20.16:17310] [10.0.20.16:17310]
633 media 3 Read only [10.0.20.14:17310, 10.0.20.15:17310, 10.0.20.16:17310] [10.0.20.15:17310]
634 media 3 Read only [10.0.20.16:17310, 10.0.20.15:17310, 10.0.20.14:17310] [10.0.20.15:17310]
635 media 3 Read only [10.0.20.14:17310, 10.0.20.16:17310, 10.0.20.15:17310] [10.0.20.15:17310]
636 media 3 Read only [10.0.20.15:17310, 10.0.20.14:17310, 10.0.20.16:17310] [10.0.20.15:17310]
637 media 3 Read only [10.0.20.14:17310, 10.0.20.16:17310, 10.0.20.15:17310] [10.0.20.15:17310]
638 media 3 Read only [10.0.20.15:17310, 10.0.20.14:17310, 10.0.20.16:17310] [10.0.20.15:17310]
639 media 3 Read only [10.0.20.14:17310, 10.0.20.15:17310, 10.0.20.16:17310] [10.0.20.15:17310]
640 media 3 Read only [10.0.20.15:17310, 10.0.20.14:17310, 10.0.20.16:17310] [10.0.20.15:17310]
641 media 3 Read only [10.0.20.15:17310, 10.0.20.16:17310, 10.0.20.14:17310] [10.0.20.15:17310]
642 media 3 Read only [10.0.20.16:17310, 10.0.20.14:17310, 10.0.20.15:17310] [10.0.20.15:17310]
643 media 3 Read only [10.0.20.16:17310, 10.0.20.15:17310, 10.0.20.14:17310] [10.0.20.15:17310]
644 media 3 Read only [10.0.20.15:17310, 10.0.20.16:17310, 10.0.20.14:17310] [10.0.20.15:17310]
645 media 3 Read only [10.0.20.16:17310, 10.0.20.15:17310, 10.0.20.14:17310] [10.0.20.15:17310]
646 media 3 Read only [10.0.20.14:17310, 10.0.20.16:17310, 10.0.20.15:17310] [10.0.20.15:17310]
647 media 3 Read only [10.0.20.14:17310, 10.0.20.16:17310, 10.0.20.15:17310] [10.0.20.15:17310]
648 media 3 Read only [10.0.20.15:17310, 10.0.20.16:17310, 10.0.20.14:17310] [10.0.20.15:17310]
649 media 3 Read only [10.0.20.14:17310, 10.0.20.16:17310, 10.0.20.15:17310] [10.0.20.15:17310]
650 media 3 Read only [10.0.20.15:17310, 10.0.20.16:17310, 10.0.20.14:17310] [10.0.20.15:17310]
651 media 3 Read only [10.0.20.16:17310, 10.0.20.15:17310, 10.0.20.14:17310] [10.0.20.15:17310]
652 media 3 Read only [10.0.20.16:17310, 10.0.20.15:17310, 10.0.20.14:17310] [10.0.20.15:17310]
653 media 3 Read only [10.0.20.15:17310, 10.0.20.16:17310, 10.0.20.14:17310] [10.0.20.15:17310]
654 media 3 Read only [10.0.20.15:17310, 10.0.20.14:17310, 10.0.20.16:17310] [10.0.20.15:17310]
655 media 3 Read only [10.0.20.15:17310, 10.0.20.16:17310, 10.0.20.14:17310] [10.0.20.15:17310]
656 media 3 Read only [10.0.20.15:17310, 10.0.20.14:17310, 10.0.20.16:17310] [10.0.20.15:17310]
657 media 3 Read only [10.0.20.14:17310, 10.0.20.15:17310, 10.0.20.16:17310] [10.0.20.15:17310]
658 media 3 Read only [10.0.20.14:17310, 10.0.20.16:17310, 10.0.20.15:17310] [10.0.20.15:17310]
659 media 3 Read only [10.0.20.16:17310, 10.0.20.15:17310, 10.0.20.14:17310] [10.0.20.15:17310]
660 media 3 Read only [10.0.20.15:17310, 10.0.20.14:17310, 10.0.20.16:17310] [10.0.20.15:17310]
661 media 3 Read only [10.0.20.15:17310, 10.0.20.14:17310, 10.0.20.16:17310] [10.0.20.15:17310]
662 media 3 Read only [10.0.20.16:17310, 10.0.20.15:17310, 10.0.20.14:17310] [10.0.20.15:17310]
663 media 3 Read only [10.0.20.15:17310, 10.0.20.14:17310, 10.0.20.16:17310] [10.0.20.15:17310]
664 media 3 Read only [10.0.20.15:17310, 10.0.20.16:17310, 10.0.20.14:17310] [10.0.20.15:17310]
665 media 3 Read only [10.0.20.15:17310, 10.0.20.16:17310, 10.0.20.14:17310] [10.0.20.15:17310]
666 media 3 Read only [10.0.20.16:17310, 10.0.20.15:17310, 10.0.20.14:17310] [10.0.20.15:17310]
667 media 3 Read only [10.0.20.15:17310, 10.0.20.14:17310, 10.0.20.16:17310] [10.0.20.15:17310]
668 media 3 Read only [10.0.20.16:17310, 10.0.20.14:17310, 10.0.20.15:17310] [10.0.20.15:17310]
669 media 3 Read only [10.0.20.16:17310, 10.0.20.15:17310, 10.0.20.14:17310] [10.0.20.15:17310]
670 media 3 Read only [10.0.20.16:17310, 10.0.20.14:17310, 10.0.20.15:17310] [10.0.20.15:17310]
[Partition with replica file count differ significantly]:
DP_ID VOLUME REPLICAS DP_STATUS MEMBERS(fileCount)
113 sia.per 3 Writable [10.0.20.14:17310(546),10.0.20.18:17310(544),10.0.20.15:17310(546 isLeader)]
121 sia.per 3 Writable [10.0.20.15:17310(506 isLeader),10.0.20.18:17310(506),10.0.20.14:17310(506)]
122 sia.per 3 Writable [10.0.20.15:17310(505 isLeader),10.0.20.18:17310(503),10.0.20.14:17310(505)]
125 sia.per 3 Writable [10.0.20.15:17310(517 isLeader),10.0.20.18:17310(516),10.0.20.14:17310(517)]
133 sia.per 3 Writable [10.0.20.14:17310(466),10.0.20.18:17310(465),10.0.20.15:17310(466 isLeader)]
134 sia.per 3 Writable [10.0.20.14:17310(476),10.0.20.15:17310(476 isLeader),10.0.20.18:17310(475)]
136 sia.per 3 Writable [10.0.20.14:17310(430),10.0.20.18:17310(429),10.0.20.15:17310(430 isLeader)]
141 sia.per 3 Writable [10.0.20.16:17310(349 isLeader),10.0.20.14:17310(349),10.0.20.18:17310(348)]
157 sia.per 3 Writable [10.0.20.15:17310(491 isLeader),10.0.20.18:17310(490),10.0.20.14:17310(491)]
161 sia.per 3 Writable [10.0.20.16:17310(378 isLeader),10.0.20.18:17310(376),10.0.20.14:17310(378)]
162 sia.per 3 Writable [10.0.20.14:17310(473),10.0.20.18:17310(472),10.0.20.15:17310(473 isLeader)]
164 sia.per 3 Writable [10.0.20.15:17310(454 isLeader),10.0.20.18:17310(451),10.0.20.14:17310(454)]
174 sia.per 3 Writable [10.0.20.14:17310(537),10.0.20.18:17310(536),10.0.20.15:17310(537 isLeader)]
180 sia.per 3 Writable [10.0.20.15:17310(530 isLeader),10.0.20.14:17310(530),10.0.20.18:17310(529)]
181 sia.per 3 Writable [10.0.20.15:17310(499 isLeader),10.0.20.14:17310(499),10.0.20.18:17310(498)]
184 sia.per 3 Writable [10.0.20.15:17310(551 isLeader),10.0.20.14:17310(551),10.0.20.18:17310(549)]
187 sia.per 3 Writable [10.0.20.15:17310(514 isLeader),10.0.20.18:17310(513),10.0.20.14:17310(514)]
188 sia.per 3 Writable [10.0.20.15:17310(480),10.0.20.14:17310(479),10.0.20.16:17310(480 isLeader)]
189 sia.per 3 Writable [10.0.20.15:17310(500),10.0.20.18:17310(500 isLeader),10.0.20.14:17310(499)]
192 sia.per 3 Writable [10.0.20.15:17310(540 isLeader),10.0.20.14:17310(540),10.0.20.18:17310(539)]
194 sia.per 3 Writable [10.0.20.16:17310(407 isLeader),10.0.20.18:17310(406),10.0.20.14:17310(406)]
199 sia.per 3 Writable [10.0.20.15:17310(508 isLeader),10.0.20.18:17310(508),10.0.20.14:17310(507)]
201 sia.per 3 Writable [10.0.20.14:17310(500),10.0.20.15:17310(500 isLeader),10.0.20.18:17310(499)]
203 sia.per 3 Writable [10.0.20.18:17310(372),10.0.20.16:17310(373 isLeader),10.0.20.14:17310(372)]
204 sia.per 3 Writable [10.0.20.15:17310(471 isLeader),10.0.20.18:17310(470),10.0.20.14:17310(470)]
210 sia.per 3 Writable [10.0.20.14:17310(443),10.0.20.18:17310(441),10.0.20.15:17310(443 isLeader)]
211 sia.per 3 Writable [10.0.20.14:17310(506),10.0.20.15:17310(506 isLeader),10.0.20.18:17310(505)]
272 sia.per 3 Writable [10.0.20.16:17310(389 isLeader),10.0.20.18:17310(388),10.0.20.15:17310(389)]
276 sia.per 3 Writable [10.0.20.15:17310(371),10.0.20.18:17310(370),10.0.20.16:17310(371 isLeader)]
277 sia.per 3 Writable [10.0.20.15:17310(381),10.0.20.16:17310(381 isLeader),10.0.20.18:17310(380)]
278 sia.per 3 Writable [10.0.20.16:17310(445 isLeader),10.0.20.18:17310(444),10.0.20.15:17310(445)]
279 sia.per 3 Writable [10.0.20.15:17310(344),10.0.20.16:17310(344 isLeader),10.0.20.18:17310(341)]
283 sia.per 3 Writable [10.0.20.15:17310(353),10.0.20.18:17310(351),10.0.20.16:17310(353 isLeader)]
284 sia.per 3 Writable [10.0.20.16:17310(334 isLeader),10.0.20.18:17310(332),10.0.20.15:17310(334)]
286 sia.per 3 Writable [10.0.20.15:17310(369),10.0.20.18:17310(368),10.0.20.16:17310(369 isLeader)]
290 sia.per 3 Writable [10.0.20.15:17310(381),10.0.20.16:17310(381 isLeader),10.0.20.18:17310(379)]
294 sia.per 3 Writable [10.0.20.15:17310(372),10.0.20.16:17310(372 isLeader),10.0.20.18:17310(370)]
297 sia.per 3 Writable [10.0.20.15:17310(331),10.0.20.16:17310(331 isLeader),10.0.20.18:17310(330)]
304 sia.per 3 Writable [10.0.20.15:17310(436),10.0.20.18:17310(435),10.0.20.16:17310(436 isLeader)]
305 sia.per 3 Writable [10.0.20.15:17310(396),10.0.20.18:17310(396),10.0.20.16:17310(396 isLeader)]
306 sia.per 3 Writable [10.0.20.15:17310(421),10.0.20.18:17310(419),10.0.20.16:17310(421 isLeader)]
307 sia.per 3 Writable [10.0.20.15:17310(411),10.0.20.16:17310(411 isLeader),10.0.20.18:17310(409)]
308 sia.per 3 Writable [10.0.20.15:17310(413),10.0.20.18:17310(411),10.0.20.16:17310(413 isLeader)]
309 sia.per 3 Writable [10.0.20.15:17310(406),10.0.20.18:17310(405),10.0.20.16:17310(406 isLeader)]
321 sia.per 3 Writable [10.0.20.16:17310(428 isLeader),10.0.20.18:17310(426),10.0.20.15:17310(428)]
322 sia.per 3 Writable [10.0.20.16:17310(436 isLeader),10.0.20.18:17310(434),10.0.20.15:17310(436)]
331 sia.per 3 Writable [10.0.20.15:17310(344),10.0.20.16:17310(344 isLeader),10.0.20.18:17310(343)]
335 sia.per 3 Writable [10.0.20.16:17310(332 isLeader),10.0.20.18:17310(331),10.0.20.15:17310(332)]
336 sia.per 3 Writable [10.0.20.16:17310(345 isLeader),10.0.20.15:17310(345),10.0.20.18:17310(343)]
340 sia.per 3 Writable [10.0.20.16:17310(422 isLeader),10.0.20.18:17310(420),10.0.20.15:17310(422)]
345 sia.per 3 Writable [10.0.20.15:17310(362),10.0.20.18:17310(361),10.0.20.16:17310(362 isLeader)]
348 sia.per 3 Writable [10.0.20.15:17310(374),10.0.20.18:17310(373),10.0.20.16:17310(374 isLeader)]
351 sia.per 3 Writable [10.0.20.15:17310(400),10.0.20.18:17310(399),10.0.20.16:17310(400 isLeader)]
352 sia.per 3 Writable [10.0.20.15:17310(356),10.0.20.18:17310(354),10.0.20.16:17310(356 isLeader)]
354 sia.per 3 Writable [10.0.20.15:17310(381),10.0.20.18:17310(380),10.0.20.16:17310(381 isLeader)]
355 sia.per 3 Writable [10.0.20.15:17310(348),10.0.20.18:17310(346),10.0.20.16:17310(348 isLeader)]
357 sia.per 3 Writable [10.0.20.15:17310(402),10.0.20.18:17310(400),10.0.20.16:17310(402 isLeader)]
359 sia.per 3 Writable [10.0.20.15:17310(413),10.0.20.18:17310(413),10.0.20.16:17310(413 isLeader)]
361 sia.per 3 Writable [10.0.20.15:17310(423),10.0.20.16:17310(423 isLeader),10.0.20.18:17310(422)]
362 sia.per 3 Writable [10.0.20.15:17310(380),10.0.20.18:17310(379),10.0.20.16:17310(380 isLeader)]
365 sia.per 3 Writable [10.0.20.15:17310(369),10.0.20.18:17310(367),10.0.20.16:17310(369 isLeader)]
488 pbs.per 3 Writable [10.0.20.15:17310(160 isLeader),10.0.20.18:17310(158),10.0.20.14:17310(160)]
547 sia.per 3 Writable [10.0.20.16:17310(255 isLeader),10.0.20.14:17310(255),10.0.20.18:17310(254)]
554 sia.per 3 Writable [10.0.20.14:17310(255),10.0.20.18:17310(253),10.0.20.16:17310(255 isLeader)]
563 sia.per 3 Writable [10.0.20.14:17310(276),10.0.20.18:17310(274),10.0.20.16:17310(276 isLeader)]
[Partition with replica used size differ significantly]:
DP_ID VOLUME REPLICAS DP_STATUS MEMBERS(usedSize)
[Partition with excessive replicas]:
ID VOLUME REPLICAS STATUS MEMBERS
I was wondering what the fastest way to clean these up is, and why CubeFS doesn't seem to clean them up on its own.
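For what it's worth, grouping the "unavailable replica" rows above by the address in their last column shows the problem traces back to only two datanodes (10.0.20.16 and 10.0.20.15). A rough parsing sketch, assuming the whitespace-and-bracket layout printed by cfs-cli datapartition check above (this is an ad-hoc helper, not part of cfs-cli):

```python
# Rough sketch: group the "unavailable replica" rows from the
# `cfs-cli datapartition check` output by replica address, to see
# how many distinct datanodes are actually involved.
import re
from collections import defaultdict

# A few sample rows copied from the check output above.
check_output = """\
615 pbs.per 3 Read only [10.0.20.16:17310, 10.0.20.14:17310, 10.0.20.15:17310] [10.0.20.16:17310]
616 pbs.per 3 Read only [10.0.20.16:17310, 10.0.20.14:17310, 10.0.20.15:17310] [10.0.20.16:17310]
633 media 3 Read only [10.0.20.14:17310, 10.0.20.15:17310, 10.0.20.16:17310] [10.0.20.15:17310]
"""

by_replica = defaultdict(list)
for line in check_output.splitlines():
    # Columns: DP_ID VOLUME REPLICAS DP_STATUS MEMBERS [UNAVAILABLE_REPLICA]
    m = re.match(r"(\d+)\s+\S+\s+\d+\s+.*\[([\d.:]+)\]$", line)
    if m:
        by_replica[m.group(2)].append(int(m.group(1)))

for addr, dp_ids in sorted(by_replica.items()):
    print(f"{addr}: {len(dp_ids)} partitions -> {dp_ids}")
```

Run against the full paste, this makes it clear each batch of read-only partitions points at a single replica address, which narrows down what actually needs fixing.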
Expected Behavior
CubeFS should manage recovery and rebalancing automatically
Steps To Reproduce
Not sure, sorry
CubeFS Log
No response
Anything else? (Additional Context)
No response
The current system does not handle these abnormal partitions automatically; you need to take them offline manually. Please refer to the user documentation for instructions, e.g.: cfs-cli datapartition decommission xxx xx
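Since every unavailable replica above sits on one of two nodes, the per-partition commands can be generated in a loop. A minimal sketch that only prints the commands so they can be reviewed first; the argument order (replica address, then partition ID) is assumed from the example in the reply, so verify it with `cfs-cli datapartition decommission --help` before running anything:

```shell
# Print (don't run) one decommission command per affected partition.
# Address and partition IDs taken from the check output above; the
# argument order is an assumption based on the reply's example.
ADDR="10.0.20.16:17310"
CMDS=""
for dp in 615 616 617 618 619 620 621 622 623 624; do
    CMDS="${CMDS}cfs-cli datapartition decommission ${ADDR} ${dp}
"
done
printf '%s' "$CMDS"
```

After reviewing the printed commands, run them one at a time; once each decommission completes, `cfs-cli datapartition check` should show that partition drop out of the unavailable list.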
Contact Details
[email protected]
Is there an existing issue for this?
Priority
low (Default)