Dear Ceph users,

I'm upgrading some disks of my cluster (Squid 19.2.0 managed by cephadm; it basically contains only a 6+2 EC pool over 12 hosts). To speed up the operation I issued a ceph orch osd rm --replace for two OSDs on two different hosts. The drain started for both: for one OSD it finished smoothly and that OSD is now in the destroyed state, but for the second OSD it stopped with a single PG remaining to be moved away before the OSD is completely drained:
# ceph orch osd rm status
OSD  HOST     STATE     PGS  REPLACE  FORCE  ZAP    DRAIN STARTED AT
31   rokanan  draining  1    True     False  False  2024-12-19 08:57:36.458704+00:00
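For reference, the two removals were started with commands of this form (31 is the still-draining OSD shown above; the id of the already-drained one is left as a placeholder):

# ceph orch osd rm 31 --replace
# ceph orch osd rm <other-osd-id> --replace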
and there is no backfill activity going on, even if the PG is labeled as backfilling:
# ceph -s
  cluster:
    id:     b1029256-7bb3-11ec-a8ce-ac1f6b627b45
    health: HEALTH_WARN
            52 pgs not deep-scrubbed in time
            (muted: OSD_SLOW_PING_TIME_BACK OSD_SLOW_PING_TIME_FRONT)

  services:
    mon: 5 daemons, quorum bofur,fili,aka,bifur,romolo (age 7d)
    mgr: fili.olevnm(active, since 18h), standbys: bofur.tklnrn, bifur.htimkf
    mds: 2/2 daemons up, 1 standby
    osd: 124 osds: 123 up (since 4h), 122 in (since 22h); 1 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 529 pgs
    objects: 27.11M objects, 78 TiB
    usage:   104 TiB used, 162 TiB / 266 TiB avail
    pgs:     53120/216457202 objects misplaced (0.025%)
             302 active+clean
             178 active+clean+scrubbing
             48  active+clean+scrubbing+deep
             1   active+remapped+backfilling

Is all of the above normal? I guessed that maybe only one destroyed OSD at a time can exist in the cluster, and that after replacing its disk and recreating the OSD the drain for the second one would resume and finish. Is this plausible?
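In case it helps with the diagnosis, I suppose the next step would be to look more closely at the stuck PG and check whether backfill is being throttled, with something like the following (the pgid being the one reported as active+remapped+backfilling above):

# ceph pg ls backfilling
# ceph pg <pgid> query
# ceph config get osd osd_max_backfills

I can post that output if it is useful.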
Thanks, Nicola