Hi Nicola,

There have been reports of similar PG/OSD behaviour, with a single PG remaining when draining. You can try re-peering the PG, moving it to a different OSD with upmap, or restarting the primary OSD of the PG in question to get things moving (a rough sketch of the commands is at the bottom of this mail). Also check whether the backfill is progressing at all; the easiest way is with the jj balancer (showremapped). Some might find it risky, but we usually trust the failure domain (disk/host/rack) and simply unplug the OSD when swapping disks. It saves us time.

Best,
Laimis J.

On Fri, Dec 20, 2024, 09:51 Nicola Mori <mori@xxxxxxxxxx> wrote:
> Dear Ceph users,
>
> I'm upgrading some disks of my cluster (Squid 19.2.0 managed by cephadm,
> in which I basically have only a 6+2 EC pool over 12 hosts). To speed up
> the operation I issued a ceph orch osd rm --replace for two OSDs on two
> different hosts; the drain started for both, and for one OSD it finished
> smoothly, so that OSD is now in the destroyed state. But for the second
> OSD it stopped with a single PG remaining to be moved away before the
> OSD is completely drained:
>
> # ceph orch osd rm status
> OSD  HOST     STATE     PGS  REPLACE  FORCE  ZAP    DRAIN STARTED AT
> 31   rokanan  draining  1    True     False  False  2024-12-19 08:57:36.458704+00:00
>
> and there is no backfill activity going on, even if the PG is labeled
> as backfilling:
>
> # ceph -s
>   cluster:
>     id:     b1029256-7bb3-11ec-a8ce-ac1f6b627b45
>     health: HEALTH_WARN
>             52 pgs not deep-scrubbed in time
>             (muted: OSD_SLOW_PING_TIME_BACK OSD_SLOW_PING_TIME_FRONT)
>
>   services:
>     mon: 5 daemons, quorum bofur,fili,aka,bifur,romolo (age 7d)
>     mgr: fili.olevnm(active, since 18h), standbys: bofur.tklnrn, bifur.htimkf
>     mds: 2/2 daemons up, 1 standby
>     osd: 124 osds: 123 up (since 4h), 122 in (since 22h); 1 remapped pgs
>
>   data:
>     volumes: 1/1 healthy
>     pools:   3 pools, 529 pgs
>     objects: 27.11M objects, 78 TiB
>     usage:   104 TiB used, 162 TiB / 266 TiB avail
>     pgs:     53120/216457202 objects misplaced (0.025%)
>              302 active+clean
>              178 active+clean+scrubbing
>              48  active+clean+scrubbing+deep
>              1   active+remapped+backfilling
>
> Is all of the above normal? My guess is that only one destroyed OSD can
> exist in the cluster at a time, and that after replacing its disk and
> recreating the OSD the drain of the second one will resume and finish.
> Is this plausible?
> Thanks,
>
> Nicola
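P.S. A rough sketch of what I mean, assuming your draining OSD is osd.31 as in your output. The PG id, primary OSD id and target OSD id below are placeholders you would need to fill in from your own cluster, so treat this as a starting point rather than a recipe:

    # find the PG still mapped to the draining OSD
    ceph pg ls-by-osd 31 | grep backfilling

    # option 1: ask that PG to re-peer
    ceph pg repeer <pgid>

    # option 2: restart the PG's primary OSD
    # (find the primary with: ceph pg <pgid> query | grep primary)
    ceph orch daemon restart osd.<primary-id>

    # option 3: pin the PG away from osd.31 with an upmap entry
    ceph osd pg-upmap-items <pgid> 31 <target-osd-id>

    # check whether backfill is actually moving, e.g. with TheJJ's
    # placement optimizer (https://github.com/TheJJ/ceph-balancer)
    ./placementoptimizer.py showremapped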