I wanted to swap out an existing OSD, preserve its ID, remove the HDD backing it (osd.14 in this case), and give the ID of 14 to a new SSD taking its place in the same node. This is my first time ever doing this, so I'm not sure what to expect.
I followed the instructions here,
using the --replace flag.
However, I'm a bit concerned that the operation is taking so long on my test cluster. Out of 70TB in the cluster, only 40GB are in use. This OSD is relatively large compared to the others (2.7TB versus ~300GB for most), and yet it's been 36 hours with the following status:
ceph04.ssc.wisc.edu> ceph orch osd rm status
OSD_ID  HOST                 STATE     PG_COUNT  REPLACE  FORCE  DRAIN_STARTED_AT
14      ceph04.ssc.wisc.edu  draining  1         True     True   2021-11-30 15:22:23.469150+00:00
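For what it's worth, I assume commands along these lines would show whether that last PG is actually moving, though I'm not certain this is the recommended way to watch a drain:

    # List the PG(s) still mapped to osd.14, i.e. what the drain is waiting on
    ceph pg ls-by-osd 14

    # Ask whether osd.14 can be destroyed without risking data
    ceph osd safe-to-destroy 14

    # Check overall recovery/backfill activity
    ceph -s

If ceph pg ls-by-osd keeps showing the same PG and ceph -s shows no recovery activity, I'd guess that points to it being hung rather than just slow, but I'd welcome correction on that.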
Another note: I don't know why FORCE shows as True; the command I ran was just "ceph orch osd rm 14 --replace", without specifying --force. Hopefully not a big deal, but still strange.
At this point, is there any way to tell whether it's still actually doing something, or whether it's hung? If it is hung, what would be the 'recommended' way to proceed? I know I could just manually eject the HDD from the chassis, run "ceph osd crush remove osd.14", manually delete the auth keys, etc., but the documentation seems to state that this shouldn't be necessary if a Ceph OSD replacement goes properly.
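For reference, my understanding of that manual route (which I'd rather avoid) is a sketch like the following; please correct me if the steps or order are off:

    # Remove the OSD from the CRUSH map
    ceph osd crush remove osd.14

    # Delete its authentication key
    ceph auth del osd.14

    # Remove the OSD record itself
    ceph osd rm 14

I believe "ceph osd purge 14 --yes-i-really-mean-it" combines those steps into one, but again, the docs suggest none of this should be required when --replace completes normally.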