Hi,
I want to prepare a failed disk for replacement. I did:
ceph orch osd rm 35 --zap --replace
and it's now in the state "Done, waiting for purge", with 0 pgs, and
REPLACE and ZAP set to true. It's been like this for some hours, and now
my cluster is unhappy:
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
stray daemon osd.35 on host moss-be1002 not managed by cephadm
(the OSD is down & out)
...and also neither the disk nor the relevant NVMe LV has been zapped.
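(For what it's worth, the "Done, waiting for purge" state above is what
ceph orch osd rm status
reports for osd.35 in the removal queue.)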
I have my OSDs deployed via a spec:
service_type: osd
service_id: rrd_single_NVMe
placement:
  label: "NVMe"
spec:
  data_devices:
    rotational: 1
  db_devices:
    model: "NVMe"
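(For reference, that spec gets applied/re-applied with something like
ceph orch apply -i osd-spec.yaml
where the filename here is just illustrative.)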
And before issuing the ceph orch osd rm I set that service to unmanaged
(ceph orch set-unmanaged osd.rrd_single_NVMe), since obviously I don't want
Ceph to immediately re-create a new OSD on the failing disk.
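(I can double-check that flag with something like:
ceph orch ls osd --export
which, as far as I understand, should include "unmanaged: true" for
osd.rrd_single_NVMe while the service is unmanaged.)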
I'd expected from the docs[0] that this would leave the system ready for
the failed disk to be swapped out, including removing/wiping the NVMe LV,
and that I could then mark osd.rrd_single_NVMe as managed again and have a
new OSD built.
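(I assume I could do the cleanup by hand, something along the lines of:
ceph orch device zap moss-be1002 /dev/sdX --force
for the failed data device (/dev/sdX being a placeholder here), plus a
ceph-volume lvm zap --destroy on the db LV on the host itself, but I'd
expected the --zap on the rm command to take care of all of that.)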
What did I do wrong? I don't much care about the OSD id (though obviously
it's neater not to keep incrementing OSD numbers every time a disk dies),
but I thought that telling ceph orch not to make new OSDs and then using
ceph orch osd rm to zap the disk and NVMe LV would have been the way to
go...
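(My understanding of --replace is that osd.35 should end up marked as
"destroyed" in the CRUSH map rather than being deleted outright, so the id
gets reused by the next OSD created in that slot; once the removal actually
completes I'd expect
ceph osd tree
to show osd.35 with status "destroyed".)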
Thanks,
Matthew
[0] https://docs.ceph.com/en/reef/cephadm/services/osd/#replacing-an-osd