Re: ceph orch osd rm --zap --replace leaves cluster in odd state

What is the state of your PGs? Could you post the output of "ceph -s"?

I believe (though this is partly an assumption, after running into
something similar myself) that under the hood cephadm uses the
"ceph osd safe-to-destroy osd.X" command. When osd.X is no longer running
and not all PGs are active+clean (for instance, some are active+remapped),
safe-to-destroy answers in the negative with a message along the lines of
"OSD.X not reporting stats, not all PGs are active+clean, cannot draw any
conclusions". The cephadm OSD removal then stalls in that state until all
PGs reach active+clean.
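
If you want to check that theory, the following should show where the
removal is stuck (a rough sketch only; osd.35 taken from your mail):

# what cephadm's removal queue thinks is happening
ceph orch osd rm status

# the question I believe cephadm is asking internally
ceph osd safe-to-destroy osd.35

# any PGs that are not yet active+clean
ceph pg dump_stuck unclean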

Respectfully,

*Wes Dillingham*
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
wes@xxxxxxxxxxxxxxxxx



On Tue, May 28, 2024 at 11:43 AM Matthew Vernon <mvernon@xxxxxxxxxxxxx>
wrote:

> Hi,
>
> I want to prepare a failed disk for replacement. I did:
> ceph orch osd rm 35 --zap --replace
>
> and it's now in the state "Done, waiting for purge", with 0 pgs, and
> REPLACE and ZAP set to true. It's been like this for some hours, and now
> my cluster is unhappy:
>
> [WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
>      stray daemon osd.35 on host moss-be1002 not managed by cephadm
>
> (the OSD is down & out)
>
> ...and also neither the disk nor the relevant NVMe LV has been zapped.
>
> I have my OSDs deployed via a spec:
> service_type: osd
> service_id: rrd_single_NVMe
> placement:
>    label: "NVMe"
> spec:
>    data_devices:
>      rotational: 1
>    db_devices:
>      model: "NVMe"
>
> And before issuing the ceph orch osd rm I set that service to unmanaged
> (ceph orch set-unmanaged osd.rrd_single_NVMe), as obviously I don't want
> ceph to just go and re-create an OSD on the sad disk.
>
> I'd expected from the docs[0] that what I did would leave me with a
> system ready for the failed disk to be swapped (and that I could then
> mark osd.rrd_single_NVMe as managed again and have a new OSD built),
> including removing/wiping the NVMe LV.
>
> What did I do wrong? I don't much care about the OSD id (though obviously
> it's neater not to just increment the OSD numbers every time a disk
> dies), but I thought that telling ceph orch not to make new OSDs and
> then using ceph orch osd rm to zap the disk and NVMe LV would have been
> the way to go...
>
> Thanks,
>
> Matthew
>
> [0] https://docs.ceph.com/en/reef/cephadm/services/osd/#replacing-an-osd
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
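
For what it's worth, the overall sequence I would expect to work is
roughly what you did (a sketch only; command and flag names as I recall
them from a Reef-era orchestrator, so double-check against your version):

# stop cephadm re-deploying onto the disk as soon as it is wiped/replaced
ceph orch set-unmanaged osd.rrd_single_NVMe

# schedule removal, keep the OSD id free for re-use, zap the backing devices
ceph orch osd rm 35 --replace --zap

# watch progress; this sits in the queue until the PGs have drained
ceph orch osd rm status

# after the physical swap, let the spec pick the new disk up again
ceph orch set-managed osd.rrd_single_NVMe

If "ceph -s" shows PGs still backfilling or remapped, the "Done, waiting
for purge" state is most likely just the removal waiting for them to
settle, rather than anything you did wrong.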
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



