Re: Reef - what happened to OSD spec?

Just to clarify: I realized that my statement regarding the zapping of LVs for db/wal is likely misleading. What I meant was that I don't see an option for that in the dashboard, but if you remove an OSD on the command line, the respective LVs are cleaned up:

$ ceph orch osd rm 0 --zap --force

will remove the OSD as well as the corresponding db/wal LV(s). Sorry for any confusion.
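
As a side note, a pending removal can be monitored with the orchestrator's removal queue:

$ ceph orch osd rm status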

Quoting Eugen Block <eblock@xxxxxx>:

Hi, just a few days ago I replied to a thread [2] with some explanations of destroy, delete and purge.

So if you "destroy" an OSD it is meant to be replaced, reusing the ID. A failed drive may not be responsive at all so an automated wipe might fail as well. If the db/wal is located on a different device you'll have to clean that up manually. I don't think cephadm is able to do that for you (yet). I just checked with a virtual Reef cluster. So in a real world scenario if an OSD fails, you destroy it (which marks it as "destroyed" in the crush tree), then you wipe the LV containing db/wal manually. Unfortunately, 'ceph orch device zap' can't deal with VG/LV syntax:

$ ceph orch device zap ceph01 /dev/ceph-0a339b77-072a-4a03-92d2-a2eda00bd12c/osd-db-5e64f473-07a3-432b-96e4-643e3bdbf2c0 --force
Error EINVAL: Device path '/dev/ceph-0a339b77-072a-4a03-92d2-a2eda00bd12c/osd-db-5e64f473-07a3-432b-96e4-643e3bdbf2c0' not found on host 'ceph01'

So you'll have to wipe it locally with ceph-volume:

cephadm ceph-volume lvm zap --destroy /dev/ceph-{VG}/osd-db-{LV}
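
For completeness, one way to do the destroy step that precedes the wipe is via the orchestrator, e.g. for OSD 5 (the ID is just a placeholder):

$ ceph orch osd rm 5 --replace

which removes the daemon but leaves the OSD marked as "destroyed" in the crush tree so that its ID can be reused.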

When the failed drive has been replaced, cephadm will redeploy the OSD(s). You might want to pause the orchestrator (ceph orch pause) before wiping the drive(s), as it might otherwise deploy OSDs right away if multiple specs apply to the configuration.
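
As a rough sketch of that sequence, reusing the placeholders from above (run the ceph-volume step on the host that holds the LV):

$ ceph orch pause
cephadm ceph-volume lvm zap --destroy /dev/ceph-{VG}/osd-db-{LV}
# physically replace the failed drive
$ ceph orch resume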

[2] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/74FHIF73PTIGEC2T7JBPZ2BUIEMNNMEF/

Zitat von Nigel Williams <nigel.williams@xxxxxxxxxxx>:

Thanks, Eugen, for following up. Sorry, my second response was incomplete. I can confirm that it works as expected too. My confusion was that the relevant section of the online documentation seemed to be missing/moved, and when it initially failed I wrongly thought that the OSD-add process had changed in the Reef release.

There might still need to be a way for "destroy" to do additional clean-up and clear remnants of LVM fingerprints on the devices, as this tripped me up when the OSD spec apply failed due to "filesystem on device" checks.
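
If you run into those checks, a hedged example of clearing such remnants from a whole data device before re-applying the spec (ceph01 and /dev/sdX are placeholders for the affected host and device):

$ ceph orch device zap ceph01 /dev/sdX --force

Whole-device paths (as opposed to VG/LV paths) are accepted here, and the LVM metadata and filesystem signatures get wiped so the device shows up as available again.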

The documentation has been improved, and the OSD spec is now under this heading for Reef:

https://docs.ceph.com/en/reef/cephadm/services/osd/#advanced-osd-service-specifications
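
For reference, a minimal OSD spec of the kind documented there could look like the following (the service_id, placement and device filters are made up for illustration):

service_type: osd
service_id: osd_spec_example
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0

and would be applied with 'ceph orch apply -i osd_spec.yml'.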
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

