Re: ceph octopus mysterious OSD crash

On 3/19/21 6:22 PM, Philip Brown wrote:
I made *some* progress for cleanup.
I could already do "ceph osd rm 33" from my master. But doing the cleanup on the actual OSD node was problematic.

ceph-volume lvm zap xxx

wasn't working properly... because the device wasn't fully released... because at the regular OS level, it can't even SEE the VGs??
That caught me by surprise.
But doing   cephadm shell   let me see the VGs, remove them, and thus have the zap work.
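For anyone hitting the same wall, the cleanup sequence above can be sketched roughly like this (a sketch only; the VG/LV names and /dev/sdb are placeholders, not the actual devices from this thread):

```shell
# The ceph VGs are only visible from inside the container environment,
# so enter a cephadm shell first.
cephadm shell

# Inside the shell: find and release the leftover OSD logical volume.
lvs                                            # list LVs to locate the stale one
lvremove ceph-<vg-uuid>/osd-block-<osd-uuid>   # placeholder names
exit

# With the LV released, the zap can now succeed from a cephadm shell.
cephadm shell -- ceph-volume lvm zap /dev/sdb --destroy
```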

So now we move on to reconstructing the hybrid OSD.
First off, by default, the cephadm shell did not have permission to create OSDs, so I had to do
[ceph: root@dxxxx /]#  ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring

Unfortunately, since I had run the lvm zap on both the data /dev/sdX AND the db LV partition, attempting to recreate
the OSD with
ceph-volume lvm prepare --data  /dev/sdb --block.db /dev/ceph-xx-xx-xx/osd-db-xxxx

(the original db lvm on SSD, which still technically existed)

FAILED, because
   -->   blkid could not detect a PARTUUID for device: /dev/ceph-xxxx/osd-xxx
   --> Was unable to complete a new OSD, will rollback changes


cmon.... just MAKE one for me???

:-(

Happily, I could grep for osd-db-specific-id-here in /var/log/ceph/ceph-volume.log and found the exact original lvcreate syntax to remake it.
BUT....
lvm prepare once again complained about not detecting a PARTUUID.
I think there may be a command to set that, which is left out of ceph-volume.log   :(
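For reference, recreating a db LV by hand looks roughly like the following (hypothetical VG/LV names and size; the real lvcreate line in this case was recovered from /var/log/ceph/ceph-volume.log):

```shell
# Recreate a BlueStore db LV on the SSD's volume group.
# All names and the size here are placeholders, not the poster's values.
lvcreate --yes -L 50G -n osd-db-<uuid> ceph-<vg-uuid>

# Then retry the hybrid prepare, pointing --block.db at the new LV.
ceph-volume lvm prepare --data /dev/sdb \
    --block.db ceph-<vg-uuid>/osd-db-<uuid>
```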


So.. now what can I do?

Try not to create OSDs by hand. That might be the old way to do it, but with containers it is just a PITA. So, my best advice is to put the device in the state that cephadm expects it to be in:

https://docs.ceph.com/en/latest/cephadm/osd/#deploy-osds

Make sure all the conditions are met:



    The device must have no partitions.

    The device must not have any LVM state.

    The device must not be mounted.

    The device must not contain a file system.

    The device must not contain a Ceph BlueStore OSD.

    The device must be larger than 5 GB.

So make sure everything is destroyed from this device.
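One way to get the device back to that clean state (a sketch, not the definitive procedure; hostname and /dev/sdb are placeholders):

```shell
# Run on the OSD host. /dev/sdb is a placeholder for the recycled device.
wipefs --all /dev/sdb        # remove filesystem/LVM/partition-table signatures
sgdisk --zap-all /dev/sdb    # clear GPT and MBR partition structures

# Or let the orchestrator handle the whole zap in one step:
ceph orch device zap hostname /dev/sdb --force
```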

ceph orch daemon add osd hostname:/dev/device

or

ceph orch apply osd --all-available-devices

I guess that will pick up the same device (assuming there are no other unused disks).


And let cephadm just make it happen ... wishful thinking at this point ;-).

Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


