Re: OSD not created after replacing failed disk

Just in case somebody finds themselves in the same boat:

As best I can tell, automatic OSD creation just doesn't work if you use separate DB/WAL devices.

I ended up creating the DB/WAL logical volume manually (not sure that's necessary) and then running this command:

ceph orch daemon add osd HOST:data_devices=/dev/sdX,db_devices=VG/LV
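
In case it helps, this is roughly what the manual steps looked like. The VG/LV names and the size are placeholders from my setup, not something cephadm requires:

# Carve a new DB/WAL LV out of the shared NVMe VG (names and size are examples)
lvcreate -n osd-db-494 -L 60G ceph-nvme-vg

# Then hand both devices to the orchestrator in one command
ceph orch daemon add osd HOST:data_devices=/dev/sdX,db_devices=ceph-nvme-vg/osd-db-494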

Vlad

On 7/8/22 13:57, Vladimir Brik wrote:
I think I found the problem in cephadm.log:

DEBUG ... cephadm ['--image', ... 'ceph-volume', '--fsid', ..., '--config-json', '-', '--', 'lvm', 'batch', '--no-auto', '/dev/sdv', '--db-devices', '/dev/nvme0n1', '--osd-ids', '494', '--yes', '--no-systemd', '--report', '--format', 'json']
...
DEBUG /bin/podman: --> passed data devices: 1 physical, 0 LVM
DEBUG /bin/podman: --> relative data size: 1.0
DEBUG /bin/podman: --> passed block_db devices: 1 physical, 0 LVM
DEBUG /bin/podman: --> 1 fast devices were passed, but none are available

So it looks like ceph-volume (run by cephadm) is unhappy because /dev/nvme0n1 is not "available", probably because "ceph orch device ls" reports its status as "LVM detected, locked" (it is shared by multiple spinning-disk OSDs for their DB/WAL)?
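
If anyone wants to check the same thing on their cluster, the status comes from something like this (hostname is a placeholder):

# Show availability and rejection reasons the orchestrator sees
ceph orch device ls HOST --wide

# Or ask ceph-volume directly on the host
cephadm ceph-volume -- inventory /dev/nvme0n1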

Does anybody know where to go from here?


Vlad




On 7/7/22 13:55, Vladimir Brik wrote:
Hello

I am running 17.2.1. We had a disk failure and I followed https://docs.ceph.com/en/quincy/cephadm/services/osd/ to replace the OSD but it didn't work.

I replaced the failed disk and ran "ceph orch osd rm 494 --replace --zap", which stopped the daemon, removed it from "ceph orch ps", and deleted the OSD's WAL/DB LVM volume from the NVMe device shared with other OSDs. "ceph status" says 710 OSDs total, 709 up. So far so good.
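
For context, the replacement commands were roughly (only the OSD id is specific to my cluster; the LV name is an example):

# Mark OSD 494 for replacement and zap its devices; --replace preserves the id
ceph orch osd rm 494 --replace --zap

# Watch the drain/removal progress
ceph orch osd rm status

# Remove the old DB/WAL LV from the shared NVMe device
lvremove ceph-nvme-vg/osd-db-494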

BUT

"ceph status" shows osd.494 as stray, even though it is not running on the host, its systemd files have been cleaned up, and "cephadm ls" doesn't show it.

A new OSD is not being created. The logs have entries about OSD claims for ID 494, but nothing happens.

Re-applying the drive group spec below (re-apply commands shown after the spec) didn't result in anything:
service_type: osd
service_id: r740xd2-mk2-hdd
service_name: osd.r740xd2-mk2-hdd
placement:
   label: r740xd2-mk2
spec:
   data_devices:
     rotational: 1
   db_devices:
     paths:
     - /dev/nvme0n1
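
For reference, I re-applied it with something like this (the file name is just an example):

# Preview what the orchestrator would do
ceph orch apply -i osd-spec.yaml --dry-run

# Apply for real
ceph orch apply -i osd-spec.yaml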

Did I do something incorrectly? What do I need to do to re-create the failed OSD?


Vlad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
