Hello Vlad,
- add the following top-level field to your OSD spec yaml and apply it
(see the example after these steps):
unmanaged: true
- go to the server that hosts the failed OSD
- fire up cephadm shell; if it is not installed on that server, install it
and give the server the _admin label:
ceph orch host label add servername _admin
- ceph orch osd rm 494
- ceph-volume lvm deactivate 494
- ceph-volume lvm zap --destroy --osd-id 494
- leave cephadm shell
- check if db, wal and osd were removed on the server (lsblk, vgs, lvs)
- if not, remove the volumes by hand with lvremove, as in the sketch after
these steps
- set unmanaged: false and apply the yaml
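
For reference, here is a rough sketch of the yaml side and the manual LV
cleanup, based on the spec you posted below. The file name and the VG/LV
names in the lvremove line are just placeholders, take the real ones from
your environment:

# osd-spec.yaml -- note the top-level unmanaged flag:
service_type: osd
service_id: r740xd2-mk2-hdd
placement:
  label: r740xd2-mk2
unmanaged: true
spec:
  data_devices:
    rotational: 1
  db_devices:
    paths:
    - /dev/nvme0n1

# pause the spec, then do the removal steps above on the host
ceph orch apply -i osd-spec.yaml

# after the zap, check for leftover db/wal volumes on the server
lsblk
vgs
lvs
# if an LV for osd.494 is still listed, remove it by hand, e.g.
# (VG/LV names are placeholders, copy them from the lvs output):
lvremove ceph-block-dbs-XXXX/osd-db-YYYY

# then set unmanaged: false (or drop the line) and re-apply
ceph orch apply -i osd-spec.yaml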
Best,
Malte
On 07.07.22 at 20:55, Vladimir Brik wrote:
Hello
I am running 17.2.1. We had a disk failure and I followed
https://docs.ceph.com/en/quincy/cephadm/services/osd/ to replace the OSD
but it didn't work.
I replaced the failed disk, ran "ceph orch osd rm 494 --replace --zap",
which stopped and removed the daemon from "ceph orch ps", and deleted
the WAL/DB LVM volume of the OSD from the NVMe device shared with other
OSDs. "ceph status" says 710 OSDs total, 709 up. So far so good.
BUT
"ceph status" shows osd.494 as stray, even though it is not running on
the host, its systemd files have been cleaned up, and "cephadm ls"
doesn't show it.
A new OSD is not being created. The logs have entries about osd claims
for ID 494, but nothing is happening.
Re-applying the drive group spec below didn't result in anything:
service_type: osd
service_id: r740xd2-mk2-hdd
service_name: osd.r740xd2-mk2-hdd
placement:
  label: r740xd2-mk2
spec:
  data_devices:
    rotational: 1
  db_devices:
    paths:
    - /dev/nvme0n1
Did I do something incorrectly? What do I need to do to re-create the
failed OSD?
Vlad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx