Just in case somebody finds themselves in the same boat:
As far as I can tell, automatic OSD creation just doesn't
work if you use separate DB/WAL devices (at least when the
DB/WAL device is already shared with other OSDs).
I ended up creating the DB/WAL logical volume manually (not
sure that's necessary) and then running this command:
    ceph orch daemon add osd HOST:data_devices=/dev/sdX,db_devices=VG/LV
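The two steps above can be sketched as a small shell script. This is a sketch under assumptions, not a tested procedure: the host name, VG name, LV name and LV size are hypothetical placeholders (check the real VG on the NVMe with "vgs" and size the LV like the other OSDs' DB LVs), lvcreate has to run as root on the OSD host, and by default the script only prints the commands (DRY_RUN=1).

```shell
#!/usr/bin/env bash
# Sketch of the manual workaround: create the DB/WAL LV by hand in the volume
# group that already lives on the shared NVMe, then add the OSD with an
# explicit data/DB pairing. All defaults below are HYPOTHETICAL placeholders.
set -euo pipefail

HOST="${HOST:-osd-host-01}"       # hypothetical host name
DATA_DEV="${DATA_DEV:-/dev/sdX}"  # the replacement (spinning) data disk
DB_VG="${DB_VG:-ceph-db-vg}"      # hypothetical: existing VG on /dev/nvme0n1
DB_LV="${DB_LV:-osd-db-494}"      # hypothetical name for the new DB/WAL LV
DB_SIZE="${DB_SIZE:-100G}"        # hypothetical: match the other OSDs' DB LVs
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the HOST:data_devices=...,db_devices=VG/LV argument used above.
osd_spec() {
  echo "$1:data_devices=$2,db_devices=$3/$4"
}

# Step 1 (as root on the OSD host): carve the DB/WAL LV out of the NVMe VG.
run lvcreate -L "$DB_SIZE" -n "$DB_LV" "$DB_VG"

# Step 2 (on a node with the admin keyring): create the OSD on the data
# disk, using the new LV for its DB/WAL.
run ceph orch daemon add osd "$(osd_spec "$HOST" "$DATA_DEV" "$DB_VG" "$DB_LV")"
```

Printing first and executing only with DRY_RUN=0 makes it easy to double-check the device and LV names before touching a shared NVMe.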
Vlad
On 7/8/22 13:57, Vladimir Brik wrote:
I think I found the problem in cephadm.log:
DEBUG ... cephadm ['--image', ... 'ceph-volume', '--fsid',
    ..., '--config-json', '-', '--', 'lvm', 'batch',
    '--no-auto', '/dev/sdv', '--db-devices', '/dev/nvme0n1',
    '--osd-ids', '494', '--yes', '--no-systemd', '--report',
    '--format', 'json']
...
DEBUG /bin/podman: --> passed data devices: 1 physical, 0 LVM
DEBUG /bin/podman: --> relative data size: 1.0
DEBUG /bin/podman: --> passed block_db devices: 1 physical, 0 LVM
DEBUG /bin/podman: --> 1 fast devices were passed, but none are available
So it looks like ceph-volume, as invoked by cephadm, is
unhappy because /dev/nvme0n1 is not "available". Is that
because "ceph orch device ls" reports its status as "LVM
detected, locked", since the NVMe is shared by multiple
spinning-disk OSDs for their DB/WAL?
Does anybody know where to go from here?
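One way to check the diagnosis above is to look at what the orchestrator reports for the NVMe and how much free space its volume group has left after the old DB/WAL LV was removed. A minimal sketch, assuming a hypothetical host name, that "ceph orch device ls" accepts "--wide" (it does in recent releases, to my knowledge), and that the LVM commands run as root on the OSD host; by default it only prints the commands (DRY_RUN=1).

```shell
#!/usr/bin/env bash
# Sketch: inspect why the shared NVMe is rejected as a DB device.
# HOST is a HYPOTHETICAL placeholder; NVME is the device from the log above.
set -euo pipefail

HOST="${HOST:-osd-host-01}"   # hypothetical host name
NVME="${NVME:-/dev/nvme0n1}"  # shared DB/WAL device
DRY_RUN="${DRY_RUN:-1}"       # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Orchestrator view: availability and reject reasons for the host's devices.
run ceph orch device ls "$HOST" --wide --refresh

# On the OSD host: which VG sits on the NVMe and how much of it is free.
run pvs -o pv_name,vg_name,pv_size,pv_free "$NVME"

# On the OSD host: all LVs with the physical devices backing them.
run lvs -o lv_name,vg_name,lv_size,devices
```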
Vlad
On 7/7/22 13:55, Vladimir Brik wrote:
Hello
I am running 17.2.1 (Quincy). We had a disk failure, and I
followed https://docs.ceph.com/en/quincy/cephadm/services/osd/
to replace the OSD, but it didn't work.
I replaced the failed disk and ran "ceph orch osd rm 494
--replace --zap", which stopped the daemon, removed it from
"ceph orch ps", and deleted the OSD's WAL/DB logical volume
from the NVMe device it shares with other OSDs. "ceph status"
says 710 OSDs total, 709 up. So far so good.
BUT
"ceph status" shows osd.494 as stray, even though it is
not running on the host, its systemd files have been
cleaned up, and "cephadm ls" doesn't show it.
A new OSD is not being created. The logs have entries
about OSD claims for ID 494, but nothing happens.
Re-applying the drive group spec below had no effect:
service_type: osd
service_id: r740xd2-mk2-hdd
service_name: osd.r740xd2-mk2-hdd
placement:
  label: r740xd2-mk2
spec:
  data_devices:
    rotational: 1
  db_devices:
    paths:
    - /dev/nvme0n1
Did I do something incorrectly? What do I need to do to
re-create the failed OSD?
Vlad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx