Sure, you can save the drivegroup spec to a file, edit it according to
your requirements (I'm not sure having explicit device paths in there
makes sense, though) and apply it:
ceph orch apply -i new-drivegroup.yml
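For completeness, the whole cycle would look roughly like this (the file
name is just an example, and I'm assuming the exported spec matches what
you pasted below):

ceph orch ls --service_name osd.ceph03_combined_osd --export > new-drivegroup.yml
# edit new-drivegroup.yml and delete the "- /dev/sde" line under data_devices -> paths
ceph orch apply -i new-drivegroup.yml

Since the spec lists explicit paths, cephadm will keep retrying /dev/sde
until it is removed from that list; using filters (e.g. rotational or size)
instead of fixed paths would avoid this kind of situation in the future.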
Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
Many thanks for your response, Eugen!
I tried failing the mgr twice, but unfortunately that had no effect on the
issue. Neither `cephadm ceph-volume inventory` nor `ceph device ls-by-host
ceph03` lists the failed drive.
Your assumption is correct, though: the spec explicitly includes the
failed drive:
---
service_type: osd
service_id: ceph03_combined_osd
service_name: osd.ceph03_combined_osd
placement:
  hosts:
  - ceph03
spec:
  data_devices:
    paths:
    ...
    - /dev/sde
    ...
  db_devices:
    paths:
    - /dev/nvme0n1
    - /dev/nvme1n1
  filter_logic: AND
  objectstore: bluestore
---
Do you know the best way to remove the device from the spec?
/Z
On Fri, 16 Feb 2024 at 14:10, Eugen Block <eblock@xxxxxx> wrote:
Hi,
sometimes the easiest fix is to fail over the mgr. Have you tried that?
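For reference, the failover itself is just something like:

ceph mgr fail

which makes one of the standby mgrs take over (on recent releases the
active mgr's name can be omitted).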
If that didn't work, can you share the drivegroup spec?
ceph orch ls <your_osd_spec> --export
Does it contain specific device paths or something? Does 'cephadm ls'
on that node show any traces of the previous OSD?
I'd probably check a few things like
cephadm ceph-volume inventory
ceph device ls-by-host <host>
Regards,
Eugen
Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
> Hi,
>
> We had a physical drive malfunction in one of our Ceph OSD hosts managed by
> cephadm (Ceph 16.2.14). I have removed the drive from the system, and the
> kernel no longer sees it:
>
> ceph03 ~]# ls -al /dev/sde
> ls: cannot access '/dev/sde': No such file or directory
>
> I have removed the corresponding OSD from cephadm, crush map, etc. For all
> intents and purposes that OSD and its block device no longer exist:
>
> root@ceph01:/# ceph orch ps | grep osd.26
> root@ceph01:/# ceph osd tree| grep 26
> root@ceph01:/# ceph orch device ls | grep -E "ceph03.*sde"
>
> None of the above commands return anything. Cephadm correctly sees 8
> remaining OSDs on the host:
>
> root@ceph01:/# ceph orch ls | grep ceph03_c
> osd.ceph03_combined_osd 8 33s ago 2y ceph03
>
> Unfortunately, cephadm appears to be trying to apply a spec to host ceph03,
> including the disk that is now missing:
>
> RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host
> --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
> --privileged --group-add=disk --init -e CONTAINER_IMAGE=
> quay.io/ceph/ceph@sha256:843f112990e6489362c625229c3ea3d90b8734bd5e14e0aeaf89942fbb980a8b
> -e NODE_NAME=ceph03 -e CEPH_USE_RANDOM_NONCE=1 -e
> CEPH_VOLUME_OSDSPEC_AFFINITY=ceph03_combined_osd -e
> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
> /var/run/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/run/ceph:z -v
> /var/log/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/log/ceph:z -v
> /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash:/var/lib/ceph/crash:z
> -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
> /tmp/ceph-tmpc7b33pf0:/etc/ceph/ceph.conf:z -v
> /tmp/ceph-tmpq45nkmd6:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
> quay.io/ceph/ceph@sha256:843f112990e6489362c625229c3ea3d90b8734bd5e14e0aeaf89942fbb980a8b
> lvm batch --no-auto /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
> /dev/sdg /dev/sdh /dev/sdi --db-devices /dev/nvme0n1 /dev/nvme1n1 --yes
> --no-systemd
>
> Note that `lvm batch` includes the missing drive, /dev/sde. This fails
> because the drive no longer exists. Other than this cephadm ceph-volume
> thingy, the cluster is healthy. How can I tell cephadm that it should stop
> trying to use /dev/sde, which no longer exists, without affecting other
> OSDs on the host?
>
> I would very much appreciate any advice or pointers.
>
> Best regards,
> Zakhar
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx