Hello cephers,
So I am having trouble with a new hardware system that shows strange OSD behavior, and I want to replace a disk with a brand new one to test a theory.
I run all daemons in containers, and on one of the nodes I have a mon, a mgr, and 6 OSDs. So, following https://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd,
I stopped the container running osd.23, waited until it was down and out, ran the safe-to-destroy loop, and then destroyed the OSD, all from the mon container on this node. All good.
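For reference, the safe-to-destroy/destroy part looked roughly like this, all through the mon container (osd.23 being the example ID, the sleep interval is arbitrary):

while ! podman exec ceph-mon-storage2n2-la ceph osd safe-to-destroy osd.23; do sleep 60; done
podman exec ceph-mon-storage2n2-la ceph osd destroy 23 --yes-i-really-mean-it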
Then I swapped the SSDs and started running the remaining steps (from step 3) using the same mon container. I have no Ceph packages installed on the bare-metal box. It looks like the mon container doesn't
see the disk:
podman exec -it ceph-mon-storage2n2-la ceph-volume lvm zap /dev/sdh
stderr: lsblk: /dev/sdh: not a block device
stderr: error: /dev/sdh: No such file or directory
stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
usage: ceph-volume lvm zap [-h] [--destroy] [--osd-id OSD_ID]
[--osd-fsid OSD_FSID]
[DEVICES [DEVICES ...]]
ceph-volume lvm zap: error: Unable to proceed with non-existing device: /dev/sdh
Error: exit status 2
root@storage2n2-la:~# ls -l /dev/sd
sda sdc sdd sde sdf sdg sdg1 sdg2 sdg5 sdh
root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph-volume lvm zap sdh
stderr: lsblk: sdh: not a block device
stderr: error: sdh: No such file or directory
stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
usage: ceph-volume lvm zap [-h] [--destroy] [--osd-id OSD_ID]
[--osd-fsid OSD_FSID]
[DEVICES [DEVICES ...]]
ceph-volume lvm zap: error: Unable to proceed with non-existing device: sdh
Error: exit status 2
When I execute lsblk inside the mon container, it does see device sdh:
root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la lsblk
lsblk: dm-1: failed to get device path
lsblk: dm-2: failed to get device path
lsblk: dm-4: failed to get device path
lsblk: dm-6: failed to get device path
lsblk: dm-4: failed to get device path
lsblk: dm-2: failed to get device path
lsblk: dm-1: failed to get device path
lsblk: dm-0: failed to get device path
lsblk: dm-0: failed to get device path
lsblk: dm-7: failed to get device path
lsblk: dm-5: failed to get device path
lsblk: dm-7: failed to get device path
lsblk: dm-6: failed to get device path
lsblk: dm-5: failed to get device path
lsblk: dm-3: failed to get device path
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdf 8:80 0 1.8T 0 disk
sdd 8:48 0 1.8T 0 disk
sdg 8:96 0 223.5G 0 disk
|-sdg5 8:101 0 223G 0 part
|-sdg1 8:97 487M 0 part
`-sdg2 8:98 1K 0 part
sde 8:64 0 1.8T 0 disk
sdc 8:32 0 3.5T 0 disk
sda 8:0 0 3.5T 0 disk
sdh 8:112 0 3.5T 0 disk
So I used another OSD container (osd.5) on the same node and ran all of the operations (zap and prepare) successfully.
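Roughly like this (the OSD container name here is illustrative, and osd-id 23 is the ID I am re-using):

podman exec -it ceph-osd-5 ceph-volume lvm zap /dev/sdh
podman exec -it ceph-osd-5 ceph-volume lvm prepare --bluestore --data /dev/sdh --osd-id 23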
I suspect that the mon and mgr containers have no access to /dev or /var/lib/ceph, while the OSD containers do. The cluster was originally configured by ceph-ansible (Nautilus 14.2.2).
The question: if I want to replace all the disks on a single node, and I have 6 nodes with pools at replication 3, is it safe to restart the mgr container with the /dev and /var/lib/ceph volumes mounted (they are not mounted right now)?
I cannot use the other OSD containers on the same box, because my controller reverts from RAID to non-RAID mode with all disks lost, not just a single one. So I need to replace all 6 OSDs to get them running back
in containers, and the only things that will remain operational on the node are the mon and mgr containers.
I would prefer not to install the full Ceph server or client packages on the bare-metal node if possible.
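Concretely, I would just add the missing bind mounts to the podman run command in the ceph-ansible-generated systemd unit for the mgr (everything else in the unit staying as ceph-ansible wrote it), something like:

  -v /dev:/dev \
  -v /var/lib/ceph:/var/lib/ceph \

and then do a systemctl daemon-reload and restart the mgr container.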
Thank you for your help,