> I am suspecting that mon or mgr have no access to /dev or /var/lib while osd containers do.
> The cluster was originally configured by ceph-ansible (nautilus 14.2.2).

They don't, because they don't need to.

> The question is: if I want to replace all disks on a single node, and I have 6 nodes with pools
> at replication 3, is it safe to restart the mgr with /dev and /var/lib/ceph volumes mounted
> (they are not configured right now)?

Restarting mons is safe in the sense that data will not get lost. However, access might get lost temporarily. The question is, how many mons do you have? If you have only 1 or 2, a restart will mean downtime. If you can bear the downtime, it doesn't matter. If you have at least 3, you can restart them one after the other.

However, I would not do that. Having to restart a mon container every time some minor container config changes for reasons that have nothing to do with the mon sounds like asking for trouble.

I also use containers and would recommend a different approach. I created an additional type of container (ceph-adm) that I use for all admin tasks. It's the same image, and the entry point simply executes "sleep infinity". In this container I make all relevant hardware visible. You might also want to expose /var/run/ceph to be able to use the admin sockets without hassle. This way, I separated admin operations from the actual storage daemons and can modify and restart the admin container as I like.
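For illustration, such an admin container could be started along these lines. This is a minimal sketch only: the image name/tag and the exact mount list are illustrative (ceph-ansible deployments of that era typically used the ceph/daemon image) and need adapting to your deployment:

  # Same image as the storage daemons, but the entry point just sleeps,
  # so the container holds no daemon state and can be recreated freely.
  # Image tag and mounts below are examples, not your exact config.
  podman run -d --name ceph-adm \
      --privileged \
      -v /dev:/dev \
      -v /var/lib/ceph:/var/lib/ceph \
      -v /var/run/ceph:/var/run/ceph \
      -v /etc/ceph:/etc/ceph \
      --entrypoint sleep \
      docker.io/ceph/daemon:latest-nautilus \
      infinity

  # Admin tasks then run inside it, for example:
  podman exec -it ceph-adm ceph-volume lvm zap --destroy /dev/sdh
  podman exec -it ceph-adm ceph daemon osd.23 perf dump   # via admin socket

With --privileged and /dev mounted, ceph-volume inside this container can see and zap the host's block devices, while the mon and mgr containers remain untouched.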
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Alex Litvak <alexander.v.litvak@xxxxxxxxx>
Sent: 22 October 2019 08:04
To: ceph-users@xxxxxxxxxxxxxx
Subject: Replace ceph osd in a container

Hello cephers,

I am having trouble with a new hardware system showing strange OSD behavior, and I want to replace a disk with a brand new one to test the theory. I run all daemons in containers, and on one of the nodes I have a mon, a mgr, and 6 OSDs.

Following https://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd, I stopped the container with osd.23, waited until it was down and out, ran the safe-to-destroy loop, and then destroyed the OSD, all using the monitor from the container on this node (see the command sketch at the end of this message). All good. Then I swapped the SSDs and started running the additional steps (from step 3) using the same mon container. I have no ceph packages installed on the bare metal box.

It looks like the mon container doesn't see the disk:

podman exec -it ceph-mon-storage2n2-la ceph-volume lvm zap /dev/sdh
 stderr: lsblk: /dev/sdh: not a block device
 stderr: error: /dev/sdh: No such file or directory
 stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
usage: ceph-volume lvm zap [-h] [--destroy] [--osd-id OSD_ID] [--osd-fsid OSD_FSID] [DEVICES [DEVICES ...]]
ceph-volume lvm zap: error: Unable to proceed with non-existing device: /dev/sdh
Error: exit status 2

root@storage2n2-la:~# ls -l /dev/sd
sda  sdc  sdd  sde  sdf  sdg  sdg1  sdg2  sdg5  sdh

root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph-volume lvm zap sdh
 stderr: lsblk: sdh: not a block device
 stderr: error: sdh: No such file or directory
 stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
usage: ceph-volume lvm zap [-h] [--destroy] [--osd-id OSD_ID] [--osd-fsid OSD_FSID] [DEVICES [DEVICES ...]]
ceph-volume lvm zap: error: Unable to proceed with non-existing device: sdh
Error: exit status 2

When I execute lsblk, it does see device sdh:

root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la lsblk
lsblk: dm-1: failed to get device path
lsblk: dm-2: failed to get device path
lsblk: dm-4: failed to get device path
lsblk: dm-6: failed to get device path
lsblk: dm-4: failed to get device path
lsblk: dm-2: failed to get device path
lsblk: dm-1: failed to get device path
lsblk: dm-0: failed to get device path
lsblk: dm-0: failed to get device path
lsblk: dm-7: failed to get device path
lsblk: dm-5: failed to get device path
lsblk: dm-7: failed to get device path
lsblk: dm-6: failed to get device path
lsblk: dm-5: failed to get device path
lsblk: dm-3: failed to get device path
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdf      8:80   0   1.8T  0 disk
sdd      8:48   0   1.8T  0 disk
sdg      8:96   0 223.5G  0 disk
|-sdg5   8:101  0   223G  0 part
|-sdg1   8:97   0   487M  0 part
`-sdg2   8:98   0     1K  0 part
sde      8:64   0   1.8T  0 disk
sdc      8:32   0   3.5T  0 disk
sda      8:0    0   3.5T  0 disk
sdh      8:112  0   3.5T  0 disk

So I used a fellow OSD container (osd.5) on the same node and ran all of the operations (zap and prepare) successfully.

I am suspecting that mon or mgr have no access to /dev or /var/lib while osd containers do. The cluster was originally configured by ceph-ansible (nautilus 14.2.2).

The question is: if I want to replace all disks on a single node, and I have 6 nodes with pools at replication 3, is it safe to restart the mgr with /dev and /var/lib/ceph volumes mounted (they are not configured right now)? I cannot use other OSD containers on the same box, because my controller reverts from RAID to non-RAID mode with all disks lost, not just a single one. So I need to replace all 6 OSDs to get them running back in containers, and the only things that will remain operational on the node are the mon and mgr containers.

I prefer not to install a full cluster or client on the bare metal node if possible.

Thank you for your help,
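For reference, the "safe-to-destroy loop" and the replacement steps mentioned above come from the linked docs. In sketch form, with the OSD id and device name taken from the output above (exact flags can vary between releases, so treat this as an outline):

  # Stop the OSD container first, then wait until the cluster reports
  # that the OSD can be destroyed without risking data:
  while ! ceph osd safe-to-destroy osd.23 ; do sleep 10 ; done
  ceph osd destroy 23 --yes-i-really-mean-it

  # After swapping the disk, zap and prepare it, reusing the old OSD id.
  # These steps need /dev, so run them from a container that mounts it:
  ceph-volume lvm zap /dev/sdh
  ceph-volume lvm prepare --osd-id 23 --data /dev/sdh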