Re: Replace ceph osd in a container

> I suspect that the mon and mgr containers have no access to /dev or /var/lib while the osd containers do.
> The cluster was originally configured by ceph-ansible (nautilus 14.2.2).

They don't, because they don't need to.

> The question is: if I want to replace all the disks on a single node, and I have 6 nodes with pools
> at replication 3, is it safe to restart the mgr with /dev and /var/lib/ceph mounted as volumes (they are not mounted right now)?

Restarting mons is safe in the sense that data will not get lost. However, access might get lost temporarily.

The question is, how many mons do you have? If you have only 1 or 2, it will mean downtime. If you can bear the downtime, it doesn't matter. If you have at least 3, you can restart one after the other.
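
Before restarting the next one, I would verify that quorum has been re-established, e.g. with something like this (run from wherever you have a working ceph CLI):

    ceph quorum_status -f json-pretty   # all mons should appear under quorum_names again
    ceph -s                             # overall cluster state; wait until the mons have rejoined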

However, I would not do that. Having to restart a mon container every time some minor container config changes for reasons that have nothing to do with the mon sounds like asking for trouble.

I also use containers and would recommend a different approach. I created an additional type of container (ceph-adm) that I use for all admin tasks. It's the same image, and the entry point simply executes a sleep infinity. In this container I make all the relevant hardware visible. You might also want to expose /var/run/ceph to be able to use the admin sockets without hassle. This way, admin operations are separated from the actual storage daemons, and I can modify and restart the admin container as I like.
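
As a rough sketch (the image name, the mounts and the ceph-adm naming are just how I do it and will differ in your setup), such a container can be started with something like:

    # sketch only: adjust image name and mounts to your deployment
    podman run -d --name ceph-adm \
        --privileged \
        -v /dev:/dev \
        -v /etc/ceph:/etc/ceph \
        -v /var/lib/ceph:/var/lib/ceph \
        -v /var/run/ceph:/var/run/ceph \
        --entrypoint sleep \
        docker.io/ceph/daemon:latest-nautilus infinity

Admin commands and ceph-volume runs then go through "podman exec -it ceph-adm ..." instead of through a mon or osd container.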

Best regards,

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Alex Litvak <alexander.v.litvak@xxxxxxxxx>
Sent: 22 October 2019 08:04
To: ceph-users@xxxxxxxxxxxxxx
Subject:  Replace ceph osd in a container

Hello cephers,

I am having trouble with strange OSD behavior on a new hardware system, and I want to replace a disk with a brand new one to test a theory.

I run all daemons in containers, and on one of the nodes I have a mon, a mgr, and 6 OSDs. I followed https://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd

I stopped the container with osd.23, waited until it was down and out, ran the safe-to-destroy loop, and then destroyed the OSD, all via the mon container on this node.  All good.
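
For reference, the sequence was roughly the following (the osd container name is from my setup and may differ elsewhere):

    podman stop ceph-osd-23                       # osd container name shown as an example
    podman exec -it ceph-mon-storage2n2-la bash -c \
        'while ! ceph osd safe-to-destroy osd.23; do sleep 60; done'
    podman exec -it ceph-mon-storage2n2-la ceph osd destroy 23 --yes-i-really-mean-it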

Then I swapped the SSDs and started running the remaining steps (from step 3) using the same mon container.  I have no ceph packages installed on the bare metal box. It looks like the mon container doesn't see the disk:

     podman exec -it ceph-mon-storage2n2-la ceph-volume lvm zap /dev/sdh
  stderr: lsblk: /dev/sdh: not a block device
  stderr: error: /dev/sdh: No such file or directory
  stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
usage: ceph-volume lvm zap [-h] [--destroy] [--osd-id OSD_ID]
                            [--osd-fsid OSD_FSID]
                            [DEVICES [DEVICES ...]]
ceph-volume lvm zap: error: Unable to proceed with non-existing device: /dev/sdh
Error: exit status 2
root@storage2n2-la:~# ls -l /dev/sd
sda   sdc   sdd   sde   sdf   sdg   sdg1  sdg2  sdg5  sdh
root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph-volume lvm zap sdh
  stderr: lsblk: sdh: not a block device
  stderr: error: sdh: No such file or directory
  stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
usage: ceph-volume lvm zap [-h] [--destroy] [--osd-id OSD_ID]
                            [--osd-fsid OSD_FSID]
                            [DEVICES [DEVICES ...]]
ceph-volume lvm zap: error: Unable to proceed with non-existing device: sdh
Error: exit status 2

When I execute lsblk inside the mon container, it does see device sdh:
root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la lsblk
lsblk: dm-1: failed to get device path
lsblk: dm-2: failed to get device path
lsblk: dm-4: failed to get device path
lsblk: dm-6: failed to get device path
lsblk: dm-4: failed to get device path
lsblk: dm-2: failed to get device path
lsblk: dm-1: failed to get device path
lsblk: dm-0: failed to get device path
lsblk: dm-0: failed to get device path
lsblk: dm-7: failed to get device path
lsblk: dm-5: failed to get device path
lsblk: dm-7: failed to get device path
lsblk: dm-6: failed to get device path
lsblk: dm-5: failed to get device path
lsblk: dm-3: failed to get device path
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdf      8:80   0   1.8T  0 disk
sdd      8:48   0   1.8T  0 disk
sdg      8:96   0 223.5G  0 disk
|-sdg5   8:101  0   223G  0 part
|-sdg1   8:97       487M  0 part
`-sdg2   8:98         1K  0 part
sde      8:64   0   1.8T  0 disk
sdc      8:32   0   3.5T  0 disk
sda      8:0    0   3.5T  0 disk
sdh      8:112  0   3.5T  0 disk

So I used a fellow OSD container (osd.5) on the same node and ran all of the operations (zap and prepare) successfully.
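
The commands were roughly the following (container name shortened to ceph-osd-5 here just for illustration):

    podman exec -it ceph-osd-5 ceph-volume lvm zap /dev/sdh
    podman exec -it ceph-osd-5 ceph-volume lvm prepare --bluestore --data /dev/sdh --osd-id 23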

I suspect that the mon and mgr containers have no access to /dev or /var/lib while the osd containers do.  The cluster was originally configured by ceph-ansible (nautilus 14.2.2).

The question is: if I want to replace all the disks on a single node, and I have 6 nodes with pools at replication 3, is it safe to restart the mgr with /dev and /var/lib/ceph mounted as volumes (they are not mounted right now)?

I cannot use the other OSD containers on the same box, because my controller reverts from RAID to non-RAID mode with all disks lost, not just a single one.  So I need to replace all 6 OSDs and bring them back up in containers, and the only things that will remain operational on the node are the mon and mgr containers.

I would prefer not to install the full Ceph packages or client on the bare metal node if possible.

Thank you for your help,
