Re: reinstalling node with orchestrator/cephadm

Eugen Block <eblock@xxxxxx> · Fri, 05 Feb 2021 15:11:20 +0000

Hi Kenneth,

I just got this working. It's a lab environment and the OSDs are not
encrypted, but I was able to bring the OSDs up again.
The ceph-volume commands also worked (only activation failed), so I had
the required information about those OSDs.

What I did was:

- collected the OSD data (fsid, keyring)
- created directories for the OSD daemons under /var/lib/ceph/<CEPH_UUID>/osd.<ID>
- note that the directory with the Ceph UUID already existed, since the
crash container had been created after bringing the node back into the
cluster
- created the content for each OSD by copying the required files from
a different host and changing the contents of
    - fsid
    - keyring
    - whoami
    - unit.run
    - unit.poststop

- created the symlinks to the OSD devices:
    - ln -s /dev/ceph-<VG>/osd-block-<LV> block
    - ln -s /dev/ceph-<DB_VG>/osd-db-<DB_LV> block.db (only for OSDs with a separate DB device)

- changed ownership to ceph
    - chown -R ceph:ceph /var/lib/ceph/<CEPH_UUID>/osd.<ID>/

- started the systemd unit
    - systemctl start ceph-<CEPH_UUID>@osd.<ID>.service
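
The file-level part of the steps above could be sketched roughly like this. It is only an illustration, not a tested recovery tool: the function names, the CEPH_ROOT variable and the argument order are my own assumptions, and the keyring, unit.run and unit.poststop edits are left as manual steps.

```shell
#!/bin/sh
# Sketch only: rebuild one cephadm OSD directory by hand, following the steps above.
# osd_dir, osd_unit, prepare_osd_dir and CEPH_ROOT are hypothetical names.

CEPH_ROOT="${CEPH_ROOT:-/var/lib/ceph}"

# Path of the OSD data directory: /var/lib/ceph/<CEPH_UUID>/osd.<ID>
osd_dir() {
    printf '%s/%s/osd.%s\n' "$CEPH_ROOT" "$1" "$2"
}

# Name of the systemd unit: ceph-<CEPH_UUID>@osd.<ID>.service
osd_unit() {
    printf 'ceph-%s@osd.%s.service\n' "$1" "$2"
}

# prepare_osd_dir <CEPH_UUID> <ID> <OSD_FSID> <TEMPLATE_DIR> <BLOCK_DEV> [<DB_DEV>]
# TEMPLATE_DIR is a copy of an OSD directory taken from a different host.
prepare_osd_dir() {
    cluster_uuid="$1"; osd_id="$2"; osd_fsid="$3"
    template="$4"; block_dev="$5"; db_dev="${6:-}"
    dir=$(osd_dir "$cluster_uuid" "$osd_id")

    mkdir -p "$dir" || return 1
    # Start from the other OSD's files, but drop its device symlinks.
    cp -a "$template"/. "$dir"/ || return 1
    rm -f "$dir/block" "$dir/block.db"

    # Identity files for this OSD.
    printf '%s\n' "$osd_fsid" > "$dir/fsid"
    printf '%s\n' "$osd_id" > "$dir/whoami"

    # Symlinks to this OSD's logical volumes.
    ln -s "$block_dev" "$dir/block" || return 1
    if [ -n "$db_dev" ]; then
        ln -s "$db_dev" "$dir/block.db" || return 1
    fi
    printf '%s\n' "$dir"
}

# Still to do by hand for each OSD (not automated here):
#   - put this OSD's key into <dir>/keyring
#   - replace the old OSD id and fsid in <dir>/unit.run and <dir>/unit.poststop
#   - chown -R ceph:ceph <dir>
#   - systemctl start "$(osd_unit <CEPH_UUID> <ID>)"
```

A call would look like `prepare_osd_dir <CEPH_UUID> <ID> <OSD_FSID> /root/osd-template /dev/ceph-<VG>/osd-block-<LV>`, where /root/osd-template is a hypothetical location for the files copied from the other host.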

I repeated this for all OSDs on that host. Now all OSDs are online and
the cluster is happy. I'm not sure what else is necessary in the case of
encrypted OSDs, but maybe this procedure helps you.
I don't know of a smoother or even automated way, and I don't think
there currently is one. Maybe someone is working on it, though.

Regards,
Eugen

Quoting Kenneth Waegeman <kenneth.waegeman@xxxxxxxx>:

Hi all,

I'm running a 15.2.8 cluster using ceph orch with all daemons  
adopted to cephadm.

I reinstalled an OSD node. Is there a way to make ceph
orch/cephadm activate the devices on this node again, ideally
automatically?

I tried running `cephadm ceph-volume -- lvm activate --all` but this  
has an error related to dmcrypt:

[root@osd2803 ~]# cephadm ceph-volume -- lvm activate --all
Using recent ceph image docker.io/ceph/ceph:v15
/usr/bin/podman:stderr --> Activating OSD ID 0 FSID  
697698fd-3fa0-480f-807b-68492bd292bf
/usr/bin/podman:stderr Running command: /usr/bin/mount -t tmpfs  
tmpfs /var/lib/ceph/osd/ceph-0
/usr/bin/podman:stderr Running command: /usr/bin/ceph-authtool  
/var/lib/ceph/osd/ceph-0/lockbox.keyring --create-keyring --name  
client.osd-lockbox.697698fd-3fa0-480f-807b-68492bd292bf --add-key  
AQAy7Bdg0jQsBhAAj0gcteTEbcpwNNvMGZqTTg==
/usr/bin/podman:stderr  stdout: creating  
/var/lib/ceph/osd/ceph-0/lockbox.keyring
/usr/bin/podman:stderr added entity  
client.osd-lockbox.697698fd-3fa0-480f-807b-68492bd292bf  
auth(key=AQAy7Bdg0jQsBhAAj0gcteTEbcpwNNvMGZqTTg==)
/usr/bin/podman:stderr Running command: /usr/bin/chown -R ceph:ceph  
/var/lib/ceph/osd/ceph-0/lockbox.keyring
/usr/bin/podman:stderr Running command: /usr/bin/ceph --cluster  
ceph --name client.osd-lockbox.697698fd-3fa0-480f-807b-68492bd292bf  
--keyring /var/lib/ceph/osd/ceph-0/lockbox.keyring config-key get  
dm-crypt/osd/697698fd-3fa0-480f-807b-68492bd292bf/luks
/usr/bin/podman:stderr  stderr: Error initializing cluster client:  
ObjectNotFound('RADOS object not found (error calling  
conf_read_file)',)
/usr/bin/podman:stderr -->  RuntimeError: Unable to retrieve dmcrypt secret
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 6111, in <module>
    r = args.func()
  File "/usr/sbin/cephadm", line 1322, in _infer_fsid
    return func()
  File "/usr/sbin/cephadm", line 1381, in _infer_image
    return func()
  File "/usr/sbin/cephadm", line 3611, in command_ceph_volume
    out, err, code = call_throws(c.run_cmd(), verbose=True)
  File "/usr/sbin/cephadm", line 1060, in call_throws
    raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /usr/bin/podman run --rm --ipc=host  
--net=host --entrypoint /usr/sbin/ceph-volume --privileged  
--group-add=disk -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e  
NODE_NAME=osd2803.banette.os -v /dev:/dev -v /run/udev:/run/udev -v  
/sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm  
docker.io/ceph/ceph:v15 lvm activate --all

The OSDs are encrypted indeed. `cephadm ceph-volume lvm list` and  
`cephadm shell ceph -s` run just fine, and if I run ceph-volume  
directly, the same command works, but then of course the daemons are  
started in the legacy way again, not in containers.

Is there another way through 'ceph orch' to achieve this? Or, if
`cephadm ceph-volume -- lvm activate --all` is the way to go
here, am I probably seeing a bug?

Thanks!!

Kenneth

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
