Re: reinstalling node with orchestrator/cephadm

Hello,

You could switch to croit. We can take over existing clusters without
much pain, and then you have a single button for upgrades in the future
;)

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.verges@xxxxxxxx
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx

On Mon, 8 Feb 2021 at 16:53, Kenneth Waegeman
<kenneth.waegeman@xxxxxxxx> wrote:
>
> Hi Eugen, all,
>
> Thanks for sharing your results! Since we have multiple clusters, some
> with 500+ OSDs, this solution is not feasible for us.
>
> In the meantime I created an issue for this:
>
> https://tracker.ceph.com/issues/49159
>
> We would especially need this to migrate/reinstall all our clusters to
> RHEL 8 (without destroying/recreating all OSD disks), so I really hope
> there is another solution :)
>
> Thanks again!
>
> Kenneth
>
> On 05/02/2021 16:11, Eugen Block wrote:
> > Hi Kenneth,
> >
> > I managed to get this working just now. It's a lab environment and
> > the OSDs are not encrypted, but I was able to bring the OSDs up again.
> > The ceph-volume commands also worked (just the activation didn't), so
> > I had the required information about those OSDs.
> >
> > What I did was:
> >
> > - collected the OSD data (fsid, keyring)
> > - created directories for the OSD daemons under
> > /var/lib/ceph/<CEPH_UUID>/osd.<ID>
> > - note that the directory with the ceph UUID already existed, since the
> > crash container had been created after bringing the node back into the
> > cluster
> > - created the content for each OSD by copying the required files from
> > a different host and adjusting the contents of
> >     - fsid
> >     - keyring
> >     - whoami
> >     - unit.run
> >     - unit.poststop
> >
> > - created the symlinks to the OSD devices:
> >     - ln -s /dev/ceph-<VG>/osd-block-<LV> block
> >     - ln -s /dev/ceph-<VG>/osd-block-<LV> block.db
> >
> > - changed ownership to ceph
> >     - chown -R ceph.ceph /var/lib/ceph/<UUID>/osd.<ID>/
> >
> > - started the systemd unit
> >     - systemctl start ceph-<CEPH_UUID>@osd.<ID>.service
> >
> > I repeated this for all OSDs on that host, and now all OSDs are online
> > and the cluster is happy. I'm not sure what else is necessary in the
> > case of encrypted OSDs, but maybe this procedure helps you.
> > I don't know if there's a smoother or even automated way; I don't
> > think there currently is. Maybe someone is working on it though.
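> >
> > As a rough, untested sketch of those steps (the cluster fsid, OSD id
> > and VG/LV names are placeholders, and the osd.<ID> files are assumed
> > to be copied from another host as described above):
> >
> >     CEPH_UUID=<cluster-fsid>   # ceph fsid
> >     OSD_ID=<id>                # OSD id, e.g. from ceph-volume lvm list
> >     OSD_DIR=/var/lib/ceph/${CEPH_UUID}/osd.${OSD_ID}
> >
> >     # per-OSD directory (the ${CEPH_UUID} directory itself already
> >     # exists once the crash container has been deployed)
> >     mkdir -p ${OSD_DIR}
> >
> >     # copy fsid, keyring, whoami, unit.run and unit.poststop from an
> >     # existing osd.<ID> directory on another host and adjust their
> >     # contents for this OSD
> >
> >     # symlinks to the OSD devices (block.db only if the OSD has a
> >     # separate DB LV; point it at that LV)
> >     ln -s /dev/ceph-<VG>/osd-block-<LV> ${OSD_DIR}/block
> >     ln -s /dev/ceph-<VG>/osd-block-<LV> ${OSD_DIR}/block.db
> >
> >     chown -R ceph:ceph ${OSD_DIR}
> >     systemctl start ceph-${CEPH_UUID}@osd.${OSD_ID}.service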
> >
> > Regards,
> > Eugen
> >
> >
> > Zitat von Kenneth Waegeman <kenneth.waegeman@xxxxxxxx>:
> >
> >> Hi all,
> >>
> >> I'm running a 15.2.8 cluster using ceph orch with all daemons adopted
> >> to cephadm.
> >>
> >> I tried reinstalling an OSD node. Is there a way to make ceph
> >> orch/cephadm activate the devices on this node again, ideally
> >> automatically?
> >>
> >> I tried running `cephadm ceph-volume -- lvm activate --all`, but this
> >> fails with an error related to dmcrypt:
> >>
> >>> [root@osd2803 ~]# cephadm ceph-volume -- lvm activate --all
> >>> Using recent ceph image docker.io/ceph/ceph:v15
> >>> /usr/bin/podman:stderr --> Activating OSD ID 0 FSID
> >>> 697698fd-3fa0-480f-807b-68492bd292bf
> >>> /usr/bin/podman:stderr Running command: /usr/bin/mount -t tmpfs
> >>> tmpfs /var/lib/ceph/osd/ceph-0
> >>> /usr/bin/podman:stderr Running command: /usr/bin/ceph-authtool
> >>> /var/lib/ceph/osd/ceph-0/lockbox.keyring --create-keyring --name
> >>> client.osd-lockbox.697698fd-3fa0-480f-807b-68492bd292bf --add-key
> >>> AQAy7Bdg0jQsBhAAj0gcteTEbcpwNNvMGZqTTg==
> >>> /usr/bin/podman:stderr  stdout: creating
> >>> /var/lib/ceph/osd/ceph-0/lockbox.keyring
> >>> /usr/bin/podman:stderr added entity
> >>> client.osd-lockbox.697698fd-3fa0-480f-807b-68492bd292bf
> >>> auth(key=AQAy7Bdg0jQsBhAAj0gcteTEbcpwNNvMGZqTTg==)
> >>> /usr/bin/podman:stderr Running command: /usr/bin/chown -R ceph:ceph
> >>> /var/lib/ceph/osd/ceph-0/lockbox.keyring
> >>> /usr/bin/podman:stderr Running command: /usr/bin/ceph --cluster ceph
> >>> --name client.osd-lockbox.697698fd-3fa0-480f-807b-68492bd292bf
> >>> --keyring /var/lib/ceph/osd/ceph-0/lockbox.keyring config-key get
> >>> dm-crypt/osd/697698fd-3fa0-480f-807b-68492bd292bf/luks
> >>> /usr/bin/podman:stderr  stderr: Error initializing cluster client:
> >>> ObjectNotFound('RADOS object not found (error calling
> >>> conf_read_file)',)
> >>> /usr/bin/podman:stderr -->  RuntimeError: Unable to retrieve dmcrypt
> >>> secret
> >>> Traceback (most recent call last):
> >>>   File "/usr/sbin/cephadm", line 6111, in <module>
> >>>     r = args.func()
> >>>   File "/usr/sbin/cephadm", line 1322, in _infer_fsid
> >>>     return func()
> >>>   File "/usr/sbin/cephadm", line 1381, in _infer_image
> >>>     return func()
> >>>   File "/usr/sbin/cephadm", line 3611, in command_ceph_volume
> >>>     out, err, code = call_throws(c.run_cmd(), verbose=True)
> >>>   File "/usr/sbin/cephadm", line 1060, in call_throws
> >>>     raise RuntimeError('Failed command: %s' % ' '.join(command))
> >>> RuntimeError: Failed command: /usr/bin/podman run --rm --ipc=host
> >>> --net=host --entrypoint /usr/sbin/ceph-volume --privileged
> >>> --group-add=disk -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e
> >>> NODE_NAME=osd2803.banette.os -v /dev:/dev -v /run/udev:/run/udev -v
> >>> /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm
> >>> docker.io/ceph/ceph:v15 lvm activate --all
> >>
> >> The OSDs are indeed encrypted. `cephadm ceph-volume lvm list` and
> >> `cephadm shell ceph -s` run just fine, and if I run ceph-volume
> >> directly the same command works, but then of course the daemons are
> >> started the legacy way again, not in containers.
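> >>
> >> If I read the "error calling conf_read_file" part correctly, the
> >> ceph-volume container simply has no ceph.conf/keyring with which to
> >> reach the mons and fetch the dm-crypt secret from the config-key
> >> store. As a rough, unverified sketch (assuming this cephadm version
> >> accepts --fsid/--config/--keyring for the ceph-volume subcommand),
> >> something like this might be worth a try:
> >>
> >>     cephadm ceph-volume --fsid <CLUSTER_FSID> \
> >>         --config /etc/ceph/ceph.conf \
> >>         --keyring /etc/ceph/ceph.client.admin.keyring \
> >>         -- lvm activate --all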
> >>
> >> Is there another way through 'ceph orch' to achieve this? Or, if
> >> `cephadm ceph-volume -- lvm activate --all` is the way to go here,
> >> am I perhaps hitting a bug?
> >>
> >> Thanks!!
> >>
> >> Kenneth
> >>
> >>
> >>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


