Thanks, Adam.
Providing the keyring to the cephadm command worked, but the unwanted
(though expected) side effect is that, from cephadm's perspective, it
is now a stray daemon. For some reason the orchestrator did apply the
desired drivegroup when I tried to reproduce this this morning, but
then it failed again just now when I wanted to get rid of the stray
daemon. This is one of the most annoying things about cephadm; I still
don't fully understand when it will correctly apply an identical
drivegroup.yml and when it won't. Anyway, the conclusion is not to
interfere with cephadm (nothing new here), but since the drivegroup
was not applied correctly, I assumed I had to "help out" a bit by
manually deploying an OSD.
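
For anyone hitting the same thing: what at least helps me narrow it
down is comparing what the spec *would* do with what actually got
deployed (the file name below is just a placeholder for my local spec):

  ceph orch ls osd --export                     # spec as cephadm sees it
  ceph orch apply -i drivegroup.yml --dry-run   # preview, nothing is applied
  ceph health detail                            # shows CEPHADM_STRAY_DAEMON

None of that explains why the spec sometimes isn't applied, but the
dry-run output at least shows which disks cephadm would pick up.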
Thanks,
Eugen
Quoting Adam King <adking@xxxxxxxxxx>:
> Going off of
>
> ceph --cluster ceph --name client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>
> you could try passing "--keyring <bootstrap-osd-keyring>" to the cephadm
> ceph-volume command. Something like 'cephadm ceph-volume --keyring
> <bootstrap-osd-keyring> -- lvm create'. I'm guessing it's trying to run
> the osd tree command within a container, and I know cephadm mounts
> keyrings passed to the ceph-volume command as
> "/var/lib/ceph/bootstrap-osd/ceph.keyring" inside the container.
>
> On Mon, Feb 20, 2023 at 6:35 AM Eugen Block <eblock@xxxxxx> wrote:
>
>> Hi *,
>>
>> I was playing around on an upgraded test cluster (from N to Q),
>> current version:
>>
>> "overall": {
>> "ceph version 17.2.5
>> (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 18
>> }
>>
>> I tried to replace an OSD after destroying it with 'ceph orch osd rm
>> osd.5 --replace'. The OSD was drained successfully and marked as
>> "destroyed" as expected; the zapping also worked. At this point I
>> didn't have an osd spec in place because all OSDs were adopted during
>> the upgrade process. So I created a new spec, which was not applied
>> successfully (I'm wondering if there's another/new issue with
>> ceph-volume, but that's not the focus here), so I tried it manually
>> with 'cephadm ceph-volume lvm create'. I'll add the output at the end
>> for better readability. Apparently, there's no bootstrap-osd keyring
>> available to cephadm, so it can't look up the desired osd_id in the
>> osd tree; the command it tries is this:
>>
>> ceph --cluster ceph --name client.bootstrap-osd --keyring
>> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>>
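>> For comparison, I'd expect the same lookup to work from 'cephadm
>> shell -- ceph osd tree -f json', since 'cephadm shell' mounts the
>> admin config and keyring from /etc/ceph into its container (I haven't
>> double-checked that here, though).
>>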
>> In the local filesystem the required keyring is present, though:
>>
>> nautilus:~ # cat /var/lib/ceph/bootstrap-osd/ceph.keyring
>> [client.bootstrap-osd]
>> key = AQBOCbpgixIsOBAAgBzShsFg/l1bOze4eTZHug==
>> caps mgr = "allow r"
>> caps mon = "profile bootstrap-osd"
>>
>> Is there something missing during the adoption process? Or are the
>> docs lacking some upgrade info? I found a section about putting
>> keyrings under management [1], but I'm not sure if that's what's
>> missing here.
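>> If that section is the relevant mechanism, I guess it would be
>> something along the lines of (completely untested, the placement
>> label is just a guess on my part):
>>
>> ceph orch client-keyring set client.bootstrap-osd label:osd
>>
>> but I'm not sure that's intended for the bootstrap-osd key at all.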
>> Any insights are highly appreciated!
>>
>> Thanks,
>> Eugen
>>
>> [1]
>> https://docs.ceph.com/en/quincy/cephadm/operations/#putting-a-keyring-under-management
>>
>>
>> ---snip---
>> nautilus:~ # cephadm ceph-volume lvm create --osd-id 5 --data /dev/sde
>> --block.db /dev/sdb --block.db-size 5G
>> Inferring fsid <FSID>
>> Using recent ceph image
>> <LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
>> Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host
>> --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host
>> --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
>> --init -e
>> CONTAINER_IMAGE=<LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
>> -e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e
>> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
>> /var/run/ceph/<FSID>:/var/run/ceph:z -v
>> /var/log/ceph/<FSID>:/var/log/ceph:z -v
>> /var/lib/ceph/<FSID>/crash:/var/lib/ceph/crash:z -v /dev:/dev -v
>> /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
>> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
>> /tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z
>> <LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
>> lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb
>> --block.db-size 5G
>> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning
>> msg="Path \"/etc/SUSEConnect\" from \"/etc/containers/mounts.conf\"
>> doesn't exist, skipping"
>> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning
>> msg="Path \"/etc/zypp/credentials.d/SCCcredentials\" from
>> \"/etc/containers/mounts.conf\" doesn't exist, skipping"
>> /usr/bin/podman: stderr Running command: /usr/bin/ceph-authtool
>> --gen-print-key
>> /usr/bin/podman: stderr Running command: /usr/bin/ceph --cluster ceph
>> --name client.bootstrap-osd --keyring
>> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.848+0000
>> 7fd255e30700 -1 auth: unable to find a keyring on
>> /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
>> (2) No such file or
>> directory
>> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.848+0000
>> 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at
>> /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
>> disabling
>> cephx
>> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.852+0000
>> 7fd255e30700 -1 auth: unable to find a keyring on
>> /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.852+0000
>> 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at
>> /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000
>> 7fd255e30700 -1 auth: unable to find a keyring on
>> /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000
>> 7fd255e30700 -1 AuthRegistry(0x7fd250065910) no keyring found at
>> /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000
>> 7fd255e30700 -1 auth: unable to find a keyring on
>> /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000
>> 7fd255e30700 -1 AuthRegistry(0x7fd255e2eea0) no keyring found at
>> /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> /usr/bin/podman: stderr stderr: [errno 2] RADOS object not found
>> (error connecting to the cluster)
>> /usr/bin/podman: stderr Traceback (most recent call last):
>> /usr/bin/podman: stderr File "/usr/sbin/ceph-volume", line 11, in
>> <module>
>> /usr/bin/podman: stderr load_entry_point('ceph-volume==1.0.0',
>> 'console_scripts', 'ceph-volume')()
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in
>> __init__
>> /usr/bin/podman: stderr self.main(self.argv)
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59,
>> in newfunc
>> /usr/bin/podman: stderr return f(*a, **kw)
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in
>> main
>> /usr/bin/podman: stderr terminal.dispatch(self.mapper, subcommand_args)
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194,
>> in dispatch
>> /usr/bin/podman: stderr instance.main()
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py",
>> line 46, in main
>> /usr/bin/podman: stderr terminal.dispatch(self.mapper, self.argv)
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194,
>> in dispatch
>> /usr/bin/podman: stderr instance.main()
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py",
>> line 77, in main
>> /usr/bin/podman: stderr self.create(args)
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16,
>> in is_root
>> /usr/bin/podman: stderr return func(*a, **kw)
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py",
>> line 26, in create
>> /usr/bin/podman: stderr prepare_step.safe_prepare(args)
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py",
>> line 252, in safe_prepare
>> /usr/bin/podman: stderr self.prepare()
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16,
>> in is_root
>> /usr/bin/podman: stderr return func(*a, **kw)
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py",
>> line 292, in prepare
>> /usr/bin/podman: stderr self.osd_id =
>> prepare_utils.create_id(osd_fsid, json.dumps(secrets),
>> osd_id=self.args.osd_id)
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line
>> 166, in create_id
>> /usr/bin/podman: stderr if osd_id_available(osd_id):
>> /usr/bin/podman: stderr File
>> "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line
>> 204, in osd_id_available
>> /usr/bin/podman: stderr raise RuntimeError('Unable check if OSD id
>> exists: %s' % osd_id)
>> /usr/bin/podman: stderr RuntimeError: Unable check if OSD id exists: 5
>> Traceback (most recent call last):
>> File "/usr/sbin/cephadm", line 9170, in <module>
>> main()
>> File "/usr/sbin/cephadm", line 9158, in main
>> r = ctx.func(ctx)
>> File "/usr/sbin/cephadm", line 1917, in _infer_config
>> return func(ctx)
>> File "/usr/sbin/cephadm", line 1877, in _infer_fsid
>> return func(ctx)
>> File "/usr/sbin/cephadm", line 1945, in _infer_image
>> return func(ctx)
>> File "/usr/sbin/cephadm", line 1835, in _validate_fsid
>> return func(ctx)
>> File "/usr/sbin/cephadm", line 5294, in command_ceph_volume
>> out, err, code = call_throws(ctx, c.run_cmd())
>> File "/usr/sbin/cephadm", line 1637, in call_throws
>> raise RuntimeError('Failed command: %s' % ' '.join(command))
>> RuntimeError: Failed command: /usr/bin/podman run --rm --ipc=host
>> --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host
>> --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
>> --init -e
>> CONTAINER_IMAGE=<LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
>> -e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e
>> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
>> /var/run/ceph/<FSID>:/var/run/ceph:z -v
>> /var/log/ceph/<FSID>:/var/log/ceph:z -v
>> /var/lib/ceph/<FSID>/crash:/var/lib/ceph/crash:z -v /dev:/dev -v
>> /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
>> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
>> /tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z
>> <LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
>> lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb
>> --block.db-size 5G
>> ---snip---
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx