so "ceph osd tree destroyed -f json-pretty" shows the nautilus2 host with the osd id you're trying to replace here? And there are disks marked available that match the spec (20G rotational disk in this case I guess) in "ceph orch device ls nautilus2"? On Mon, Feb 20, 2023 at 10:16 AM Eugen Block <eblock@xxxxxx> wrote: > I stumbled upon this option 'osd_id_claims' [2], so I tried to apply a > replace.yaml to redeploy only the one destroyed disk, but still > nothing happens with that disk. This is my replace.yaml: > > ---snip--- > nautilus:~ # cat replace-osd-7.yaml > service_type: osd > service_name: osd > placement: > hosts: > - nautilus2 > spec: > data_devices: > rotational: 1 > size: '20G:' > db_devices: > rotational: 0 > size: '13G:16G' > filter_logic: AND > objectstore: bluestore > osd_id_claims: > nautilus2: ['7'] > ---snip--- > > I see these lines in the mgr.log: > > Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log > [INF] : Found osd claims -> {'nautilus2': ['7']} > Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: [cephadm INFO > cephadm.services.osd] Found osd claims for drivegroup None -> > {'nautilus2': ['7']} > Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log > [INF] : Found osd claims for drivegroup None -> {'nautilus2': ['7']} > > But I see no attempt to actually deploy the OSD. > > [2] > > https://docs.ceph.com/en/quincy/mgr/orchestrator_modules/#orchestrator-osd-replace > > Zitat von Adam King <adking@xxxxxxxxxx>: > > > For reference, a stray daemon from cephadm POV is roughly just something > > that shows up in "ceph node ls" that doesn't have a directory in > > /var/lib/ceph/<fsid>. I guess manually making the OSD as you did means > that > > didn't end up getting made. I remember the manual osd creation process > (by > > manual just meaning not using an orchestrator/cephadm mgr module command) > > coming up at one point and the we ended up manually running "cephadm > > deploy" to make sure those directories get created correctly, but I don't > > think any docs ever got made about it (yet, anyway). Also, is there a > > tracker issue for it not correctly handling the drivegroup? > > > > On Mon, Feb 20, 2023 at 8:58 AM Eugen Block <eblock@xxxxxx> wrote: > > > >> Thanks, Adam. > >> > >> Providing the keyring to the cephadm command worked, but the unwanted > >> (but expected) side effect is that from cephadm perspective it's a > >> stray daemon. For some reason the orchestrator did apply the desired > >> drivegroup when I tried to reproduce this morning, but then again > >> failed just now when I wanted to get rid of the stray daemon. This is > >> one of the most annoying things with cephadm, I still don't fully > >> understand when it will correctly apply the identical drivegroup.yml > >> and when not. Anyway, the conclusion is to not interfere with cephadm > >> (nothing new here), but since the drivegroup was not applied correctly > >> I assumed I had to "help out" a bit by manually deploying an OSD. > >> > >> Thanks, > >> Eugen > >> > >> Zitat von Adam King <adking@xxxxxxxxxx>: > >> > >> > Going off of > >> > > >> > ceph --cluster ceph --name client.bootstrap-osd --keyring > >> > /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json > >> > > >> > you could try passing "--keyring <bootstrap-osd-keyring" to the > cephadm > >> > ceph-volume command. Something like 'cephadm ceph-volume --keyring > >> > <bootstrap-osd-keyring> -- lvm create'. 
> >> > I'm guessing it's trying to run the osd tree command within a
> >> > container, and I know cephadm mounts keyrings passed to the
> >> > ceph-volume command as "/var/lib/ceph/bootstrap-osd/ceph.keyring"
> >> > inside the container.
> >> >
> >> > On Mon, Feb 20, 2023 at 6:35 AM Eugen Block <eblock@xxxxxx> wrote:
> >> >
> >> >> Hi *,
> >> >>
> >> >> I was playing around on an upgraded test cluster (from N to Q),
> >> >> current version:
> >> >>
> >> >>     "overall": {
> >> >>         "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 18
> >> >>     }
> >> >>
> >> >> I tried to replace an OSD after destroying it with 'ceph orch osd rm
> >> >> osd.5 --replace'. The OSD was drained successfully and marked as
> >> >> "destroyed" as expected, and the zapping also worked. At this point I
> >> >> didn't have an osd spec in place because all OSDs were adopted during
> >> >> the upgrade process. So I created a new spec which was not applied
> >> >> successfully (I'm wondering if there's another/new issue with
> >> >> ceph-volume, but that's not the focus here), so I tried it manually
> >> >> with 'cephadm ceph-volume lvm create'. I'll add the output at the end
> >> >> for better readability. Apparently, there's no bootstrap-osd keyring
> >> >> for cephadm, so it can't look up the desired osd_id in the osd tree;
> >> >> the command it tries is this:
> >> >>
> >> >> ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> >> >>
> >> >> In the local filesystem the required keyring is present, though:
> >> >>
> >> >> nautilus:~ # cat /var/lib/ceph/bootstrap-osd/ceph.keyring
> >> >> [client.bootstrap-osd]
> >> >>         key = AQBOCbpgixIsOBAAgBzShsFg/l1bOze4eTZHug==
> >> >>         caps mgr = "allow r"
> >> >>         caps mon = "profile bootstrap-osd"
> >> >>
> >> >> Is there something missing during the adoption process? Or are the
> >> >> docs lacking some upgrade info? I found a section about putting
> >> >> keyrings under management [1], but I'm not sure if that's what's
> >> >> missing here.
> >> >> Any insights are highly appreciated!
> >> >>
> >> >> Thanks,
> >> >> Eugen
> >> >>
> >> >> [1] https://docs.ceph.com/en/quincy/cephadm/operations/#putting-a-keyring-under-management
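(The lookup that fails inside the container can be reproduced on the host, where the bootstrap-osd keyring does exist. Below is a minimal sketch of that check; it assumes the 'nodes'/'stray' arrays and the 'status' field of 'ceph osd tree -f json', which are worth verifying on your own release before relying on this.)

```python
#!/usr/bin/env python3
"""Sketch of the check ceph-volume could not perform in the container:
ask the cluster, as client.bootstrap-osd, for the osd tree and see whether
the id we want to reuse is marked destroyed."""
import json
import subprocess


def destroyed_osd_ids() -> set:
    # Same command ceph-volume runs inside the container, executed on the
    # host where /var/lib/ceph/bootstrap-osd/ceph.keyring actually exists.
    out = subprocess.check_output(
        ["ceph", "--cluster", "ceph",
         "--name", "client.bootstrap-osd",
         "--keyring", "/var/lib/ceph/bootstrap-osd/ceph.keyring",
         "osd", "tree", "-f", "json"]
    ).decode()
    tree = json.loads(out)
    # Assumed JSON layout: top-level "nodes" and "stray" lists, each entry
    # carrying "id", "type", and "status".
    return {
        node["id"]
        for node in tree.get("nodes", []) + tree.get("stray", [])
        if node.get("type") == "osd" and node.get("status") == "destroyed"
    }


if __name__ == "__main__":
    osd_id = 5  # the id from this thread
    if osd_id in destroyed_osd_ids():
        print(f"osd.{osd_id} is destroyed and should be reusable")
    else:
        print(f"osd.{osd_id} is not marked destroyed; check 'ceph osd tree'")
```

If the id does not show up as destroyed here, reusing it, whether via osd_id_claims in a spec or via '--osd-id' on the command line, is unlikely to work.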
> >> >> ---snip---
> >> >> nautilus:~ # cephadm ceph-volume lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb --block.db-size 5G
> >> >> Inferring fsid <FSID>
> >> >> Using recent ceph image <LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> >> >> Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=<LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92 -e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/<FSID>:/var/run/ceph:z -v /var/log/ceph/<FSID>:/var/log/ceph:z -v /var/lib/ceph/<FSID>/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z <LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92 lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb --block.db-size 5G
> >> >> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning msg="Path \"/etc/SUSEConnect\" from \"/etc/containers/mounts.conf\" doesn't exist, skipping"
> >> >> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning msg="Path \"/etc/zypp/credentials.d/SCCcredentials\" from \"/etc/containers/mounts.conf\" doesn't exist, skipping"
> >> >> /usr/bin/podman: stderr Running command: /usr/bin/ceph-authtool --gen-print-key
> >> >> /usr/bin/podman: stderr Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> >> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.848+0000 7fd255e30700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
> >> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.848+0000 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
> >> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.852+0000 7fd255e30700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> >> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.852+0000 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000 7fd255e30700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> >> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000 7fd255e30700 -1 AuthRegistry(0x7fd250065910) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000 7fd255e30700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> >> >> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+0000 7fd255e30700 -1 AuthRegistry(0x7fd255e2eea0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> >> /usr/bin/podman: stderr stderr: [errno 2] RADOS object not found (error connecting to the cluster)
> >> >> /usr/bin/podman: stderr Traceback (most recent call last):
> >> >> /usr/bin/podman: stderr   File "/usr/sbin/ceph-volume", line 11, in <module>
> >> >> /usr/bin/podman: stderr     load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
> >> >> /usr/bin/podman: stderr     self.main(self.argv)
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
> >> >> /usr/bin/podman: stderr     return f(*a, **kw)
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
> >> >> /usr/bin/podman: stderr     terminal.dispatch(self.mapper, subcommand_args)
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
> >> >> /usr/bin/podman: stderr     instance.main()
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
> >> >> /usr/bin/podman: stderr     terminal.dispatch(self.mapper, self.argv)
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
> >> >> /usr/bin/podman: stderr     instance.main()
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 77, in main
> >> >> /usr/bin/podman: stderr     self.create(args)
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
> >> >> /usr/bin/podman: stderr     return func(*a, **kw)
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 26, in create
> >> >> /usr/bin/podman: stderr     prepare_step.safe_prepare(args)
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 252, in safe_prepare
> >> >> /usr/bin/podman: stderr     self.prepare()
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
> >> >> /usr/bin/podman: stderr     return func(*a, **kw)
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 292, in prepare
> >> >> /usr/bin/podman: stderr     self.osd_id = prepare_utils.create_id(osd_fsid, json.dumps(secrets), osd_id=self.args.osd_id)
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 166, in create_id
> >> >> /usr/bin/podman: stderr     if osd_id_available(osd_id):
> >> >> /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 204, in osd_id_available
> >> >> /usr/bin/podman: stderr     raise RuntimeError('Unable check if OSD id exists: %s' % osd_id)
> >> >> /usr/bin/podman: stderr RuntimeError: Unable check if OSD id exists: 5
> >> >> Traceback (most recent call last):
> >> >>   File "/usr/sbin/cephadm", line 9170, in <module>
> >> >>     main()
> >> >>   File "/usr/sbin/cephadm", line 9158, in main
> >> >>     r = ctx.func(ctx)
> >> >>   File "/usr/sbin/cephadm", line 1917, in _infer_config
> >> >>     return func(ctx)
> >> >>   File "/usr/sbin/cephadm", line 1877, in _infer_fsid
> >> >>     return func(ctx)
> >> >>   File "/usr/sbin/cephadm", line 1945, in _infer_image
> >> >>     return func(ctx)
> >> >>   File "/usr/sbin/cephadm", line 1835, in _validate_fsid
> >> >>     return func(ctx)
> >> >>   File "/usr/sbin/cephadm", line 5294, in command_ceph_volume
> >> >>     out, err, code = call_throws(ctx, c.run_cmd())
> >> >>   File "/usr/sbin/cephadm", line 1637, in call_throws
> >> >>     raise RuntimeError('Failed command: %s' % ' '.join(command))
> >> >> RuntimeError: Failed command: /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=<LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92 -e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/<FSID>:/var/run/ceph:z -v /var/log/ceph/<FSID>:/var/log/ceph:z -v /var/lib/ceph/<FSID>/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z <LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92 lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb --block.db-size 5G
> >> >> ---snip---

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
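Adam's description earlier in the thread of what cephadm treats as a stray daemon (something the cluster reports that has no matching directory under /var/lib/ceph/<fsid>) can be approximated with a short check run on the affected host. This is only a rough sketch of that rough definition, not cephadm's actual logic; the 'ceph node ls' JSON layout and the <daemon_type>.<name> directory naming are assumptions to verify locally.

```python
#!/usr/bin/env python3
"""Rough approximation of the stray-daemon idea described in this thread:
daemons the cluster knows about on this host ('ceph node ls') that have no
directory under /var/lib/ceph/<fsid>/. Not cephadm's real implementation."""
import json
import subprocess
from pathlib import Path


def find_strays(fsid: str, host: str) -> list:
    node_ls = json.loads(
        subprocess.check_output(["ceph", "node", "ls", "-f", "json"]).decode()
    )
    # Build the names cephadm would use for its data directories on this
    # host, e.g. osd.5 or mon.nautilus (assumed naming scheme).
    expected = set()
    for daemon_type, by_host in node_ls.items():   # e.g. "osd": {"nautilus2": [5, 7]}
        for name in by_host.get(host, []):
            expected.add(f"{daemon_type}.{name}")
    data_dir = Path("/var/lib/ceph") / fsid
    present = {p.name for p in data_dir.iterdir() if p.is_dir()}
    return sorted(expected - present)


if __name__ == "__main__":
    # <FSID> is redacted in the thread; substitute your cluster's fsid and
    # run this on the host in question.
    print(find_strays("<FSID>", "nautilus2"))
```

An OSD created manually with ceph-volume, as above, would keep showing up in such a check until a matching directory exists, which is why running "cephadm deploy" to create it was mentioned earlier in the thread.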