Re: Missing keyrings on upgraded cluster


I stumbled upon the 'osd_id_claims' option [2], so I tried to apply a replace.yaml to redeploy only the one destroyed OSD, but still nothing happens with that disk. This is my replace.yaml:

---snip---
nautilus:~ # cat replace-osd-7.yaml
service_type: osd
service_name: osd
placement:
  hosts:
  - nautilus2
spec:
  data_devices:
    rotational: 1
    size: '20G:'
  db_devices:
    rotational: 0
    size: '13G:16G'
  filter_logic: AND
  objectstore: bluestore
osd_id_claims:
  nautilus2: ['7']
---snip---
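
For completeness, this is roughly how I apply the spec (the first run with
--dry-run just previews what the orchestrator would schedule):

---snip---
nautilus:~ # ceph orch apply -i replace-osd-7.yaml --dry-run
nautilus:~ # ceph orch apply -i replace-osd-7.yaml
---snip---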

I see these lines in the mgr.log:

Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log [INF] : Found osd claims -> {'nautilus2': ['7']}
Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: [cephadm INFO cephadm.services.osd] Found osd claims for drivegroup None -> {'nautilus2': ['7']}
Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log [INF] : Found osd claims for drivegroup None -> {'nautilus2': ['7']}

But I see no attempt to actually deploy the OSD.
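
For anyone who wants to dig further, these are the standard orchestrator
queries to check the state of the replacement (output omitted here):

---snip---
# is the destroyed OSD still pending replacement?
nautilus:~ # ceph orch osd rm status
# what does the orchestrator actually have stored for the osd service?
nautilus:~ # ceph orch ls osd --export
# does the orchestrator consider the devices on nautilus2 available?
nautilus:~ # ceph orch device ls nautilus2
---snip---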

[2] https://docs.ceph.com/en/quincy/mgr/orchestrator_modules/#orchestrator-osd-replace

Zitat von Adam King <adking@xxxxxxxxxx>:

For reference, a stray daemon from cephadm POV is roughly just something
that shows up in "ceph node ls" that doesn't have a directory in
/var/lib/ceph/<fsid>. I guess creating the OSD manually as you did means that
directory didn't end up getting made. I remember the manual OSD creation
process (by manual I just mean not using an orchestrator/cephadm mgr module
command) coming up at one point, and we ended up manually running "cephadm
deploy" to make sure those directories get created correctly, but I don't
think any docs ever got made about it (yet, anyway). Also, is there a
tracker issue for it not correctly handling the drivegroup?
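
A quick way to check that on the OSD host is something like this (<fsid>
being your cluster fsid):

---snip---
# what cephadm sees locally on this host
cephadm ls
# does a daemon directory for the OSD actually exist?
ls -d /var/lib/ceph/<fsid>/osd.*
---snip---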

On Mon, Feb 20, 2023 at 8:58 AM Eugen Block <eblock@xxxxxx> wrote:

Thanks, Adam.

Providing the keyring to the cephadm command worked, but the unwanted
(though expected) side effect is that from cephadm's perspective it's now
a stray daemon. For some reason the orchestrator did apply the desired
drivegroup when I tried to reproduce this morning, but then it failed
again just now when I wanted to get rid of the stray daemon. This is
one of the most annoying things about cephadm; I still don't fully
understand when it will correctly apply the identical drivegroup.yml
and when it won't. Anyway, the conclusion is to not interfere with
cephadm (nothing new here), but since the drivegroup was not applied
correctly I assumed I had to "help out" a bit by deploying an OSD
manually.

Thanks,
Eugen

Zitat von Adam King <adking@xxxxxxxxxx>:

> Going off of
>
> ceph --cluster ceph --name client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>
> you could try passing "--keyring <bootstrap-osd-keyring>" to the cephadm
> ceph-volume command. Something like 'cephadm ceph-volume --keyring
> <bootstrap-osd-keyring> -- lvm create'. I'm guessing it's trying to run the
> osd tree command within a container, and I know cephadm mounts keyrings
> passed to the ceph-volume command as
> "/var/lib/ceph/bootstrap-osd/ceph.keyring" inside the container.
>
> On Mon, Feb 20, 2023 at 6:35 AM Eugen Block <eblock@xxxxxx> wrote:
>
>> Hi *,
>>
>> I was playing around on an upgraded test cluster (from N to Q),
>> current version:
>>
>>      "overall": {
>>          "ceph version 17.2.5
>> (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 18
>>      }
>>
>> I tried to replace an OSD after destroying it with 'ceph orch osd rm
>> osd.5 --replace'. The OSD was drained successfully and marked as
>> "destroyed" as expected, the zapping also worked. At this point I
>> didn't have an osd spec in place because all OSDs were adopted during
>> the upgrade process. So I created a new spec, which was not applied
>> successfully (I'm wondering if there's another/new issue with
>> ceph-volume, but that's not the focus here), and then tried it manually
>> with 'cephadm ceph-volume lvm create'. I'll add the output at the end
>> for better readability. Apparently, there's no bootstrap-osd keyring
>> for cephadm, so it can't look up the desired osd_id in the osd tree; the
>> command it tries is this:
>>
>> ceph --cluster ceph --name client.bootstrap-osd --keyring
>> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>>
>> In the local filesystem the required keyring is present, though:
>>
>> nautilus:~ # cat /var/lib/ceph/bootstrap-osd/ceph.keyring
>> [client.bootstrap-osd]
>>          key = AQBOCbpgixIsOBAAgBzShsFg/l1bOze4eTZHug==
>>          caps mgr = "allow r"
>>          caps mon = "profile bootstrap-osd"
>>
>> Is there something missing during the adoption process? Or are the
>> docs lacking some upgrade info? I found a section about putting
>> keyrings under management [1], but I'm not sure if that's what's
>> missing here.
>> Any insights are highly appreciated!
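>>
>> For what it's worth, the example from [1] boils down to something like
>> the following, but I haven't tried it and I'm not sure the managed
>> keyring would even end up mounted into the ceph-volume container:
>>
>> ceph orch client-keyring set client.bootstrap-osd <placement>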
>>
>> Thanks,
>> Eugen
>>
>> [1]
>> https://docs.ceph.com/en/quincy/cephadm/operations/#putting-a-keyring-under-management
>>
>>
>> ---snip---
>> nautilus:~ # cephadm ceph-volume lvm create --osd-id 5 --data /dev/sde
>> --block.db /dev/sdb --block.db-size 5G
>> Inferring fsid <FSID>
>> Using recent ceph image
>> <LOCAL_REGISTRY>/ceph/ceph@sha256
>> :af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
>> Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host
>> --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host
>> --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
>> --init -e
>> CONTAINER_IMAGE=<LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
>> -e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e
>> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
>> /var/run/ceph/<FSID>:/var/run/ceph:z -v
>> /var/log/ceph/<FSID>:/var/log/ceph:z -v
>> /var/lib/ceph/<FSID>/crash:/var/lib/ceph/crash:z -v /dev:/dev -v
>> /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
>> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
>> /tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z
>> <LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
>> lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb --block.db-size 5G
>> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning
>> msg="Path \"/etc/SUSEConnect\" from \"/etc/containers/mounts.conf\"
>> doesn't exist, skipping"
>> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning
>> msg="Path \"/etc/zypp/credentials.d/SCCcredentials\" from
>> \"/etc/containers/mounts.conf\" doesn't exist, skipping"
>> /usr/bin/podman: stderr Running command: /usr/bin/ceph-authtool
>> --gen-print-key
>> /usr/bin/podman: stderr Running command: /usr/bin/ceph --cluster ceph
>> --name client.bootstrap-osd --keyring
>> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>> /usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.848+0000
>> 7fd255e30700 -1 auth: unable to find a keyring on
>> /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
>> (2) No such file or
>> directory
>> /usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.848+0000
>> 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at
>> /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
>> disabling
>> cephx
>> /usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.852+0000
>> 7fd255e30700 -1 auth: unable to find a keyring on
>> /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> /usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.852+0000
>> 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at
>> /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> /usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.856+0000
>> 7fd255e30700 -1 auth: unable to find a keyring on
>> /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> /usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.856+0000
>> 7fd255e30700 -1 AuthRegistry(0x7fd250065910) no keyring found at
>> /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> /usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.856+0000
>> 7fd255e30700 -1 auth: unable to find a keyring on
>> /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> /usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.856+0000
>> 7fd255e30700 -1 AuthRegistry(0x7fd255e2eea0) no keyring found at
>> /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> /usr/bin/podman: stderr  stderr: [errno 2] RADOS object not found
>> (error connecting to the cluster)
>> /usr/bin/podman: stderr Traceback (most recent call last):
>> /usr/bin/podman: stderr   File "/usr/sbin/ceph-volume", line 11, in
>> <module>
>> /usr/bin/podman: stderr     load_entry_point('ceph-volume==1.0.0',
>> 'console_scripts', 'ceph-volume')()
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in
>> __init__
>> /usr/bin/podman: stderr     self.main(self.argv)
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59,
>> in newfunc
>> /usr/bin/podman: stderr     return f(*a, **kw)
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in
>> main
>> /usr/bin/podman: stderr     terminal.dispatch(self.mapper, subcommand_args)
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194,
>> in dispatch
>> /usr/bin/podman: stderr     instance.main()
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py",
>> line 46, in main
>> /usr/bin/podman: stderr     terminal.dispatch(self.mapper, self.argv)
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194,
>> in dispatch
>> /usr/bin/podman: stderr     instance.main()
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py",
>> line 77, in main
>> /usr/bin/podman: stderr     self.create(args)
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16,
>> in is_root
>> /usr/bin/podman: stderr     return func(*a, **kw)
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py",
>> line 26, in create
>> /usr/bin/podman: stderr     prepare_step.safe_prepare(args)
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py",
>> line 252, in safe_prepare
>> /usr/bin/podman: stderr     self.prepare()
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16,
>> in is_root
>> /usr/bin/podman: stderr     return func(*a, **kw)
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py",
>> line 292, in prepare
>> /usr/bin/podman: stderr     self.osd_id =
>> prepare_utils.create_id(osd_fsid, json.dumps(secrets),
>> osd_id=self.args.osd_id)
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line
>> 166, in create_id
>> /usr/bin/podman: stderr     if osd_id_available(osd_id):
>> /usr/bin/podman: stderr   File
>> "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line
>> 204, in osd_id_available
>> /usr/bin/podman: stderr     raise RuntimeError('Unable check if OSD id
>> exists: %s' % osd_id)
>> /usr/bin/podman: stderr RuntimeError: Unable check if OSD id exists: 5
>> Traceback (most recent call last):
>>    File "/usr/sbin/cephadm", line 9170, in <module>
>>      main()
>>    File "/usr/sbin/cephadm", line 9158, in main
>>      r = ctx.func(ctx)
>>    File "/usr/sbin/cephadm", line 1917, in _infer_config
>>      return func(ctx)
>>    File "/usr/sbin/cephadm", line 1877, in _infer_fsid
>>      return func(ctx)
>>    File "/usr/sbin/cephadm", line 1945, in _infer_image
>>      return func(ctx)
>>    File "/usr/sbin/cephadm", line 1835, in _validate_fsid
>>      return func(ctx)
>>    File "/usr/sbin/cephadm", line 5294, in command_ceph_volume
>>      out, err, code = call_throws(ctx, c.run_cmd())
>>    File "/usr/sbin/cephadm", line 1637, in call_throws
>>      raise RuntimeError('Failed command: %s' % ' '.join(command))
>> RuntimeError: Failed command: /usr/bin/podman run --rm --ipc=host
>> --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host
>> --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
>> --init -e
>> CONTAINER_IMAGE=<LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
>> -e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e
>> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
>> /var/run/ceph/<FSID>:/var/run/ceph:z -v
>> /var/log/ceph/<FSID>:/var/log/ceph:z -v
>> /var/lib/ceph/<FSID>/crash:/var/lib/ceph/crash:z -v /dev:/dev -v
>> /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
>> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
>> /tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z
>> <LOCAL_REGISTRY>/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
>> lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb --block.db-size 5G
>> ---snip---
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>>


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


