Re: cephadm does not redeploy OSD


 



Here you go.

So, in the log, when cephadm gets the inventory:

Found inventory for host [Device(path=/....
Device(path=/dev/nvme2n1, lvs=[{'cluster_fsid': '11b47c57-5e7f-44c0-8b19-ddd801a89435', 'cluster_name': 'ceph', 'db_uuid': 'irQUVH-txAO-fh3p-tkEj-ZoAH-p7lI-HcHOJp', 'name': 'osd-db-75f820d1-1597-4894-88d5-e1f21e0425a6', 'osd_fsid': '1abbad8e-9053-4335-8673-7f1c7832b7b0', 'osd_id': '35', 'osdspec_affinity': 'spec-a', 'type': 'db'}, ....

And then, when I get the list of disks, with the different filters telling whether each disk is taken or not:

[DBG] : /dev/sde is already used in spec spec-a, skipping it.

But I could confirm that, globally, the NVMe for the DB was taken into account:
Found drive selection DeviceSelection(data devices=['/dev/sda', '/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sdf', '/dev/sdg', '/dev/sdh', '/dev/sdj', '/dev/sdj', '/dev/sdk', '/dev/sdl'], wal_devices=[], db devices=['/dev/nvme2n1'], journal devices=[]
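
For reference, I believe those [DBG] lines can be surfaced with something like
the following (per the cephadm troubleshooting docs; exact commands may differ
between versions):

ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph -W cephadm --watch-debug    # follow the debug log live
ceph log last cephadm            # or dump the recent cephadm log entries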

And then I saw cephadm apply the spec on all nodes except this one:

skipping apply of node05 on DriveGroupSpec.from_json(yaml.safe_load('''
---
service_type: osd
service_id: spec-b
placement:
  label: osds
spec:
  data_devices:
    rotational: 1
    encrypted: true
  db_devices:
    size: '1TB:2TB'
  db_slots: 1
''')) (no change)
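
(Side note: if someone wants to check which spec an existing OSD or DB LV is
tied to, the osdspec_affinity tag shown in the inventory above can, I believe,
also be read directly from ceph-volume on the OSD host, with something like:

ceph-volume lvm list --format json | grep osdspec_affinity

run on the host itself, or through "cephadm shell" / "cephadm ceph-volume".)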


And now I can see both my disks under spec-b when cephadm checks the disk inventory:

[DBG] : /dev/sde is already used in spec spec-b, skipping it.
...
[DBG] : /dev/sdk is already used in spec spec-b, skipping it.


As I said in my previous e-mail, I am not sure this was the reason, as I did not find any clear message saying the db_device was ignored. And I have not tried to replicate this behavior yet.
So yeah, I fixed my issue, but I am not sure whether it was just luck or not.
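
For the record, the sequence that got things going again was roughly the
following (OSD ids are placeholders, and --zap support on "ceph orch osd rm"
may depend on the release):

ceph orch osd rm <osd_id> --replace --zap
ceph orch osd rm status          # wait until the drain/removal finishes

and once the disks were empty, cephadm recreated the OSDs from the spec on its own.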

Luis Domingues
Proton AG


------- Original Message -------
On Wednesday, July 19th, 2023 at 22:04, Adam King <adking@xxxxxxxxxx> wrote:


> > When looking at the very verbose cephadm logs, it seemed that cephadm was
> > just skipping my node, with a message saying that a node was already part
> > of another spec.
>
>
> If you have it, would you mind sharing what this message was? I'm still not
> totally sure what happened here.
>
> On Wed, Jul 19, 2023 at 10:15 AM Luis Domingues luis.domingues@xxxxxxxxx
>
> wrote:
>
> > So, good news: I was not hit by the bug you mentioned in this thread.
> >
> > What happened (apparently; I have not tried to replicate it yet) is that
> > I had another OSD (let's call it OSD.1) using the db device, but that one
> > was part of an old spec (let's call it spec-a). And the OSD I removed
> > (OSD.2) should have been detected as part of spec-b. The difference between
> > them was just the name and the placement, which uses labels instead of hostnames.
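> >
> > (Purely as an illustration, with the host and filters assumed from memory,
> > the difference was along these lines, the data/db device filters being
> > otherwise the same as in the spec-b quoted earlier in the thread:)
> >
> > # spec-a (old)
> > placement:
> >   hosts:
> >     - node05
> >
> > # spec-b (new)
> > placement:
> >   label: osds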
> >
> > When looking at the very verbose cephadm logs, it seemed that cephadm was
> > just skipping my node, with a message saying that a node was already part
> > of another spec.
> >
> > I purged OSD.1 with --replace and --zap, and once the disks were empty and
> > ready to go, cephadm just added back OSD.1 and OSD.2 using the db_device as
> > specified.
> >
> > I do not know if this is the intended behavior, or if I was just lucky,
> > but all my OSDs are back in the cluster.
> >
> > Luis Domingues
> > Proton AG
> >
> > ------- Original Message -------
> > On Tuesday, July 18th, 2023 at 18:32, Luis Domingues <
> > luis.domingues@xxxxxxxxx> wrote:
> >
> > > That part looks quite good:
> > >
> > > "available": false,
> > > "ceph_device": true,
> > > "created": "2023-07-18T16:01:16.715487Z",
> > > "device_id": "SAMSUNG MZPLJ1T6HBJR-00007_S55JNG0R600354",
> > > "human_readable_type": "ssd",
> > > "lsm_data": {},
> > > "lvs": [
> > > {
> > > "cluster_fsid": "11b47c57-5e7f-44c0-8b19-ddd801a89435",
> > > "cluster_name": "ceph",
> > > "db_uuid": "CUMgp7-Uscn-ASLo-bh14-7Sxe-80GE-EcywDb",
> > > "name": "osd-block-db-5cb8edda-30f9-539f-b4c5-dbe420927911",
> > > "osd_fsid": "089894cf-1782-4a3a-8ac0-9dd043f80c71",
> > > "osd_id": "7",
> > > "osdspec_affinity": "",
> > > "type": "db"
> > > },
> > > {
> > >
> > > I forgot to mention that the cluster was initially deployed with
> > > ceph-ansible and adopted by cephadm.
> > >
> > > Luis Domingues
> > > Proton AG
> > >
> > > ------- Original Message -------
> > > On Tuesday, July 18th, 2023 at 18:15, Adam King adking@xxxxxxxxxx wrote:
> > >
> > > > in the "ceph orch device ls --format json-pretty" output, in the blob
> > > > for
> > > > that specific device, is the "ceph_device" field set? There was a bug
> > > > where
> > > > it wouldn't be set at all (https://tracker.ceph.com/issues/57100) and
> > > > it
> > > > would make it so you couldn't use a device serving as a db device for
> > > > any
> > > > further OSDs, unless the device was fully cleaned out (so it is no
> > > > longer
> > > > serving as a db device). The "ceph_device" field is meant to be our
> > > > way of
> > > > knowing "yes there are LVM partitions here, but they're our partitions
> > > > for
> > > > ceph stuff, so we can still use the device" and without it (or with it
> > > > just
> > > > being broken, as in the tracker) redeploying OSDs that used the device
> > > > for
> > > > its DB wasn't working as we don't know if those LVs imply its our
> > > > device or
> > > > has LVs for some other purpose. I had thought this was fixed already in
> > > > 16.2.13 but it sounds too similar to what you're seeing not to
> > > > consider it.
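> > > >
> > > > A quick way to eyeball that across devices, if it helps (just a rough grep
> > > > over the same JSON output, adjust as needed):
> > > >
> > > > ceph orch device ls --format json-pretty | grep -E '"(path|available|ceph_device)"'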
> > > >
> > > > On Tue, Jul 18, 2023 at 10:53 AM Luis Domingues
> > > > luis.domingues@xxxxxxxxx
> > > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We are running a ceph cluster managed with cephadm v16.2.13.
> > > > > Recently we
> > > > > needed to change a disk, and we replaced it with:
> > > > >
> > > > > ceph orch osd rm 37 --replace.
> > > > >
> > > > > It worked fine, the disk was drained and the OSD was marked as destroyed.
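> > > > >
> > > > > (While it drains, the progress can be followed with something like:
> > > > > ceph orch osd rm status
> > > > > until the OSD disappears from that list.)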
> > > > >
> > > > > However, after changing the disk, no OSD was created. Looking at the
> > > > > db device, the db partition for OSD 37 was still there. So we
> > > > > destroyed it using:
> > > > > ceph-volume lvm zap --osd-id=37 --destroy.
> > > > >
> > > > > But we still have no OSD redeployed.
> > > > > Here we have our spec:
> > > > >
> > > > > ---
> > > > > service_type: osd
> > > > > service_id: osd-hdd
> > > > > placement:
> > > > >   label: osds
> > > > > spec:
> > > > >   data_devices:
> > > > >     rotational: 1
> > > > >     encrypted: true
> > > > >   db_devices:
> > > > >     size: '1TB:2TB'
> > > > >   db_slots: 12
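> > > > >
> > > > > (To compare that file against what the orchestrator actually has stored,
> > > > > something like the following should dump the applied OSD specs:
> > > > > ceph orch ls osd --export
> > > > > )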
> > > > >
> > > > > And the disk looks good:
> > > > >
> > > > > HOST    PATH          TYPE  DEVICE ID                                   SIZE   AVAILABLE  REFRESHED  REJECT REASONS
> > > > > node05  /dev/nvme2n1  ssd   SAMSUNG MZPLJ1T6HBJR-00007_S55JNG0R600357   1600G             12m ago    LVM detected, locked
> > > > > node05  /dev/sdk      hdd   SEAGATE_ST10000NM0206_ZA21G2170000C7240KPF  10.0T  Yes        12m ago
> > > > >
> > > > > And the VG on the db_device looks to have enough space:
> > > > > ceph-33b06f1a-f6f6-57cf-9ca8-6e4aa81caae0 1 11 0 wz--n- <1.46t 173.91g
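> > > > >
> > > > > (That line is plain "vgs" output, so the columns should be VG, #PV, #LV,
> > > > > #SN, Attr, VSize and VFree; "vgs <vg_name>" limits the output to that VG.)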
> > > > >
> > > > > If I remove the db_devices and db_slots from the specs, and do a dry
> > > > > run,
> > > > > the orchestrator seems to see the new disk as available:
> > > > >
> > > > > ceph orch apply -i osd_specs.yml --dry-run
> > > > > WARNING! Dry-Runs are snapshots of a certain point in time and are
> > > > > bound
> > > > > to the current inventory setup. If any of these conditions change,
> > > > > the
> > > > > preview will be invalid. Please make sure to have a minimal
> > > > > timeframe between planning and applying the specs.
> > > > > ####################
> > > > > SERVICESPEC PREVIEWS
> > > > > ####################
> > > > > +---------+------+--------+-------------+
> > > > > |SERVICE |NAME |ADD_TO |REMOVE_FROM |
> > > > > +---------+------+--------+-------------+
> > > > > +---------+------+--------+-------------+
> > > > > ################
> > > > > OSDSPEC PREVIEWS
> > > > > ################
> > > > > +---------+---------+-------------------------+----------+----+-----+
> > > > > |SERVICE |NAME |HOST |DATA |DB |WAL |
> > > > > +---------+---------+-------------------------+----------+----+-----+
> > > > > |osd |osd-hdd |node05 |/dev/sdk |- |- |
> > > > > +---------+---------+-------------------------+----------+----+-----+
> > > > >
> > > > > But as soon as I add db_devices back, the orchestrator is happy as it is,
> > > > > as if there is nothing to do:
> > > > >
> > > > > ceph orch apply -i osd_specs.yml --dry-run
> > > > > WARNING! Dry-Runs are snapshots of a certain point in time and are
> > > > > bound
> > > > > to the current inventory setup. If any of these conditions change,
> > > > > the
> > > > > preview will be invalid. Please make sure to have a minimal
> > > > > timeframe between planning and applying the specs.
> > > > > ####################
> > > > > SERVICESPEC PREVIEWS
> > > > > ####################
> > > > > +---------+------+--------+-------------+
> > > > > |SERVICE |NAME |ADD_TO |REMOVE_FROM |
> > > > > +---------+------+--------+-------------+
> > > > > +---------+------+--------+-------------+
> > > > > ################
> > > > > OSDSPEC PREVIEWS
> > > > > ################
> > > > > +---------+------+------+------+----+-----+
> > > > > |SERVICE |NAME |HOST |DATA |DB |WAL |
> > > > > +---------+------+------+------+----+-----+
> > > > >
> > > > > I do not know why ceph will not use this disk, and I do not know where to
> > > > > look. It seems the logs are not saying anything. And the weirdest thing:
> > > > > another disk was replaced on the same machine, and it went through without
> > > > > any issues.
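> > > > >
> > > > > (For anyone hitting the same thing: forcing a fresh inventory and showing
> > > > > the full reject reasons might be a starting point, with something like:
> > > > > ceph orch device ls --wide --refresh
> > > > > though I am not sure it surfaces the drive-group filtering decisions.)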
> > > > >
> > > > > Luis Domingues
> > > > > Proton AG
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



