Greetings community,

We have a setup of 6 servers running a CentOS 8 minimal installation with Ceph 18.2.2 (Reef), backed by 20 Gbps fiber-optic NICs and dual Intel Xeon processors. The cluster was bootstrapped on the first node and then expanded to the others using cephadm, with monitors deployed on 5 of the nodes and managers on 3. Each server has an NVMe boot disk plus a 1 TB SATA SSD on which the OSDs are deployed. An EC profile with k=3 and m=3 was created, and a CephFS filesystem on top of it is exported over NFS to other servers. Up to this point the setup has been quite stable: after an emergency reboot or a network failure the OSDs did not fail and started normally again after the reboot.

At a certain point in our project we needed to enable the multipathd service, adding the boot drive partition and the Ceph SSD to its blacklist so that they are not claimed by multipath and wrapped in an mpath device. The blacklist looks like this:

boot blacklist:
===============
blacklist {
    wwid "eui.<drive_id>"
}

SATA SSD blacklist:
===================
blacklist {
    wwid "naa.<drive_id>"
}

With this blacklist in place both the boot disk and the Ceph OSD disk work properly, and lsblk shows the following:

NAME                                 MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                                    8:0    0 894.3G  0 disk
└─ceph--<id>-osd--block--<block_id> 252:3    0 894.3G  0 lvm
nvme0n1                              259:0    0 238.5G  0 disk
├─nvme0n1p1                          259:1    0   600M  0 part /boot/efi
├─nvme0n1p2                          259:2    0     1G  0 part /boot
└─nvme0n1p3                          259:3    0 236.9G  0 part
  ├─centos-root                      252:0    0   170G  0 lvm  /
  ├─centos-swap                      252:1    0  23.4G  0 lvm  [SWAP]
  ├─centos-var_log_audit             252:2    0   7.5G  0 lvm  /var/log/audit
  ├─centos-home                      252:4    0    26G  0 lvm  /home
  └─centos-var_log                   252:5    0    10G  0 lvm  /var/log

In addition to the multipathd configuration, we have use_devicesfile=1 in /etc/lvm/lvm.conf, and /etc/lvm/devices/system.devices looks like this, with the PVID taken from the output of pvdisplay and the IDNAME value taken from the output of "ls -lha /dev/disk/by-id":

VERSION=1.1.1
IDTYPE=sys_wwid IDNAME=eui.<drive_id> DEVNAME=/dev/nvme0n1p3 PVID=<pvid> PART=3
IDTYPE=sys_wwid IDNAME=naa.<drive_id> DEVNAME=/dev/sda PVID=<pvid>

Issues started when we ran certain tests of the system's integrity, the most important being an emergency shutdown and reboot of all the nodes. Afterwards the OSDs are not started automatically and their respective LVM volumes do not show up (except on a single node, for some reason), so the lsblk output changes as in the snippet below, and we have to reboot the nodes one by one until all the OSDs are back online:

NAME                     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                        8:0    0 894.3G  0 disk
nvme0n1                  259:0    0 238.5G  0 disk
├─nvme0n1p1              259:1    0   600M  0 part /boot/efi
├─nvme0n1p2              259:2    0     1G  0 part /boot
└─nvme0n1p3              259:3    0 236.9G  0 part
  ├─centos-root          252:0    0   170G  0 lvm  /
  ├─centos-swap          252:1    0  23.4G  0 lvm  [SWAP]
  ├─centos-var_log_audit 252:2    0   7.5G  0 lvm  /var/log/audit
  ├─centos-home          252:4    0    26G  0 lvm  /home
  └─centos-var_log       252:5    0    10G  0 lvm  /var/log

Without the multipathd service and the LVM devices-file configuration everything works fine; this behavior only started after those changes.
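For reference, this is roughly what we check on an affected node after a failed boot (a rough sketch, not a saved transcript; device names are placeholders, and the --devicesfile "" form for bypassing the devices file is taken from the lvm2 man page):

# device-mapper view: in the failure case the ceph--<id>-osd--block--<block_id>
# mapping is absent here
dmsetup ls

# LVM view with the devices file in effect: the SSD PV/VG/LV do not show up either
pvs
lvs

# same query bypassing the devices file (an empty string disables it), to see
# whether /dev/sda is still recognized as a PV at all
pvs --devicesfile ""

# multipath view, to confirm the blacklist is honored and no mpath map sits
# on top of /dev/sda
multipath -ll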
Attempting to manually restart the OSDs from a manager node with "ceph orch daemon restart osd.n" leaves them in an error state, and even when starting the OSD manually on each node via "bash /var/lib/ceph/<fsid>/osd.0/unit.run" we receive the following error:

--> Failed to activate via raw: did not find any matching OSD to activate
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-<id>/osd-block-<block_id> --path /var/lib/ceph/osd/ceph-0 --no-mon-config
 stderr: failed to read label for /dev/ceph-<id>/osd-block-<block_id>: (2) No such file or directory
 2024-03-30T12:42:54.014+0000 7f845296a980 -1 bluestore(/dev/ceph-<id>/osd-block-<block_id>) _read_bdev_label failed to open /dev/ceph-<id>/osd-block-<block_id>: (2) No such file or directory
--> Failed to activate via LVM: command returned non-zero exit status: 1
--> Failed to activate via simple: 'Namespace' object has no attribute 'json_config'
--> Failed to activate any OSD(s)

For comparison, a successful run of the same command produces the following output:

/bin/bash /var/lib/ceph/<fsid>/osd.0/unit.run
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/ceph-bluestore-tool prime-osd-dir --path /var/lib/ceph/osd/ceph-0 --no-mon-config --dev /dev/mapper/ceph--<id>-osd--block--<block_id>
Running command: /usr/bin/chown -h ceph:ceph /dev/mapper/ceph--<id>-osd--block--<block_id>
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-5
Running command: /usr/bin/ln -s /dev/mapper/ceph--<id>-osd--block--<block_id> /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
--> ceph-volume raw activate successful for osd ID: 0
ceph-<fsid>-osd-0
4361e2f166bcdeee6e9020dcbb153d3d7eec04e71d5b0b250440d4a3a0833f2c

It seems to us that in the failure case the logical volume is not even detected by device-mapper at boot, which is odd; it is also missing from the output of dmsetup ls.

What could we be missing here? What is the conflict between the Ceph OSDs and the multipathd service, or the LVM configuration? Should the system.devices entries be different from what we set? Is the multipathd blacklist configuration missing something?

We have been running trial-and-error experiments for more than a week now and have gone through the lvm2 and multipathd logs (we can provide them on request), but to no avail: they show no errors, just normal output, with the only difference being the missing Ceph OSD LVM volume.

Best regards
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx