Ok, I think I found the problem. An LVM OSD whose block.db sits on an LVM
raid1 LV is activated by RAWActivate instead of LVMActivate, which I think
is wrong.

Furthermore, if /dev/optane/ceph-db-osd1 is a raid1 LV,
ceph_volume.devices.raw.list reports:

>>> foo = raw_list.direct_report(dev)
('ignoring child device /dev/mapper/optane-ceph--db--osd1 whose parent /dev/dm-51 is a BlueStore OSD.', "device is likely a phantom Atari partition. device info: {'NAME': '/dev/mapper/optane-ceph--db--osd1', 'KNAME': '/dev/dm-13', 'PKNAME': '/dev/dm-51', 'MAJ:MIN': '254:13', 'FSTYPE': 'ceph_bluestore', 'MOUNTPOINT': '', 'LABEL': '', 'UUID': '', 'RO': '0', 'RM': '0', 'MODEL': '', 'SIZE': '50G', 'STATE': 'running', 'OWNER': 'ceph', 'GROUP': 'ceph', 'MODE': 'brw-rw----', 'ALIGNMENT': '0', 'PHY-SEC': '512', 'LOG-SEC': '512', 'ROTA': '0', 'SCHED': '', 'TYPE': 'lvm', 'DISC-ALN': '0', 'DISC-GRAN': '512B', 'DISC-MAX': '4G', 'DISC-ZERO': '0', 'PARTLABEL': ''}")

and the reported block.db device is wrong:

>>> print(foo['cdd02721-6876-4db8-bdb2-12ac6c70127c'])
{'osd_uuid': 'cdd02721-6876-4db8-bdb2-12ac6c70127c', 'type': 'bluestore', 'osd_id': 1, 'ceph_fsid': '27923302-87a5-11ec-ac5b-976d21a49941', 'device': '/dev/mapper/ceph--dec5bd7c--d84f--40d9--ba14--6bd8aadf2957-osd--block--cdd02721--6876--4db8--bdb2--12ac6c70127c', 'device_db': '/dev/mapper/optane-ceph--db--osd1_rimage_1'}

One possible fix:

diff --git a/src/ceph-volume/ceph_volume/devices/raw/list.py b/src/ceph-volume/ceph_volume/devices/raw/list.py
index 794bb18c103..5f874050f7c 100644
--- a/src/ceph-volume/ceph_volume/devices/raw/list.py
+++ b/src/ceph-volume/ceph_volume/devices/raw/list.py
@@ -112,6 +112,9 @@ class List(object):
         result = {}
         logger.debug('inspecting devices: {}'.format(devs))
         for info_device in info_devices:
+            if info_device['TYPE'] == 'lvm':
+                # lvm devices are not raw devices
+                continue
             bs_info = _get_bluestore_info(info_device['NAME'])
             if bs_info is None:
                 # None is also returned in the rare event that there is an issue reading info from

However, I saw in the commit history that a previous change just like this
was merged and then reverted, because it caused a regression:

------------------------------------------------------
commit 401bb755020bc0962a2a8038d626b3bc4ec4fff4
Author: Guillaume Abrioux <gabrioux@xxxxxxx>
Date:   Tue Nov 7 14:39:50 2023 +0100

    ceph-volume: Revert "ceph-volume: fix raw list for lvm devices"

    This reverts commit e5e429617c1c27dcd631171f65d30571e32f7266.

    This commit introduced a regression, see linked tracker for details.

    Fixes: https://tracker.ceph.com/issues/63391

    Signed-off-by: Guillaume Abrioux <gabrioux@xxxxxxx>
    (cherry picked from commit 916a22ef031953056771eceb1f49cab7eb746978)

diff --git a/src/ceph-volume/ceph_volume/devices/raw/list.py b/src/ceph-volume/ceph_volume/devices/raw/list.py
index 0f801701b80..c86353b90ec 100644
--- a/src/ceph-volume/ceph_volume/devices/raw/list.py
+++ b/src/ceph-volume/ceph_volume/devices/raw/list.py
@@ -69,7 +69,7 @@ class List(object):
     def generate(self, devs=None):
         logger.debug('Listing block devices via lsblk...')
         info_devices = disk.lsblk_all(abspath=True)
-        if devs is None or devs == []:
+        if not devs or not any(devs):
             # If no devs are given initially, we want to list ALL devices including children and
             # parents. Parent disks with child partitions may be the appropriate device to return if
             # the parent disk has a bluestore header, but children may be the most appropriate
@@ -89,9 +89,6 @@ class List(object):
             # determine whether a parent is bluestore, we should err on the side of not reporting
             # the child so as not to give a false negative.
             info_device = [info for info in info_devices if info['NAME'] == dev][0]
-            if info_device['TYPE'] == 'lvm':
-                # lvm devices are not raw devices
-                continue
             if 'PKNAME' in info_device and info_device['PKNAME'] != "":
                 parent = info_device['PKNAME']
                 try:
------------------------------------------------
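Maybe a middle ground, instead of skipping every TYPE == 'lvm' device again,
would be to skip only LVM-internal sub-LVs (the *_rimage_*/*_rmeta_* raid
images and the *_imeta/*_iorig integrity volumes), so that raw OSDs sitting
directly on a plain LV would still be listed. This is just an untested sketch
of the idea, not a patch against the actual ceph-volume code; the helper name
and the suffix list are mine:

import re

# dm names of hidden LVM sub-LVs end with suffixes like
# ..._rimage_1, ..._rmeta_0, ..._rimage_1_imeta or ..._rimage_1_iorig
_SUBLV_SUFFIX = re.compile(r'_(rimage|rmeta|mimage|mlog|imeta|iorig)(_\d+)?$')

def is_lvm_internal_sublv(info_device):
    """True if this lsblk entry looks like an LVM-internal sub-LV,
    e.g. /dev/mapper/optane-ceph--db--osd1_rimage_1."""
    if info_device.get('TYPE') != 'lvm':
        return False
    return bool(_SUBLV_SUFFIX.search(info_device.get('NAME', '')))

# the loop in generate() would then do
#     if is_lvm_internal_sublv(info_device):
#         continue
# instead of skipping every 'lvm' entry.

Asking LVM itself whether the LV is a hidden/internal one would probably be
more robust than pattern-matching dm names, but I haven't tried that.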
Regards,

Reto

On Sat, 6 Jan 2024 at 17:22, Reto Gysi <rlgysi@xxxxxxxxx> wrote:

> Hi ceph community
>
> I noticed the following problem after upgrading my Ceph instance on Debian
> 12.4 from 17.2.7 to 18.2.1:
>
> I had placed the BlueStore block.db for the HDD OSDs on raid1/mirrored
> logical volumes spanning 2 NVMe devices, so that if a single block.db NVMe
> device fails, not all HDD OSDs fail with it.
> That worked fine under 17.2.7 and caused no problems during host/OSD
> restarts.
> During the upgrade to 18.2.1 the OSDs with the block.db on a mirrored LV
> wouldn't start anymore, because the block.db symlink was updated to point
> to the wrong device-mapper device, and the OSD startup failed with an
> error saying the block.db device is busy.
>
> OSD1:
> 2024-01-05T19:56:43.592+0000 7fdde9f43640 -1 bluestore(/var/lib/ceph/osd/ceph-1) _minimal_open_bluefs add block device(/var/lib/ceph/osd/ceph-1/block.db) returned: (16) Device or resource busy
> 2024-01-05T19:56:43.592+0000 7fdde9f43640 -1 bluestore(/var/lib/ceph/osd/ceph-1) _open_db failed to prepare db environment:
> 2024-01-05T19:56:43.592+0000 7fdde9f43640  1 bdev(0x55a2d5014000 /var/lib/ceph/osd/ceph-1/block) close
> 2024-01-05T19:56:43.892+0000 7fdde9f43640 -1 osd.1 0 OSD:init: unable to mount object store
>
> The symlinks were updated to point to:
> lrwxrwxrwx 1 ceph ceph 111 Jan 5 20:57 block -> /dev/mapper/ceph--dec5bd7c--d84f--40d9--ba14--6bd8aadf2957-osd--block--cdd02721--6876--4db8--bdb2--12ac6c70127c
> lrwxrwxrwx 1 ceph ceph 48 Jan 5 20:57 block.db -> /dev/mapper/optane-ceph--db--osd1_rimage_1_iorig
>
> The correct symlinks would have been:
> lrwxrwxrwx 1 ceph ceph 111 Jan 5 20:57 block -> /dev/mapper/ceph--dec5bd7c--d84f--40d9--ba14--6bd8aadf2957-osd--block--cdd02721--6876--4db8--bdb2--12ac6c70127c
> lrwxrwxrwx 1 ceph ceph 48 Jan 5 20:57 block.db -> /dev/mapper/optane-ceph--db--osd1
>
> To continue with the upgrade I converted all the block.db logical volumes
> back to linear volumes one by one, and fixed the symlinks manually.
> Converting the LVs back to linear was necessary because even when I fixed
> the symlink manually, the symlink would be created wrong again after an
> OSD restart, as long as block.db was on a raid1 LV.
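Side note from today, in case somebody else hits this: the check I did by
hand to spot the broken OSDs was simply to look at what the block.db symlink
points to and whether the target is one of the LVM-internal sub-LV names.
Roughly something like this untested snippet; the OSD path is just an
example from my setup:

import os

link = '/var/lib/ceph/osd/ceph-1/block.db'   # example OSD data dir
target = os.readlink(link)
# a target ending in _rimage_*/_rmeta_*/_imeta/_iorig is an LVM-internal
# sub-LV, i.e. the symlink was created against the wrong dm device
wrong = any(tag in target for tag in ('_rimage_', '_rmeta_', '_imeta', '_iorig'))
print(target, 'wrong target:', wrong)

A bad target looks like /dev/mapper/optane-ceph--db--osd1_rimage_1_iorig, a
good one like /dev/mapper/optane-ceph--db--osd1.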
>
> Here's an example of how the symlinks looked before an OSD was touched by
> the 18.2.1 upgrade:
> OSD2:
> lrwxrwxrwx 1 ceph ceph 93 Jan 4 03:38 block -> /dev/ceph-17a894d6-3a64-4e5e-9fa0-8dd3b5f4bf33/osd-block-3cd7a5af-9002-47a7-b4c2-540381d53be7
> lrwxrwxrwx 1 ceph ceph 24 Jan 4 03:38 block.db -> /dev/optane/ceph-db-osd2
>
> Here's what the output of lvs -a -o +devices looked like for the OSD1
> block.db device when it was a raid1 LV:
>
> LV                            VG     Attr       LSize   Pool Origin                        Data% Meta% Move Log Cpy%Sync Convert Devices
> ceph-db-osd1                  optane rwi-a-r--- 44.00g                                                          100.00           ceph-db-osd1_rimage_0(0),ceph-db-osd1_rimage_1(0)
> [ceph-db-osd1_rimage_0]       optane gwi-aor--- 44.00g       [ceph-db-osd1_rimage_0_iorig]                      100.00           ceph-db-osd1_rimage_0_iorig(0)
> [ceph-db-osd1_rimage_0_imeta] optane ewi-ao---- 428.00m                                                                          /dev/sdg(55482)
> [ceph-db-osd1_rimage_0_imeta] optane ewi-ao---- 428.00m                                                                          /dev/sdg(84566)
> [ceph-db-osd1_rimage_0_iorig] optane -wi-ao---- 44.00g                                                                           /dev/sdg(9216)
> [ceph-db-osd1_rimage_0_iorig] optane -wi-ao---- 44.00g                                                                           /dev/sdg(82518)
> [ceph-db-osd1_rimage_1]       optane gwi-aor--- 44.00g       [ceph-db-osd1_rimage_1_iorig]                      100.00           ceph-db-osd1_rimage_1_iorig(0)
> [ceph-db-osd1_rimage_1_imeta] optane ewi-ao---- 428.00m                                                                          /dev/sdj(55392)
> [ceph-db-osd1_rimage_1_imeta] optane ewi-ao---- 428.00m                                                                          /dev/sdj(75457)
> [ceph-db-osd1_rimage_1_iorig] optane -wi-ao---- 44.00g                                                                           /dev/sdj(9218)
> [ceph-db-osd1_rimage_1_iorig] optane -wi-ao---- 44.00g                                                                           /dev/sdj(73409)
> [ceph-db-osd1_rmeta_0]        optane ewi-aor--- 4.00m                                                                            /dev/sdg(55388)
> [ceph-db-osd1_rmeta_1]        optane ewi-aor--- 4.00m                                                                            /dev/sdj(9217)
>
> It would be good if the symlinks were recreated pointing to the correct
> device even when block.db is on a raid1 LV.
> Not sure if this problem has been reported yet.
>
> Cheers
>
> Reto
>

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx