Re: [v18.2.1] problem with wrong osd device symlinks after upgrade to 18.2.1

OK, I think I found the problem.
The problem is that an LVM OSD with its block.db on an LVM RAID1 LV is activated by
RAWActivate instead of LVMActivate, which I think is wrong.
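
As a quick check, something like the following shows what the lvm and raw listers
each report on the host (just a rough sketch, assuming ceph-volume is in PATH and
this is run as root):

import subprocess

# Print the output of 'ceph-volume lvm list' and 'ceph-volume raw list' one
# after the other; the OSD with the RAID1 block.db shows up in the raw listing too.
for cmd in (['ceph-volume', 'lvm', 'list'],
            ['ceph-volume', 'raw', 'list']):
    result = subprocess.run(cmd, capture_output=True, text=True)
    print('###', ' '.join(cmd))
    print(result.stdout)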

Furthermore, if /dev/optane/ceph-db-osd1 is a RAID1 LV,
ceph_volume.devices.raw.list reports:
>>> foo = raw_list.direct_report(dev)
('ignoring child device /dev/mapper/optane-ceph--db--osd1 whose parent /dev/dm-51 is a BlueStore OSD.',
 "device is likely a phantom Atari partition. device info: {'NAME': '/dev/mapper/optane-ceph--db--osd1',
 'KNAME': '/dev/dm-13', 'PKNAME': '/dev/dm-51', 'MAJ:MIN': '254:13', 'FSTYPE': 'ceph_bluestore',
 'MOUNTPOINT': '', 'LABEL': '', 'UUID': '', 'RO': '0', 'RM': '0', 'MODEL': '', 'SIZE': '50G',
 'STATE': 'running', 'OWNER': 'ceph', 'GROUP': 'ceph', 'MODE': 'brw-rw----', 'ALIGNMENT': '0',
 'PHY-SEC': '512', 'LOG-SEC': '512', 'ROTA': '0', 'SCHED': '', 'TYPE': 'lvm', 'DISC-ALN': '0',
 'DISC-GRAN': '512B', 'DISC-MAX': '4G', 'DISC-ZERO': '0', 'PARTLABEL': ''}")
and the reported block.db device is wrong:
>>> print(foo['cdd02721-6876-4db8-bdb2-12ac6c70127c'])
{'osd_uuid': 'cdd02721-6876-4db8-bdb2-12ac6c70127c', 'type': 'bluestore', 'osd_id': 1,
 'ceph_fsid': '27923302-87a5-11ec-ac5b-976d21a49941',
 'device': '/dev/mapper/ceph--dec5bd7c--d84f--40d9--ba14--6bd8aadf2957-osd--block--cdd02721--6876--4db8--bdb2--12ac6c70127c',
 'device_db': '/dev/mapper/optane-ceph--db--osd1_rimage_1'}
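
For completeness, the session above in a form that can be pasted into a script
(a sketch: it assumes the installed ceph-volume's Python package is importable,
and that dev was a one-element list with the RAID1 LV path, which wasn't shown above):

from ceph_volume.devices.raw import list as raw_list

# Device(s) to inspect; the exact value of 'dev' in the session above wasn't
# shown, so a one-element list with the RAID1 LV path is assumed here.
dev = ['/dev/optane/ceph-db-osd1']

foo = raw_list.direct_report(dev)
for osd_uuid, info in foo.items():
    # 'device_db' is what later ends up behind the block.db symlink; with the
    # RAID1 LV it points at an internal *_rimage_* sub-LV instead of the LV itself.
    print(osd_uuid, info['device'], info['device_db'])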

One possible fix:
diff --git a/src/ceph-volume/ceph_volume/devices/raw/list.py b/src/ceph-volume/ceph_volume/devices/raw/list.py
index 794bb18c103..5f874050f7c 100644
--- a/src/ceph-volume/ceph_volume/devices/raw/list.py
+++ b/src/ceph-volume/ceph_volume/devices/raw/list.py
@@ -112,6 +112,9 @@ class List(object):
        result = {}
        logger.debug('inspecting devices: {}'.format(devs))
        for info_device in info_devices:
+            if info_device['TYPE'] == 'lvm':
+                # lvm devices are not raw devices
+                continue
            bs_info = _get_bluestore_info(info_device['NAME'])
            if bs_info is None:
                # None is also returned in the rare event that there is an issue reading info from
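
Purely as an illustration of what the added guard keys on, here is the relevant
part of the lsblk record from the warning above, trimmed to a few keys (not
ceph-volume code, just the check in isolation):

# lsblk-style record for the RAID1 LV, trimmed from the warning output above.
info_device = {
    'NAME': '/dev/mapper/optane-ceph--db--osd1',
    'PKNAME': '/dev/dm-51',
    'FSTYPE': 'ceph_bluestore',
    'TYPE': 'lvm',
}

# With the patch, any record with TYPE 'lvm' (including the *_rimage_* sub-LVs)
# is skipped, so the raw lister no longer reports a sub-LV as 'device_db'.
if info_device['TYPE'] == 'lvm':
    print('skipping non-raw device:', info_device['NAME'])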


I saw in the commit history that a previous change like this was reverted
because of some problems:
------------------------------------------------------
commit 401bb755020bc0962a2a8038d626b3bc4ec4fff4
Author: Guillaume Abrioux <gabrioux@xxxxxxx>
Date:   Tue Nov 7 14:39:50 2023 +0100

   ceph-volume: Revert "ceph-volume: fix raw list for lvm devices"

   This reverts commit e5e429617c1c27dcd631171f65d30571e32f7266.
   This commit introduced a regression, see linked tracker for details.

   Fixes: https://tracker.ceph.com/issues/63391

   Signed-off-by: Guillaume Abrioux <gabrioux@xxxxxxx>
   (cherry picked from commit 916a22ef031953056771eceb1f49cab7eb746978)

diff --git a/src/ceph-volume/ceph_volume/devices/raw/list.py b/src/ceph-volume/ceph_volume/devices/raw/list.py
index 0f801701b80..c86353b90ec 100644
--- a/src/ceph-volume/ceph_volume/devices/raw/list.py
+++ b/src/ceph-volume/ceph_volume/devices/raw/list.py
@@ -69,7 +69,7 @@ class List(object):
    def generate(self, devs=None):
        logger.debug('Listing block devices via lsblk...')
        info_devices = disk.lsblk_all(abspath=True)
-        if devs is None or devs == []:
+        if not devs or not any(devs):
            # If no devs are given initially, we want to list ALL devices including children and
            # parents. Parent disks with child partitions may be the appropriate device to return if
            # the parent disk has a bluestore header, but children may be the most appropriate
@@ -89,9 +89,6 @@ class List(object):
            # determine whether a parent is bluestore, we should err on the side of not reporting
            # the child so as not to give a false negative.
            info_device = [info for info in info_devices if info['NAME'] == dev][0]
-            if info_device['TYPE'] == 'lvm':
-                # lvm devices are not raw devices
-                continue
            if 'PKNAME' in info_device and info_device['PKNAME'] != "":
                parent = info_device['PKNAME']
                try:
------------------------------------------------

Regards,

Reto

On Sat, 6 Jan 2024 at 17:22, Reto Gysi <rlgysi@xxxxxxxxx> wrote:

> Hi ceph community
>
> I noticed the following problem after upgrading my ceph instance on Debian
> 12.4 from 17.2.7 to 18.2.1:
>
> I had placed the BlueStore block.db for the HDD OSDs on RAID1/mirrored
> logical volumes on 2 NVMe devices, so that if a single block.db NVMe device
> fails, not all HDD OSDs fail.
> That worked fine under 17.2.7, with no problems during host/OSD restarts.
> During the upgrade to 18.2.1, the OSDs with the block.db on a mirrored LV
> wouldn't start anymore, because the block.db symlink was updated to point
> to the wrong device-mapper device and the OSD startup failed with an error
> message that the block.db device is busy.
>
> OSD1:
> 2024-01-05T19:56:43.592+0000 7fdde9f43640 -1
> bluestore(/var/lib/ceph/osd/ceph-1) _minimal_open_bluefs add block
> device(/var/lib/ceph/osd/ceph-1/block.db) returned: (16) Device or resource
> busy
> 2024-01-05T19:56:43.592+0000 7fdde9f43640 -1
> bluestore(/var/lib/ceph/osd/ceph-1) _open_db failed to prepare db
> environment:
> 2024-01-05T19:56:43.592+0000 7fdde9f43640  1 bdev(0x55a2d5014000
> /var/lib/ceph/osd/ceph-1/block) close
> 2024-01-05T19:56:43.892+0000 7fdde9f43640 -1 osd.1 0 OSD:init: unable to
> mount object store
>
> the symlink was updated to point to
> lrwxrwxrwx 1 ceph ceph  111 Jan  5 20:57 block ->
> /dev/mapper/ceph--dec5bd7c--d84f--40d9--ba14--6bd8aadf2957-osd--block--cdd02721--6876--4db8--bdb2--12ac6c70127c
>
> lrwxrwxrwx 1 ceph ceph   48 Jan  5 20:57 block.db ->
> /dev/mapper/optane-ceph--db--osd1_rimage_1_iorig
>
> the correct symlink would have been:
> lrwxrwxrwx 1 ceph ceph  111 Jan  5 20:57 block ->
> /dev/mapper/ceph--dec5bd7c--d84f--40d9--ba14--6bd8aadf2957-osd--block--cdd02721--6876--4db8--bdb2--12ac6c70127c
>
> lrwxrwxrwx 1 ceph ceph   48 Jan  5 20:57 block.db ->
> /dev/mapper/optane-ceph--db--osd1
>
>
> To continue with the upgrade I converted all the block.db LVM logical
> volumes back to linear volumes one by one, and fixed the symlinks manually.
> Converting the LVs back to linear was necessary because even when I fixed
> a symlink manually, after an OSD restart the symlink would be created
> wrong again as long as the block.db pointed to a RAID1 LV.
>
> Here's an example of how the symlinks looked before an OSD was touched by
> the 18.2.1 upgrade:
> OSD2:
> lrwxrwxrwx 1 ceph ceph   93 Jan  4 03:38 block ->
> /dev/ceph-17a894d6-3a64-4e5e-9fa0-8dd3b5f4bf33/osd-block-3cd7a5af-9002-47a7-b4c2-540381d53be7
>
> lrwxrwxrwx 1 ceph ceph   24 Jan  4 03:38 block.db ->
> /dev/optane/ceph-db-osd2
>
>
> Here's what the output of lvs -a -o +devices looked like for the OSD1
> block.db device when it was a RAID1 LV:
>
>   LV                             VG      Attr          LSize  Origin                         Cpy%Sync  Devices
>   ceph-db-osd1                   optane  rwi-a-r---   44.00g                                 100.00    ceph-db-osd1_rimage_0(0),ceph-db-osd1_rimage_1(0)
>   [ceph-db-osd1_rimage_0]        optane  gwi-aor---   44.00g  [ceph-db-osd1_rimage_0_iorig]  100.00    ceph-db-osd1_rimage_0_iorig(0)
>   [ceph-db-osd1_rimage_0_imeta]  optane  ewi-ao----  428.00m                                           /dev/sdg(55482)
>   [ceph-db-osd1_rimage_0_imeta]  optane  ewi-ao----  428.00m                                           /dev/sdg(84566)
>   [ceph-db-osd1_rimage_0_iorig]  optane  -wi-ao----   44.00g                                           /dev/sdg(9216)
>   [ceph-db-osd1_rimage_0_iorig]  optane  -wi-ao----   44.00g                                           /dev/sdg(82518)
>   [ceph-db-osd1_rimage_1]        optane  gwi-aor---   44.00g  [ceph-db-osd1_rimage_1_iorig]  100.00    ceph-db-osd1_rimage_1_iorig(0)
>   [ceph-db-osd1_rimage_1_imeta]  optane  ewi-ao----  428.00m                                           /dev/sdj(55392)
>   [ceph-db-osd1_rimage_1_imeta]  optane  ewi-ao----  428.00m                                           /dev/sdj(75457)
>   [ceph-db-osd1_rimage_1_iorig]  optane  -wi-ao----   44.00g                                           /dev/sdj(9218)
>   [ceph-db-osd1_rimage_1_iorig]  optane  -wi-ao----   44.00g                                           /dev/sdj(73409)
>   [ceph-db-osd1_rmeta_0]         optane  ewi-aor---    4.00m                                           /dev/sdg(55388)
>   [ceph-db-osd1_rmeta_1]         optane  ewi-aor---    4.00m                                           /dev/sdj(9217)
>
>
>
> It would be good if the symlinks were recreated pointing to the correct
> device even when block.db is on a RAID1 LV.
> I'm not sure whether this problem has already been reported.
>
>
> Cheers
>
> Reto
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


