Hi ceph community

I noticed the following problem after upgrading my Ceph instance on Debian 12.4 from 17.2.7 to 18.2.1: I had placed the BlueStore block.db for the HDD OSDs on RAID1/mirrored logical volumes spanning two NVMe devices, so that a single failing block.db NVMe device would not take down all HDD OSDs. This worked fine under 17.2.7 and caused no problems during host/OSD restarts.

During the upgrade to 18.2.1, the OSDs with block.db on a mirrored LV would no longer start: the block.db symlink was rewritten to point to the wrong device-mapper device, and OSD startup failed with an error saying the block.db device is busy.

OSD1:

2024-01-05T19:56:43.592+0000 7fdde9f43640 -1 bluestore(/var/lib/ceph/osd/ceph-1) _minimal_open_bluefs add block device(/var/lib/ceph/osd/ceph-1/block.db) returned: (16) Device or resource busy
2024-01-05T19:56:43.592+0000 7fdde9f43640 -1 bluestore(/var/lib/ceph/osd/ceph-1) _open_db failed to prepare db environment:
2024-01-05T19:56:43.592+0000 7fdde9f43640  1 bdev(0x55a2d5014000 /var/lib/ceph/osd/ceph-1/block) close
2024-01-05T19:56:43.892+0000 7fdde9f43640 -1 osd.1 0 OSD:init: unable to mount object store

The symlink was updated to point to:

lrwxrwxrwx 1 ceph ceph 111 Jan 5 20:57 block -> /dev/mapper/ceph--dec5bd7c--d84f--40d9--ba14--6bd8aadf2957-osd--block--cdd02721--6876--4db8--bdb2--12ac6c70127c
lrwxrwxrwx 1 ceph ceph 48 Jan 5 20:57 block.db -> /dev/mapper/optane-ceph--db--osd1_rimage_1_iorig

The correct symlinks would have been:

lrwxrwxrwx 1 ceph ceph 111 Jan 5 20:57 block -> /dev/mapper/ceph--dec5bd7c--d84f--40d9--ba14--6bd8aadf2957-osd--block--cdd02721--6876--4db8--bdb2--12ac6c70127c
lrwxrwxrwx 1 ceph ceph 48 Jan 5 20:57 block.db -> /dev/mapper/optane-ceph--db--osd1

To continue with the upgrade I converted all the block.db logical volumes back to linear volumes, one OSD at a time, and fixed the symlinks manually. Converting the LVs back to linear was necessary because, even after fixing a symlink by hand, it was recreated wrong again on the next OSD restart as long as block.db pointed to a RAID1 LV.
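In case it helps anyone hitting the same issue, the per-OSD workaround looked roughly like this. This is only a sketch for osd.1 with my VG/LV names (optane/ceph-db-osd1), assuming non-containerized OSDs managed by systemd; adjust the OSD id, paths and LV names for your own setup:

# stop the OSD so the DB LV is no longer in use
systemctl stop ceph-osd@1

# my block.db LVs also had dm-integrity enabled (that's where the
# _imeta/_iorig sub-LVs in the lvs output further down come from);
# integrity has to be dropped before the LV can be made linear again
lvconvert --raidintegrity n optane/ceph-db-osd1

# convert the RAID1 LV back to a linear LV (drops the extra mirror leg)
lvconvert -m0 optane/ceph-db-osd1

# check which device-mapper node the top-level LV maps to; block.db
# should point there, not at an _rimage_*_iorig sub-LV
lvs --noheadings -o lv_dm_path optane/ceph-db-osd1

# recreate the block.db symlink and make sure ceph owns it
ln -sf /dev/mapper/optane-ceph--db--osd1 /var/lib/ceph/osd/ceph-1/block.db
chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block.db

# start the OSD again
systemctl start ceph-osd@1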
Here's an example of how the symlinks looked before an OSD was touched by the 18.2.1 upgrade:

OSD2:

lrwxrwxrwx 1 ceph ceph 93 Jan 4 03:38 block -> /dev/ceph-17a894d6-3a64-4e5e-9fa0-8dd3b5f4bf33/osd-block-3cd7a5af-9002-47a7-b4c2-540381d53be7
lrwxrwxrwx 1 ceph ceph 24 Jan 4 03:38 block.db -> /dev/optane/ceph-db-osd2

And here's what the output of lvs -a -o +devices looked like for the OSD1 block.db device while it was still a RAID1 LV:

LV                            VG     Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
ceph-db-osd1                  optane rwi-a-r---  44.00g 100.00 ceph-db-osd1_rimage_0(0),ceph-db-osd1_rimage_1(0)
[ceph-db-osd1_rimage_0]       optane gwi-aor---  44.00g [ceph-db-osd1_rimage_0_iorig] 100.00 ceph-db-osd1_rimage_0_iorig(0)
[ceph-db-osd1_rimage_0_imeta] optane ewi-ao---- 428.00m /dev/sdg(55482)
[ceph-db-osd1_rimage_0_imeta] optane ewi-ao---- 428.00m /dev/sdg(84566)
[ceph-db-osd1_rimage_0_iorig] optane -wi-ao----  44.00g /dev/sdg(9216)
[ceph-db-osd1_rimage_0_iorig] optane -wi-ao----  44.00g /dev/sdg(82518)
[ceph-db-osd1_rimage_1]       optane gwi-aor---  44.00g [ceph-db-osd1_rimage_1_iorig] 100.00 ceph-db-osd1_rimage_1_iorig(0)
[ceph-db-osd1_rimage_1_imeta] optane ewi-ao---- 428.00m /dev/sdj(55392)
[ceph-db-osd1_rimage_1_imeta] optane ewi-ao---- 428.00m /dev/sdj(75457)
[ceph-db-osd1_rimage_1_iorig] optane -wi-ao----  44.00g /dev/sdj(9218)
[ceph-db-osd1_rimage_1_iorig] optane -wi-ao----  44.00g /dev/sdj(73409)
[ceph-db-osd1_rmeta_0]        optane ewi-aor---   4.00m /dev/sdg(55388)
[ceph-db-osd1_rmeta_1]        optane ewi-aor---   4.00m /dev/sdj(9217)

It would be good if the symlinks were recreated pointing to the correct device even when block.db sits on a RAID1 LV. I'm not sure whether this problem has already been reported.

Cheers
Reto