On Thu, Jun 7, 2018 at 8:40 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> On Thu, Jun 7, 2018 at 6:33 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
> >
> > On Thu, 7 Jun 2018, Dan van der Ster wrote:
> > > > > Wait, we found something!!!
> > > > >
> > > > > In the 1st 4k on the block we found the block.db pointing at the wrong
> > > > > device (/dev/sdc1 instead of /dev/sdaa1)
> > > > >
> > > > > 00000130  6b 35 79 2b 67 3d 3d 0d  00 00 00 70 61 74 68 5f  |k5y+g==....path_|
> > > > > 00000140  62 6c 6f 63 6b 2e 64 62  09 00 00 00 2f 64 65 76  |block.db..../dev|
> > > > > 00000150  2f 73 64 63 31 05 00 00  00 72 65 61 64 79 05 00  |/sdc1....ready..|
> > > > > 00000160  00 00 72 65 61 64 79 06  00 00 00 77 68 6f 61 6d  |..ready....whoam|
> > > > > 00000170  69 02 00 00 00 34 38 eb  c2 d7 d6 00 00 00 00 00  |i....48.........|
> > > > > 00000180  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> > > > >
> > > > > It is similarly wrong for another broken osd.53 (block.db is /dev/sdc2
> > > > > instead of /dev/sdaa2).
> > > > > And for the osds that are running, that block.db is correct!
> >
> > Also, note that you can fix your OSDs by changing the path to a stable
> > name for the same device (/dev/disk/by-partuuid/something?) with
> > 'ceph-bluestore-tool set-label-key ...'.
>
> Good to know, thanks!
> I understand your (3) earlier now... Yes, ceph-volume should call
> that to fix the OSD if the device changes.

FTR, the fixes for this issue have been merged into luminous for 12.2.6:

    https://github.com/ceph/ceph/pull/22716

If someone runs into this before 12.2.6 is out, here's a quick fix:

    # Prints the commands rather than running them, so they can be
    # reviewed first.
    for db in /var/lib/ceph/osd/ceph-*/block.db
    do
        dev=$(readlink -f "$db")
        osd=$(dirname "$db")
        echo "ceph-bluestore-tool set-label-key --key path_block.db --value ${dev} --dev ${osd}/block"
    done

Cheers, Dan
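
A possible variant of the quick fix, following Sage's suggestion to use a stable name for the same device: this is only a sketch, assuming the block.db partitions have an entry under /dev/disk/by-partuuid (typical for GPT partitions, but not verified here). The helper name `stable_name` and the `BY_PARTUUID_DIR` variable are made up for illustration and are not part of any Ceph tooling. Like the loop in the mail, it only prints the commands.

```shell
#!/bin/sh
# Sketch only: print (do not run) set-label-key commands that point
# path_block.db at a /dev/disk/by-partuuid/ name instead of /dev/sdX.
# stable_name and BY_PARTUUID_DIR are illustrative names, not Ceph tooling.

BY_PARTUUID_DIR=${BY_PARTUUID_DIR:-/dev/disk/by-partuuid}

# Print the first symlink in BY_PARTUUID_DIR that resolves to the same
# device as $1; return non-zero if none does.
stable_name() {
    target=$(readlink -f "$1") || return 1
    for link in "$BY_PARTUUID_DIR"/*; do
        [ -e "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            echo "$link"
            return 0
        fi
    done
    return 1
}

for db in /var/lib/ceph/osd/ceph-*/block.db; do
    [ -e "$db" ] || continue    # glob matched nothing, or the link dangles
    osd=$(dirname "$db")
    if name=$(stable_name "$db"); then
        echo "ceph-bluestore-tool set-label-key --key path_block.db --value $name --dev $osd/block"
    else
        echo "# no by-partuuid link found for $db, skipping" >&2
    fi
done
```

The block.db symlink in the OSD directory has to resolve to the correct device before this is run, since the lookup starts from wherever that symlink currently points.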