On Thu, 7 Jun 2018, Dan van der Ster wrote:
> Hi all,
>
> We have an intermittent issue where bluestore osds sometimes fail to
> start after a reboot.
> The osds all fail the same way [see 2], failing to open the superblock.
> On one particular host, there are 24 osds and 4 SSDs partitioned for
> the block.db's. The affected non-starting OSDs all have block.db on
> the same ssd (/dev/sdaa).
>
> The osds are all running 12.2.5 on latest centos 7.5 and were created
> by ceph-volume lvm, e.g. see [1].
>
> This seems like a permissions or similar issue related to the
> ceph-volume tooling.
> Any clues how to debug this further?

I take it the OSDs start up if you try again?

sage

>
> Thanks!
>
> Dan
>
> [1]
>
> ====== osd.48 ======
>
>   [block]    /dev/ceph-34f24306-d90c-49ff-bafb-2657a6a18010/osd-block-99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5
>
>       type                      block
>       osd id                    48
>       cluster fsid              dd535a7e-4647-4bee-853d-f34112615f81
>       cluster name              ceph
>       osd fsid                  99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5
>       db device                 /dev/sdaa1
>       encrypted                 0
>       db uuid                   3381a121-1c1b-4e45-a986-c1871c363edc
>       cephx lockbox secret
>       block uuid                FQkRxS-No7X-ajkP-5L3N-K22a-IXg6-QLceZC
>       block device              /dev/ceph-34f24306-d90c-49ff-bafb-2657a6a18010/osd-block-99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5
>       crush device class        None
>
>   [  db]    /dev/sdaa1
>
>       PARTUUID                  3381a121-1c1b-4e45-a986-c1871c363edc
>
>
> [2]
>    -11> 2018-06-07 16:12:16.138407 7fba30fb4d80  1 -- - start start
>    -10> 2018-06-07 16:12:16.138516 7fba30fb4d80  1 bluestore(/var/lib/ceph/osd/ceph-48) _mount path /var/lib/ceph/osd/ceph-48
>     -9> 2018-06-07 16:12:16.138801 7fba30fb4d80  1 bdev create path /var/lib/ceph/osd/ceph-48/block type kernel
>     -8> 2018-06-07 16:12:16.138808 7fba30fb4d80  1 bdev(0x55eb46433a00 /var/lib/ceph/osd/ceph-48/block) open path /var/lib/ceph/osd/ceph-48/block
>     -7> 2018-06-07 16:12:16.138999 7fba30fb4d80  1 bdev(0x55eb46433a00 /var/lib/ceph/osd/ceph-48/block) open size 6001172414464 (0x57541c00000, 5589 GB) block_size 4096 (4096 B) rotational
>     -6> 2018-06-07 16:12:16.139188 7fba30fb4d80  1 bluestore(/var/lib/ceph/osd/ceph-48) _set_cache_sizes cache_size 134217728 meta 0.01 kv 0.99 data 0
>     -5> 2018-06-07 16:12:16.139275 7fba30fb4d80  1 bdev create path /var/lib/ceph/osd/ceph-48/block type kernel
>     -4> 2018-06-07 16:12:16.139281 7fba30fb4d80  1 bdev(0x55eb46433c00 /var/lib/ceph/osd/ceph-48/block) open path /var/lib/ceph/osd/ceph-48/block
>     -3> 2018-06-07 16:12:16.139454 7fba30fb4d80  1 bdev(0x55eb46433c00 /var/lib/ceph/osd/ceph-48/block) open size 6001172414464 (0x57541c00000, 5589 GB) block_size 4096 (4096 B) rotational
>     -2> 2018-06-07 16:12:16.139464 7fba30fb4d80  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-48/block size 5589 GB
>     -1> 2018-06-07 16:12:16.139510 7fba30fb4d80  1 bluefs mount
>      0> 2018-06-07 16:12:16.142930 7fba30fb4d80 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/os/bluestore/bluefs_types.h: In function 'static void bluefs_fnode_t::_denc_finish(ceph::buffer::ptr::iterator&, __u8*, __u8*, char**, uint32_t*)' thread 7fba30fb4d80 time 2018-06-07 16:12:16.139666
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/os/bluestore/bluefs_types.h: 54: FAILED assert(pos <= end)
>
>  ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55eb3b597780]
>  2: (bluefs_super_t::decode(ceph::buffer::list::iterator&)+0x776) [0x55eb3b52db36]
>  3: (BlueFS::_open_super()+0xfe) [0x55eb3b50cede]
>  4: (BlueFS::mount()+0xe3) [0x55eb3b5250c3]
>  5: (BlueStore::_open_db(bool)+0x173d) [0x55eb3b43ebcd]
>  6: (BlueStore::_mount(bool)+0x40e) [0x55eb3b47025e]
>  7: (OSD::init()+0x3bd) [0x55eb3b02a1cd]
>  8: (main()+0x2d07) [0x55eb3af2f977]
>  9: (__libc_start_main()+0xf5) [0x7fba2d47b445]
>  10: (()+0x4b7033) [0x55eb3afce033]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com