On Thu, Jun 7, 2018 at 10:54 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> On Thu, Jun 7, 2018 at 4:41 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>>
>> On Thu, 7 Jun 2018, Dan van der Ster wrote:
>> > On Thu, Jun 7, 2018 at 4:33 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>> > >
>> > > On Thu, 7 Jun 2018, Dan van der Ster wrote:
>> > > > Hi all,
>> > > >
>> > > > We have an intermittent issue where bluestore osds sometimes fail to
>> > > > start after a reboot.
>> > > > The osds all fail the same way [see 2], failing to open the superblock.
>> > > > On one particular host, there are 24 osds and 4 SSDs partitioned for
>> > > > the block.db's. The affected non-starting OSDs all have block.db on
>> > > > the same ssd (/dev/sdaa).
>> > > >
>> > > > The osds are all running 12.2.5 on latest centos 7.5 and were created
>> > > > by ceph-volume lvm, e.g. see [1].
>> > > >
>> > > > This seems like a permissions or similar issue related to the
>> > > > ceph-volume tooling.
>> > > > Any clues how to debug this further?
>> > >
>> > > I take it the OSDs start up if you try again?
>> >
>> > Hey.
>> > No, they don't. For example, we run `ceph-volume lvm activate 48
>> > 99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5` several times and it's the same
>> > mount failure every time.
>>
>> That sounds like a bluefs bug then, not a ceph-volume issue. Can you
>> try to start the OSD with logging enabled? (debug bluefs = 20,
>> debug bluestore = 20)
>>
>
> Here: https://pastebin.com/TJXZhfcY
>
> Is it supposed to print something about the block.db at some point????

This has to be some logging mistake, because it is block.db, never just
'block':

    bdev(0x5653ffdadc00 /var/lib/ceph/osd/ceph-48/block) open path /var/lib/ceph/osd/ceph-48/block

That is what you are referring to here, right?

Now, re-reading the thread, you say that it sometimes does boot normally?

ceph-volume tries (in different ways) to ensure that the devices used are
the correct ones. In the case of /dev/sdaa1 it has persisted the partuuid
(3381a121-1c1b-4e45-a986-c1871c363edc), which is later queried using blkid
to find the right device name (/dev/sdaa1 in your case).

Is it possible that you are seeing somewhere where ceph-volume is *not*
matching this correctly?

If osd.48 comes up online, how does /var/lib/ceph/osd/ceph-48 look? The same?
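(Not from the original mails, but as a quick way to cross-check that
mechanism by hand: the persisted db uuid can be resolved with blkid and
compared against what ceph-volume has recorded. A rough sketch, assuming
blkid from util-linux and the db uuid shown in the quoted [1] output
further down:)

    # resolve the persisted partuuid to a device node, the same lookup
    # ceph-volume performs; expected answer here is /dev/sdaa1
    blkid -t PARTUUID="3381a121-1c1b-4e45-a986-c1871c363edc" -o device

    # and the metadata ceph-volume has stored for the OSD, for comparison
    ceph-volume lvm list 48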
>
> Here's the osd dir:
>
> # ls -l /var/lib/ceph/osd/ceph-48/
> total 24
> lrwxrwxrwx. 1 ceph ceph 93 Jun 7 16:46 block -> /dev/ceph-34f24306-d90c-49ff-bafb-2657a6a18010/osd-block-99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5
> lrwxrwxrwx. 1 root root 10 Jun 7 16:46 block.db -> /dev/sdaa1
> -rw-------. 1 ceph ceph 37 Jun 7 16:46 ceph_fsid
> -rw-------. 1 ceph ceph 37 Jun 7 16:46 fsid
> -rw-------. 1 ceph ceph 56 Jun 7 16:46 keyring
> -rw-------. 1 ceph ceph  6 Jun 7 16:46 ready
> -rw-------. 1 ceph ceph 10 Jun 7 16:46 type
> -rw-------. 1 ceph ceph  3 Jun 7 16:46 whoami
>
> # ls -l /dev/ceph-34f24306-d90c-49ff-bafb-2657a6a18010/osd-block-99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5
> lrwxrwxrwx. 1 root root 7 Jun 7 16:46 /dev/ceph-34f24306-d90c-49ff-bafb-2657a6a18010/osd-block-99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5 -> ../dm-4
>
> # ls -l /dev/dm-4
> brw-rw----. 1 ceph ceph 253, 4 Jun 7 16:46 /dev/dm-4
>
>   --- Logical volume ---
>   LV Path                /dev/ceph-34f24306-d90c-49ff-bafb-2657a6a18010/osd-block-99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5
>   LV Name                osd-block-99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5
>   VG Name                ceph-34f24306-d90c-49ff-bafb-2657a6a18010
>   LV UUID                FQkRxS-No7X-ajkP-5L3N-K22a-IXg6-QLceZC
>   LV Write Access        read/write
>   LV Creation host, time p06253939y61826.cern.ch, 2018-03-15 10:57:37 +0100
>   LV Status              available
>   # open                 0
>   LV Size                <5.46 TiB
>   Current LE             1430791
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     256
>   Block device           253:4
>
>   --- Physical volume ---
>   PV Name                /dev/sda
>   VG Name                ceph-34f24306-d90c-49ff-bafb-2657a6a18010
>   PV Size                <5.46 TiB / not usable <2.59 MiB
>   Allocatable            yes (but full)
>   PE Size                4.00 MiB
>   Total PE               1430791
>   Free PE                0
>   Allocated PE           1430791
>   PV UUID                WP0Z7C-ejSh-fpSa-a73N-H2Hz-yC78-qBezcI
>
> (sorry for wall o' lvm)
>
> -- dan
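(Not in the original mails, but for anyone reproducing this: one way to
capture the verbose log Sage asked for above (debug bluefs = 20,
debug bluestore = 20) is to run the failing OSD in the foreground with the
two debug options raised. This assumes the stock systemd unit naming and
default log settings:)

    # stop the (already failing) systemd-managed instance first
    systemctl stop ceph-osd@48

    # run the OSD in the foreground with verbose bluefs/bluestore logging;
    # the output lands in /var/log/ceph/ceph-osd.48.log
    ceph-osd -f --cluster ceph --id 48 --setuser ceph --setgroup ceph \
        --debug-bluefs 20 --debug-bluestore 20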
>
>> Thanks!
>> sage
>>
>> > -- dan
>> >
>> > >
>> > > sage
>> > >
>> > > >
>> > > > Thanks!
>> > > >
>> > > > Dan
>> > > >
>> > > > [1]
>> > > >
>> > > > ====== osd.48 ======
>> > > >
>> > > >   [block]    /dev/ceph-34f24306-d90c-49ff-bafb-2657a6a18010/osd-block-99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5
>> > > >
>> > > >       type                      block
>> > > >       osd id                    48
>> > > >       cluster fsid              dd535a7e-4647-4bee-853d-f34112615f81
>> > > >       cluster name              ceph
>> > > >       osd fsid                  99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5
>> > > >       db device                 /dev/sdaa1
>> > > >       encrypted                 0
>> > > >       db uuid                   3381a121-1c1b-4e45-a986-c1871c363edc
>> > > >       cephx lockbox secret
>> > > >       block uuid                FQkRxS-No7X-ajkP-5L3N-K22a-IXg6-QLceZC
>> > > >       block device              /dev/ceph-34f24306-d90c-49ff-bafb-2657a6a18010/osd-block-99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5
>> > > >       crush device class        None
>> > > >
>> > > >   [  db]     /dev/sdaa1
>> > > >
>> > > >       PARTUUID                  3381a121-1c1b-4e45-a986-c1871c363edc
>> > > >
>> > > >
>> > > > [2]
>> > > >    -11> 2018-06-07 16:12:16.138407 7fba30fb4d80  1 -- - start start
>> > > >    -10> 2018-06-07 16:12:16.138516 7fba30fb4d80  1 bluestore(/var/lib/ceph/osd/ceph-48) _mount path /var/lib/ceph/osd/ceph-48
>> > > >     -9> 2018-06-07 16:12:16.138801 7fba30fb4d80  1 bdev create path /var/lib/ceph/osd/ceph-48/block type kernel
>> > > >     -8> 2018-06-07 16:12:16.138808 7fba30fb4d80  1 bdev(0x55eb46433a00 /var/lib/ceph/osd/ceph-48/block) open path /var/lib/ceph/osd/ceph-48/block
>> > > >     -7> 2018-06-07 16:12:16.138999 7fba30fb4d80  1 bdev(0x55eb46433a00 /var/lib/ceph/osd/ceph-48/block) open size 6001172414464 (0x57541c00000, 5589 GB) block_size 4096 (4096 B) rotational
>> > > >     -6> 2018-06-07 16:12:16.139188 7fba30fb4d80  1 bluestore(/var/lib/ceph/osd/ceph-48) _set_cache_sizes cache_size 134217728 meta 0.01 kv 0.99 data 0
>> > > >     -5> 2018-06-07 16:12:16.139275 7fba30fb4d80  1 bdev create path /var/lib/ceph/osd/ceph-48/block type kernel
>> > > >     -4> 2018-06-07 16:12:16.139281 7fba30fb4d80  1 bdev(0x55eb46433c00 /var/lib/ceph/osd/ceph-48/block) open path /var/lib/ceph/osd/ceph-48/block
>> > > >     -3> 2018-06-07 16:12:16.139454 7fba30fb4d80  1 bdev(0x55eb46433c00 /var/lib/ceph/osd/ceph-48/block) open size 6001172414464 (0x57541c00000, 5589 GB) block_size 4096 (4096 B) rotational
>> > > >     -2> 2018-06-07 16:12:16.139464 7fba30fb4d80  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-48/block size 5589 GB
>> > > >     -1> 2018-06-07 16:12:16.139510 7fba30fb4d80  1 bluefs mount
>> > > >      0> 2018-06-07 16:12:16.142930 7fba30fb4d80 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/os/bluestore/bluefs_types.h: In function 'static void bluefs_fnode_t::_denc_finish(ceph::buffer::ptr::iterator&, __u8*, __u8*, char**, uint32_t*)' thread 7fba30fb4d80 time 2018-06-07 16:12:16.139666
>> > > > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/os/bluestore/bluefs_types.h: 54: FAILED assert(pos <= end)
>> > > >
>> > > >  ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
>> > > >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55eb3b597780]
>> > > >  2: (bluefs_super_t::decode(ceph::buffer::list::iterator&)+0x776) [0x55eb3b52db36]
>> > > >  3: (BlueFS::_open_super()+0xfe) [0x55eb3b50cede]
>> > > >  4: (BlueFS::mount()+0xe3) [0x55eb3b5250c3]
>> > > >  5: (BlueStore::_open_db(bool)+0x173d) [0x55eb3b43ebcd]
>> > > >  6: (BlueStore::_mount(bool)+0x40e) [0x55eb3b47025e]
>> > > >  7: (OSD::init()+0x3bd) [0x55eb3b02a1cd]
>> > > >  8: (main()+0x2d07) [0x55eb3af2f977]
>> > > >  9: (__libc_start_main()+0xf5) [0x7fba2d47b445]
>> > > >  10: (()+0x4b7033) [0x55eb3afce033]
>> > > >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
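(Not part of the original thread, but for reference: the abort above is in
bluefs_super_t::decode() while BlueFS reads its superblock, so the on-disk
state can be poked at without starting the OSD using ceph-bluestore-tool,
which ships with luminous. A rough sketch, assuming the activated osd.48
directory shown earlier:)

    # print the bluestore labels of the data device and the db partition
    ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-48/block
    ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-48/block.db

    # read-only consistency check of the OSD; expected to hit the same
    # superblock decode error if the on-disk data is really damaged
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-48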