Re: ceph-volume: failed to activate some bluestore osds

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 7, 2018 at 10:23 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> Hi all,
>
> We have an intermittent issue where bluestore osds sometimes fail to
> start after a reboot.
> The osds all fail the same way [see 2], failing to open the superblock.
> One one particular host, there are 24 osds and 4 SSDs partitioned for
> the block.db's. The affected non-starting OSDs all have block.db on
> the same ssd (/dev/sdaa).
>
> The osds are all running 12.2.5 on latest centos 7.5 and were created
> by ceph-volume lvm, e.g. see [1].
>
> This seems like a permissions or similar issue related to the
> ceph-volume tooling.
> Any clues how to debug this further?

There are useful logs in both /var/log/ceph/ceph-volume.log and
/var/log/ceph-volume-systemd.log

This is odd because the OSD is attempting to start, so the logs will
just mention everything was done accordingly and then the OSD was
started at the end of the (successful) setup.


>
> Thanks!
>
> Dan
>
> [1]
>
> ====== osd.48 ======
>
>   [block]    /dev/ceph-34f24306-d90c-49ff-bafb-2657a6a18010/osd-block-99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5
>
>       type                      block
>       osd id                    48
>       cluster fsid              dd535a7e-4647-4bee-853d-f34112615f81
>       cluster name              ceph
>       osd fsid                  99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5
>       db device                 /dev/sdaa1
>       encrypted                 0
>       db uuid                   3381a121-1c1b-4e45-a986-c1871c363edc
>       cephx lockbox secret
>       block uuid                FQkRxS-No7X-ajkP-5L3N-K22a-IXg6-QLceZC
>       block device
> /dev/ceph-34f24306-d90c-49ff-bafb-2657a6a18010/osd-block-99fd8e36-fc4d-4bbc-83d9-f5e611cde4b5
>       crush device class        None
>
>   [  db]    /dev/sdaa1
>
>       PARTUUID                  3381a121-1c1b-4e45-a986-c1871c363edc
>
>
>
> [2]
>    -11> 2018-06-07 16:12:16.138407 7fba30fb4d80  1 -- - start start
>    -10> 2018-06-07 16:12:16.138516 7fba30fb4d80  1
> bluestore(/var/lib/ceph/osd/ceph-48) _mount path /var/lib/ceph/os
> d/ceph-48
>     -9> 2018-06-07 16:12:16.138801 7fba30fb4d80  1 bdev create path
> /var/lib/ceph/osd/ceph-48/block type kernel
>     -8> 2018-06-07 16:12:16.138808 7fba30fb4d80  1 bdev(0x55eb46433a00
> /var/lib/ceph/osd/ceph-48/block) open path /v
> ar/lib/ceph/osd/ceph-48/block
>     -7> 2018-06-07 16:12:16.138999 7fba30fb4d80  1 bdev(0x55eb46433a00
> /var/lib/ceph/osd/ceph-48/block) open size 60
> 01172414464 (0x57541c00000, 5589 GB) block_size 4096 (4096 B) rotational
>     -6> 2018-06-07 16:12:16.139188 7fba30fb4d80  1
> bluestore(/var/lib/ceph/osd/ceph-48) _set_cache_sizes cache_size
> 134217728 meta 0.01 kv 0.99 data 0
>     -5> 2018-06-07 16:12:16.139275 7fba30fb4d80  1 bdev create path
> /var/lib/ceph/osd/ceph-48/block type kernel
>     -4> 2018-06-07 16:12:16.139281 7fba30fb4d80  1 bdev(0x55eb46433c00
> /var/lib/ceph/osd/ceph-48/block) open path /v
> ar/lib/ceph/osd/ceph-48/block
>     -3> 2018-06-07 16:12:16.139454 7fba30fb4d80  1 bdev(0x55eb46433c00
> /var/lib/ceph/osd/ceph-48/block) open size 60
> 01172414464 (0x57541c00000, 5589 GB) block_size 4096 (4096 B) rotational
>     -2> 2018-06-07 16:12:16.139464 7fba30fb4d80  1 bluefs
> add_block_device bdev 1 path /var/lib/ceph/osd/ceph-48/blo
> ck size 5589 GB
>     -1> 2018-06-07 16:12:16.139510 7fba30fb4d80  1 bluefs mount
>      0> 2018-06-07 16:12:16.142930 7fba30fb4d80 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILA
> BLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/o
> s/bluestore/bluefs_types.h: In function 'static void
> bluefs_fnode_t::_denc_finish(ceph::buffer::ptr::iterator&, __u8
> *, __u8*, char**, uint32_t*)' thread 7fba30fb4d80 time 2018-06-07
> 16:12:16.139666
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/os/bluestore/bluefs_types.h:
> 54: FAILED assert(pos <= end)
>
>  ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a)
> luminous (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x110) [0x55eb3b597780]
>  2: (bluefs_super_t::decode(ceph::buffer::list::iterator&)+0x776)
> [0x55eb3b52db36]
>  3: (BlueFS::_open_super()+0xfe) [0x55eb3b50cede]
>  4: (BlueFS::mount()+0xe3) [0x55eb3b5250c3]
>  5: (BlueStore::_open_db(bool)+0x173d) [0x55eb3b43ebcd]
>  6: (BlueStore::_mount(bool)+0x40e) [0x55eb3b47025e]
>  7: (OSD::init()+0x3bd) [0x55eb3b02a1cd]
>  8: (main()+0x2d07) [0x55eb3af2f977]
>  9: (__libc_start_main()+0xf5) [0x7fba2d47b445]
>  10: (()+0x4b7033) [0x55eb3afce033]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux