Hi Jonas,
did you try switching your OSDs back to the bitmap allocator, as suggested in my
comment #6 on the tracker?
Also, please set debug-bluestore to 20 and collect the startup log for
the failing OSD. Since it repeatedly fails on exactly the same
assertion, this would be very helpful; it's the information I've been
missing since the first occurrence.
Thanks,
Igor
On 12/12/2020 5:19 PM, Jonas Jelten wrote:
Hi!
Yesterday a third OSD died with a failed assertion and can no longer boot.
That's three OSDs within 11 days.
There's already a tracker issue: https://tracker.ceph.com/issues/48276
2020-12-11 20:06:51.839 7fe2b5ffd700 -1 /build/ceph-14.2.13/src/os/bluestore/KernelDevice.cc: In function 'virtual int KernelDevice::aio_write(uint64_t, ceph::bufferlist&, IOContext*, bool, int)' thread 7fe2b5ffd
/build/ceph-14.2.13/src/os/bluestore/KernelDevice.cc: 864: FAILED ceph_assert(is_valid_io(off, len))
ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55ff9fac9eea]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55ff9faca0c5]
3: (KernelDevice::aio_write(unsigned long, ceph::buffer::v14_2_0::list&, IOContext*, bool, int)+0x19e9) [0x55ffa01272a9]
4: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x621) [0x55ff9ffe87f1]
5: (BlueStore::deferred_try_submit()+0x3e0) [0x55ff9ffe9da0]
6: (BlueStore::_kv_finalize_thread()+0x6c6) [0x55ffa002fcc6]
7: (BlueStore::KVFinalizeThread::entry()+0xd) [0x55ffa006066d]
8: (()+0x76db) [0x7fe2c61456db]
9: (clone()+0x3f) [0x7fe2c4ee571f]
When the OSD tries to boot (now with 14.2.15), it fails again:
-4> 2020-12-11 20:07:05.160 7ff2d15e7c00 1 bluestore(/var/lib/ceph/osd/ceph-147) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2
-3> 2020-12-11 20:07:05.160 7ff2d15e7c00 1 bluestore(/var/lib/ceph/osd/ceph-147) _upgrade_super from 2, latest 2
-2> 2020-12-11 20:07:05.160 7ff2d15e7c00 1 bluestore(/var/lib/ceph/osd/ceph-147) _upgrade_super done
-1> 2020-12-11 20:07:05.184 7ff2d15e7c00 -1 /build/ceph-14.2.15/src/os/bluestore/KernelDevice.cc: In function 'virtual int KernelDevice::aio_write(uint64_t, ceph::bufferlist&, IOContext*, bool, int)' thread 7ff2d15e7c00 time 2020-12-11 20:07:05.186142
/build/ceph-14.2.15/src/os/bluestore/KernelDevice.cc: 888: FAILED ceph_assert(is_valid_io(off, len))
ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x564ad1323fba]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564ad1324195]
3: (KernelDevice::aio_write(unsigned long, ceph::buffer::v14_2_0::list&, IOContext*, bool, int)+0x16da) [0x564ad198154a]
4: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x621) [0x564ad1842911]
5: (BlueStore::deferred_try_submit()+0x3e0) [0x564ad1843ec0]
6: (BlueStore::_osr_drain_all()+0x1a5) [0x564ad1845255]
7: (BlueStore::_deferred_replay()+0x1c7) [0x564ad188b297]
8: (BlueStore::_mount(bool, bool)+0x83e) [0x564ad1892e1e]
9: (OSD::init()+0x3f3) [0x564ad13d3db3]
10: (main()+0x5214) [0x564ad132ccf4]
11: (__libc_start_main()+0xe7) [0x7ff2cde60bf7]
12: (_start()+0x2a) [0x564ad135e72a]
Is anyone else seeing these symptoms? Any ideas on how we can track this down?
-- Jonas
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx