Re: Third nautilus OSD dead in 11 days - FAILED is_valid_io(off, len)


 



Hi Jonas,

Did you try switching your OSDs back to the bitmap allocator, as suggested in my comment #6 in the tracker?


Also, please set debug-bluestore to 20 and collect the startup log for the failing OSD. Since it repeatedly fails on exactly the same assertion, this would be very helpful; that is the information I've been missing since the initial occurrence.
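
For reference, a minimal sketch of how both settings might look in ceph.conf on the affected host. It assumes the options are spelled bluestore_allocator and debug_bluestore, that a per-daemon section for osd.147 is wanted (147 is the OSD that appears in the quoted startup log), and that the OSD is restarted afterwards. Treat it as an illustration rather than a verified configuration, and adjust it to your deployment.

```ini
# Assumed ceph.conf fragment; section name and option spelling are assumptions.
[osd.147]
# Switch this OSD back to the bitmap allocator (takes effect on OSD restart).
bluestore_allocator = bitmap
# Raise BlueStore logging to the most verbose level to capture the startup log.
debug_bluestore = 20
```

Once the startup log has been collected, the debug level should probably be lowered again, since level 20 produces a lot of output.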


Thanks,

Igor



On 12/12/2020 5:19 PM, Jonas Jelten wrote:
Hi!

Yesterday a third OSD died with a failed assertion, the third within 11 days, and it can no longer boot.

There's already a tracker issue: https://tracker.ceph.com/issues/48276


2020-12-11 20:06:51.839 7fe2b5ffd700 -1 /build/ceph-14.2.13/src/os/bluestore/KernelDevice.cc: In function 'virtual int KernelDevice::aio_write(uint64_t, ceph::bufferlist&, IOContext*, bool, int)' thread 7fe2b5ffd
/build/ceph-14.2.13/src/os/bluestore/KernelDevice.cc: 864: FAILED ceph_assert(is_valid_io(off, len))

  ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55ff9fac9eea]
  2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55ff9faca0c5]
  3: (KernelDevice::aio_write(unsigned long, ceph::buffer::v14_2_0::list&, IOContext*, bool, int)+0x19e9) [0x55ffa01272a9]
  4: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x621) [0x55ff9ffe87f1]
  5: (BlueStore::deferred_try_submit()+0x3e0) [0x55ff9ffe9da0]
  6: (BlueStore::_kv_finalize_thread()+0x6c6) [0x55ffa002fcc6]
  7: (BlueStore::KVFinalizeThread::entry()+0xd) [0x55ffa006066d]
  8: (()+0x76db) [0x7fe2c61456db]
  9: (clone()+0x3f) [0x7fe2c4ee571f]
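
To make the failed assertion easier to reason about, here is a small Python sketch of the kind of check that is_valid_io(off, len) is assumed to perform: the write must be block-aligned, non-empty, and lie entirely inside the device. The definition is not shown in this thread, so the conditions, the function signature, and the block_size and device_size parameters are assumptions for illustration, not Ceph's actual code.

```python
def is_valid_io(off: int, length: int, block_size: int, device_size: int) -> bool:
    """Assumed validity check for a block-device I/O request.

    Returns True only if the byte range [off, off + length) is aligned to
    block_size, is non-empty, and fits entirely within device_size.
    """
    return (
        off % block_size == 0          # offset is block-aligned
        and length % block_size == 0   # length is a whole number of blocks
        and length > 0                 # zero-length I/O is rejected
        and off < device_size          # start lies inside the device
        and off + length <= device_size  # end does not run past the device
    )


if __name__ == "__main__":
    BLOCK = 4096          # hypothetical block size
    SIZE = 1 << 30        # hypothetical 1 GiB device

    print(is_valid_io(0, BLOCK, BLOCK, SIZE))                 # aligned, in range
    print(is_valid_io(512, BLOCK, BLOCK, SIZE))               # misaligned offset
    print(is_valid_io(SIZE - BLOCK, 2 * BLOCK, BLOCK, SIZE))  # runs past the end
```

Under these assumptions, a deferred write whose offset or length falls outside the device or is misaligned would trip the assertion. That would fit the failure repeating when the deferred writes are submitted again at mount time.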


When the OSD tries to boot (now with 14.2.15), it fails again:

     -4> 2020-12-11 20:07:05.160 7ff2d15e7c00  1 bluestore(/var/lib/ceph/osd/ceph-147) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2
     -3> 2020-12-11 20:07:05.160 7ff2d15e7c00  1 bluestore(/var/lib/ceph/osd/ceph-147) _upgrade_super from 2, latest 2
     -2> 2020-12-11 20:07:05.160 7ff2d15e7c00  1 bluestore(/var/lib/ceph/osd/ceph-147) _upgrade_super done
     -1> 2020-12-11 20:07:05.184 7ff2d15e7c00 -1 /build/ceph-14.2.15/src/os/bluestore/KernelDevice.cc: In function 'virtual int KernelDevice::aio_write(uint64_t, ceph::bufferlist&, IOContext*, bool, int)' thread 7ff2d15e7c00 time 2020-12-11 20:07:05.186142
/build/ceph-14.2.15/src/os/bluestore/KernelDevice.cc: 888: FAILED ceph_assert(is_valid_io(off, len))

  ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x564ad1323fba]
  2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564ad1324195]
  3: (KernelDevice::aio_write(unsigned long, ceph::buffer::v14_2_0::list&, IOContext*, bool, int)+0x16da) [0x564ad198154a]
  4: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x621) [0x564ad1842911]
  5: (BlueStore::deferred_try_submit()+0x3e0) [0x564ad1843ec0]
  6: (BlueStore::_osr_drain_all()+0x1a5) [0x564ad1845255]
  7: (BlueStore::_deferred_replay()+0x1c7) [0x564ad188b297]
  8: (BlueStore::_mount(bool, bool)+0x83e) [0x564ad1892e1e]
  9: (OSD::init()+0x3f3) [0x564ad13d3db3]
  10: (main()+0x5214) [0x564ad132ccf4]
  11: (__libc_start_main()+0xe7) [0x7ff2cde60bf7]
  12: (_start()+0x2a) [0x564ad135e72a]


Anyone else with these symptoms? Any ideas how we can track it down?

-- Jonas


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


