Third nautilus OSD dead in 11 days - FAILED is_valid_io(off, len)

Hi!

Yesterday another OSD died with a failed assertion and can no longer boot.
That makes three OSDs lost this way within 11 days.

There's already a tracker issue: https://tracker.ceph.com/issues/48276


2020-12-11 20:06:51.839 7fe2b5ffd700 -1 /build/ceph-14.2.13/src/os/bluestore/KernelDevice.cc: In function 'virtual int KernelDevice::aio_write(uint64_t, ceph::bufferlist&, IOContext*, bool, int)' thread 7fe2b5ffd
/build/ceph-14.2.13/src/os/bluestore/KernelDevice.cc: 864: FAILED ceph_assert(is_valid_io(off, len))

 ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55ff9fac9eea]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55ff9faca0c5]
 3: (KernelDevice::aio_write(unsigned long, ceph::buffer::v14_2_0::list&, IOContext*, bool, int)+0x19e9) [0x55ffa01272a9]
 4: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x621) [0x55ff9ffe87f1]
 5: (BlueStore::deferred_try_submit()+0x3e0) [0x55ff9ffe9da0]
 6: (BlueStore::_kv_finalize_thread()+0x6c6) [0x55ffa002fcc6]
 7: (BlueStore::KVFinalizeThread::entry()+0xd) [0x55ffa006066d]
 8: (()+0x76db) [0x7fe2c61456db]
 9: (clone()+0x3f) [0x7fe2c4ee571f]


When the OSD tries to boot (now with 14.2.15), it fails again:

    -4> 2020-12-11 20:07:05.160 7ff2d15e7c00  1 bluestore(/var/lib/ceph/osd/ceph-147) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2
    -3> 2020-12-11 20:07:05.160 7ff2d15e7c00  1 bluestore(/var/lib/ceph/osd/ceph-147) _upgrade_super from 2, latest 2
    -2> 2020-12-11 20:07:05.160 7ff2d15e7c00  1 bluestore(/var/lib/ceph/osd/ceph-147) _upgrade_super done
    -1> 2020-12-11 20:07:05.184 7ff2d15e7c00 -1 /build/ceph-14.2.15/src/os/bluestore/KernelDevice.cc: In function 'virtual int KernelDevice::aio_write(uint64_t, ceph::bufferlist&, IOContext*, bool, int)' thread 7ff2d15e7c00 time 2020-12-11 20:07:05.186142
/build/ceph-14.2.15/src/os/bluestore/KernelDevice.cc: 888: FAILED ceph_assert(is_valid_io(off, len))

 ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x564ad1323fba]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x564ad1324195]
 3: (KernelDevice::aio_write(unsigned long, ceph::buffer::v14_2_0::list&, IOContext*, bool, int)+0x16da) [0x564ad198154a]
 4: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x621) [0x564ad1842911]
 5: (BlueStore::deferred_try_submit()+0x3e0) [0x564ad1843ec0]
 6: (BlueStore::_osr_drain_all()+0x1a5) [0x564ad1845255]
 7: (BlueStore::_deferred_replay()+0x1c7) [0x564ad188b297]
 8: (BlueStore::_mount(bool, bool)+0x83e) [0x564ad1892e1e]
 9: (OSD::init()+0x3f3) [0x564ad13d3db3]
 10: (main()+0x5214) [0x564ad132ccf4]
 11: (__libc_start_main()+0xe7) [0x7ff2cde60bf7]
 12: (_start()+0x2a) [0x564ad135e72a]
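
Both traces end in the same check. As far as I understand it, is_valid_io() in the 14.2.x block device code only verifies the alignment and bounds of the write against the device's block size and total size. The sketch below is my own paraphrase in Python, not the Ceph source: the function name, the parameter names, and the exact list of conditions are assumptions and should be verified against the source tree. It only illustrates which condition a given off/len pair would violate.

```python
def invalid_io_reasons(off, length, block_size, size):
    """Return the conditions a write at (off, length) would violate.

    Assumption: the real check is an alignment and bounds test like this
    one. An empty list means the I/O passes this sketch of the check.
    """
    reasons = []
    if off % block_size != 0:
        reasons.append("offset not block-aligned")
    if length % block_size != 0:
        reasons.append("length not block-aligned")
    if length <= 0:
        reasons.append("length is zero")
    if off >= size:
        reasons.append("offset beyond device size")
    if off + length > size:
        reasons.append("write extends past device end")
    return reasons


if __name__ == "__main__":
    # Hypothetical values: 4 KiB blocks on a 1 GiB device.
    block_size = 4096
    size = 1 << 30
    print(invalid_io_reasons(8192, 4096, block_size, size))
    print(invalid_io_reasons(size - 4096, 8192, block_size, size))
```

The actual off and len of the failing write are not in the log output above, so the values in the example are made up.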


Has anyone else seen these symptoms? Any ideas on how we can track this down?

-- Jonas
