Igor, I noticed that there's no roadmap for the next 16.2.x release. May I
ask what time frame we are looking at with regard to a possible fix? We're
experiencing several OSD crashes per day caused by this issue.

/Z

On Mon, 16 Oct 2023 at 14:19, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

> That's true.
>
> On 16/10/2023 14:13, Zakhar Kirpichenko wrote:
>
> Many thanks, Igor. I found the previously submitted bug reports and
> subscribed to them. My understanding is that the issue is going to be
> fixed in the next Pacific minor release.
>
> /Z
>
> On Mon, 16 Oct 2023 at 14:03, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>
>> Hi Zakhar,
>>
>> please see my reply to the post on a similar issue at:
>>
>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/
>>
>> Thanks,
>>
>> Igor
>>
>> On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
>> > Hi,
>> >
>> > After upgrading to Ceph 16.2.14 we had several OSD crashes in the
>> > bstore_kv_sync thread:
>> >
>> > 1. "assert_thread_name": "bstore_kv_sync",
>> > 2. "backtrace": [
>> > 3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>> > 4. "gsignal()",
>> > 5. "abort()",
>> > 6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x564dc5f87d0b]",
>> > 7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
>> > 8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x15e) [0x564dc6604a9e]",
>> > 9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x77d) [0x564dc66951cd]",
>> > 10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90) [0x564dc6695670]",
>> > 11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
>> > 12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
>> > 13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
>> > 14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402) [0x564dc6c761c2]",
>> > 15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x564dc6c77808]",
>> > 16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x309) [0x564dc6b780c9]",
>> > 17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
>> > 18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
>> > 19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x564dc6b1f644]",
>> > 20. "(RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x564dc6b2004a]",
>> > 21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
>> > 22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
>> > 23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
>> > 24. "clone()"
>> > 25. ],
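For reference, crash dumps like the one quoted above can also be pulled
straight from the cluster, assuming the mgr "crash" module is enabled;
<crash-id> is a placeholder for an ID taken from the listing:

    ceph crash ls                 # list recorded daemon crashes and their IDs
    ceph crash info <crash-id>    # full metadata and backtrace for one crash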
>> >
>> > I am attaching two instances of crash info for further reference:
>> > https://pastebin.com/E6myaHNU
>> >
>> > OSD configuration is rather simple and close to default:
>> >
>> > osd.6   dev       bluestore_cache_size_hdd    4294967296
>> > osd.6   dev       bluestore_cache_size_ssd    4294967296
>> > osd     advanced  debug_rocksdb               1/5
>> > osd     advanced  osd_max_backfills           2
>> > osd     basic     osd_memory_target           17179869184
>> > osd     advanced  osd_recovery_max_active     2
>> > osd     advanced  osd_scrub_sleep             0.100000
>> > osd     advanced  rbd_balance_parent_reads    false
>> >
>> > debug_rocksdb is a recent change; otherwise this configuration has been
>> > running without issues for months. The crashes happened on two different
>> > hosts with identical hardware, and the hosts and storage (NVMe DB/WAL,
>> > HDD block) don't exhibit any issues. We have not experienced such
>> > crashes with Ceph < 16.2.14.
>> >
>> > Is this a known issue, or should I open a bug report?
>> >
>> > Best regards,
>> > Zakhar
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
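For anyone comparing their own setup with the report above, a rough sketch
of the relevant commands (osd.6 is the OSD ID from the quoted configuration
table; substitute your own):

    ceph config dump | grep osd              # overrides stored in the mon config database, the same kind of listing as the table quoted above
    ceph config show osd.6                   # values actually in effect on a running OSD daemon
    ceph config get osd osd_memory_target    # query a single override, here the 16 GiB memory target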