Unfortunately, the OSD log from the earlier crash is not available. I have extracted the OSD log, including the recent events, from the latest crash: https://www.dropbox.com/scl/fi/1ne8h85iuc5vx78qm1t93/20231016_osd6.zip?rlkey=fxyn242q7c69ec5lkv29csx13&dl=0 I hope this helps to identify the crash reason. The log entries that I find suspicious are these right before the crash: debug -1726> 2023-10-15T22:31:21.575+0000 7f961ccb8700 5 prioritycache tune_memory target: 17179869184 mapped: 17024319488 unmapped: 4164763648 heap: 21189083136 old mem: 13797582406 new mem: 13797582406 ... debug -1723> 2023-10-15T22:31:22.579+0000 7f961ccb8700 5 prioritycache tune_memory target: 17179869184 mapped: 17024589824 unmapped: 4164493312 heap: 21189083136 old mem: 13797582406 new mem: 13797582406 ... debug -1718> 2023-10-15T22:31:23.579+0000 7f961ccb8700 5 prioritycache tune_memory target: 17179869184 mapped: 17027031040 unmapped: 4162052096 heap: 21189083136 old mem: 13797582406 new mem: 13797582406 ... debug -1714> 2023-10-15T22:31:24.579+0000 7f961ccb8700 5 prioritycache tune_memory target: 17179869184 mapped: 17026301952 unmapped: 4162781184 heap: 21189083136 old mem: 13797582406 new mem: 13797582406 debug -1713> 2023-10-15T22:31:25.383+0000 7f961ccb8700 5 bluestore.MempoolThread(0x55c5bee8cb98) _resize_shards cache_size: 13797582406 kv_alloc: 8321499136 kv_used: 8245313424 kv_onode_alloc: 4697620480 kv_onode_used: 4690617424 meta_alloc: 469762048 meta_used: 371122625 data_alloc: 134217728 data_used: 44314624 ... debug -1710> 2023-10-15T22:31:25.583+0000 7f961ccb8700 5 prioritycache tune_memory target: 17179869184 mapped: 17026367488 unmapped: 4162715648 heap: 21189083136 old mem: 13797582406 new mem: 13797582406 ... debug -1707> 2023-10-15T22:31:26.583+0000 7f961ccb8700 5 prioritycache tune_memory target: 17179869184 mapped: 17026211840 unmapped: 4162871296 heap: 21189083136 old mem: 13797582406 new mem: 13797582406 ... debug -1704> 2023-10-15T22:31:27.583+0000 7f961ccb8700 5 prioritycache tune_memory target: 17179869184 mapped: 17024548864 unmapped: 4164534272 heap: 21189083136 old mem: 13797582406 new mem: 13797582406 There's plenty of RAM in the system, about 120 GB free and used for cache. /Z On Mon, 16 Oct 2023 at 09:26, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote: > Hi, > > After upgrading to Ceph 16.2.14 we had several OSD crashes > in bstore_kv_sync thread: > > > 1. "assert_thread_name": "bstore_kv_sync", > 2. "backtrace": [ > 3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]", > 4. "gsignal()", > 5. "abort()", > 6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x1a9) [0x564dc5f87d0b]", > 7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]", > 8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t > const&)+0x15e) [0x564dc6604a9e]", > 9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, > unsigned long)+0x77d) [0x564dc66951cd]", > 10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90) > [0x564dc6695670]", > 11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]", > 12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]", > 13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions > const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]", > 14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402) > [0x564dc6c761c2]", > 15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x564dc6c77808]", > 16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup > const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned > long)+0x309) [0x564dc6b780c9]", > 17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned > long, bool, unsigned long*, unsigned long, > rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]", > 18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]", > 19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&, > std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x564dc6b1f644]", > 20. "(RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) > [0x564dc6b2004a]", > 21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]", > 22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]", > 23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]", > 24. "clone()" > 25. ], > > > I am attaching two instances of crash info for further reference: > https://pastebin.com/E6myaHNU > > OSD configuration is rather simple and close to default: > > osd.6 dev bluestore_cache_size_hdd 4294967296 > osd.6 dev > bluestore_cache_size_ssd 4294967296 > osd advanced debug_rocksdb > 1/5 osd > advanced osd_max_backfills 2 > osd basic > osd_memory_target 17179869184 > osd advanced osd_recovery_max_active > 2 osd > advanced osd_scrub_sleep 0.100000 > osd advanced > rbd_balance_parent_reads false > > debug_rocksdb is a recent change, otherwise this configuration has been > running without issues for months. The crashes happened on two different > hosts with identical hardware, the hosts and storage (NVME DB/WAL, HDD > block) don't exhibit any issues. We have not experienced such crashes with > Ceph < 16.2.14. > > Is this a known issue, or should I open a bug report? > > Best regards, > Zakhar > _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx