Hi Igor,

That’s correct (shown below). Would it be helpful for me to add logs/uploaded crash UUIDs to 53906 <https://tracker.ceph.com/issues/53906>, 53907 <https://tracker.ceph.com/issues/53907>, 54209 <https://tracker.ceph.com/issues/54209>, 62928 <https://tracker.ceph.com/issues/62928>, 63110 <https://tracker.ceph.com/issues/63110>, 63161 <https://tracker.ceph.com/issues/63161>, and 63352 <https://tracker.ceph.com/issues/63352>? Or should I open a new tracker for the fact that the parameter change isn’t being properly applied, or whatever appears to be happening?

Thanks,
Reed

> /build/ceph-16.2.14/src/os/bluestore/BlueStore.h: 3870: FAILED ceph_assert(cur >= p.length)
>
> ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55d51970a987]
> 2: /usr/bin/ceph-osd(+0xad3b8f) [0x55d51970ab8f]
> 3: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x112) [0x55d519e040f2]
> 4: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x69d) [0x55d519ea0fad]
> 5: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xaa) [0x55d519ea14ea]
> 6: (BlueFS::fsync(BlueFS::FileWriter*)+0x7d) [0x55d519ec61ed]
> 7: (BlueRocksWritableFile::Sync()+0x19) [0x55d519ed5a59]
> 8: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x52) [0x55d51a3e37ce]
> 9: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x216) [0x55d51a5eddac]
> 10: (rocksdb::WritableFileWriter::Sync(bool)+0x17b) [0x55d51a5ed785]
> 11: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x39a) [0x55d51a441bf8]
> 12: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x135e) [0x55d51a43d96c]
> 13: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x5d) [0x55d51a43c56f]
> 14: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x85) [0x55d51a388635]
> 15: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9b) [0x55d51a38904b]
> 16: (BlueStore::_kv_sync_thread()+0x22bc) [0x55d519e016dc]
> 17: (BlueStore::KVSyncThread::entry()+0x11) [0x55d519e2de71]
> 18: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f490cf23609]
> 19: clone()
>
> 0> 2024-01-10T11:39:05.922-0500 7f48f978d700 -1 *** Caught signal (Aborted) **
> in thread 7f48f978d700 thread_name:bstore_kv_sync
>
> ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable)
> 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f490cf2f420]
> 2: gsignal()
> 3: abort()
> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ad) [0x55d51970a9e2]
> 5: /usr/bin/ceph-osd(+0xad3b8f) [0x55d51970ab8f]
> 6: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x112) [0x55d519e040f2]
> 7: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x69d) [0x55d519ea0fad]
> 8: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xaa) [0x55d519ea14ea]
> 9: (BlueFS::fsync(BlueFS::FileWriter*)+0x7d) [0x55d519ec61ed]
> 10: (BlueRocksWritableFile::Sync()+0x19) [0x55d519ed5a59]
> 11: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x52) [0x55d51a3e37ce]
> 12: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x216) [0x55d51a5eddac]
> 13: (rocksdb::WritableFileWriter::Sync(bool)+0x17b) [0x55d51a5ed785]
> 14: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x39a) [0x55d51a441bf8]
> 15: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x135e) [0x55d51a43d96c]
> 16: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x5d) [0x55d51a43c56f]
> 17: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x85) [0x55d51a388635]
> 18: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9b) [0x55d51a38904b]
> 19: (BlueStore::_kv_sync_thread()+0x22bc) [0x55d519e016dc]
> 20: (BlueStore::KVSyncThread::entry()+0x11) [0x55d519e2de71]
> 21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f490cf23609]
> 22: clone()
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

> On Jan 10, 2024, at 12:06 PM, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>
> Hi Reed,
>
> It looks to me like your settings aren't effective. You might want to check the OSD log rather than the crash info and look at the assertion's backtrace.
>
> Does it mention RocksDBBlueFSVolumeSelector, as in https://tracker.ceph.com/issues/53906:
>
> ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
> 1: /lib64/libpthread.so.0(+0x12c20) [0x7f2beb318c20]
> 2: gsignal()
> 3: abort()
> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x56347eb33bec]
> 5: /usr/bin/ceph-osd(+0x5d5daf) [0x56347eb33daf]
> 6: (RocksDBBlueFSVolumeSelector::add_usage(void*, bluefs_fnode_t const&)+0) [0x56347f1f7d00]
> 7: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x735) [0x56347f295b45]
>
> If so, then there is still a mess with proper parameter changes.
>
> Thanks,
> Igor
>
> On 10/01/2024 20:13, Reed Dier wrote:
>> Well, sadly, that setting doesn’t seem to resolve the issue.
>>
>> I set the value in ceph.conf for the OSDs with small WAL/DB devices that keep running into the issue:
>>
>>> $ ceph tell osd.12 config show | grep bluestore_volume_selection_policy
>>> "bluestore_volume_selection_policy": "rocksdb_original",
>>> $ ceph crash info 2024-01-10T16:39:05.925534Z_f0c57ca3-b7e6-4511-b7ae-5834541d6c67 | egrep "(assert_condition|entity_name)"
>>> "assert_condition": "cur >= p.length",
>>> "entity_name": "osd.12",
>>
>> So, I guess that configuration item doesn’t in fact prevent the crash as was purported.
>> Looks like I may need to fast-track moving to Quincy…
>>
>> Reed
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
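For reference, the kind of per-daemon override discussed in the thread can be sketched as a ceph.conf fragment like the one below. This is a minimal sketch, not Reed's actual file; the section name and OSD id are illustrative, taken from the quoted messages.

```ini
# Hypothetical ceph.conf fragment: pin the BlueFS volume selector back to
# the pre-Pacific behavior for one OSD. An [osd] section would instead
# apply the override to every OSD on the host.
[osd.12]
bluestore_volume_selection_policy = rocksdb_original
```

A ceph.conf change only takes effect for an OSD after its daemon restarts; the running value can then be confirmed with the same check used in the thread, `ceph tell osd.12 config show | grep bluestore_volume_selection_policy`.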