Thank you, Igor. I was just reading the detailed list of changes for 16.2.14, as I
suspected that we might not be able to go back to the previous minor release :-)

Thanks again for the suggestions, we'll consider our options.

/Z

On Fri, 20 Oct 2023 at 16:08, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

> Zakhar,
>
> My general concern about downgrading to previous versions is that this
> procedure is generally neither assumed nor tested by the dev team, although
> it is possible most of the time. In this specific case, however, it is not
> doable due to (at least) https://github.com/ceph/ceph/pull/52212, which
> enables 4K bluefs allocation unit support - once some daemon gets it, there
> is no way back.
>
> I still think that setting "fit_to_fast" mode without enabling dynamic
> compaction levels is quite safe, but it's definitely better to test it in
> the real environment and under the actual payload first. You might also
> want to apply such a workaround gradually - one daemon first, bake it for a
> while, then apply it to the full node, bake a bit more, and finally go
> forward and update the remaining ones. Or, even better, bake it in a test
> cluster first.
>
> Alternatively, you might consider building the updated code yourself and
> making patched binaries on top of .14...
>
> Thanks,
>
> Igor
>
> On 20/10/2023 15:10, Zakhar Kirpichenko wrote:
>
> Thank you, Igor.
>
> It is somewhat disappointing that fixing this bug in Pacific has such a low
> priority, considering its impact on existing clusters.
>
> The document attached to the PR explicitly says about
> `level_compaction_dynamic_level_bytes` that "enabling it on an existing DB
> requires special caution"; we'd rather not experiment with something that
> has the potential to cause data corruption or loss in a production cluster.
> Perhaps a downgrade to the previous version, 16.2.13, which worked for us
> without any issues, is an option - or would you advise against such a
> downgrade from 16.2.14?
>
> /Z
>
> On Fri, 20 Oct 2023 at 14:46, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>
>> Hi Zakhar,
>>
>> We definitely expect one more (and apparently the last) Pacific minor
>> release. There is no specific date yet, though - the plan is to release
>> the Quincy and Reef minor releases prior to it, hopefully before
>> Christmas/New Year.
>>
>> Meanwhile, you might want to work around the issue by tuning
>> bluestore_volume_selection_policy. Unfortunately, my original proposal to
>> set it to rocksdb_original most likely wouldn't work in this case, so you
>> had better try "fit_to_fast" mode. This should be coupled with enabling
>> 'level_compaction_dynamic_level_bytes' mode in RocksDB - there is a pretty
>> good spec on applying this mode to BlueStore attached to
>> https://github.com/ceph/ceph/pull/37156.
>>
>> Thanks,
>>
>> Igor
>>
>> On 20/10/2023 06:03, Zakhar Kirpichenko wrote:
>>
>> Igor, I noticed that there's no roadmap for the next 16.2.x release. May
>> I ask what time frame we are looking at with regard to a possible fix?
>>
>> We're experiencing several OSD crashes caused by this issue per day.
>>
>> /Z
>>
>> On Mon, 16 Oct 2023 at 14:19, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>>
>>> That's true.
>>>
>>> On 16/10/2023 14:13, Zakhar Kirpichenko wrote:
>>>
>>> Many thanks, Igor. I found previously submitted bug reports and
>>> subscribed to them. My understanding is that the issue is going to be
>>> fixed in the next Pacific minor release.
>>>
>>> /Z
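As an aside on the "fit_to_fast" workaround discussed above: a minimal sketch
of the gradual, one-daemon-first rollout could look like the commands below.
This is only an illustration - osd.6 is an arbitrary example daemon borrowed
from the configuration listing further down, and the restart command depends
on how the cluster is deployed (ceph orch on cephadm clusters, systemctl on
package installs). Enabling level_compaction_dynamic_level_bytes on an
existing DB is deliberately left out; that step needs the special handling
described in the spec attached to https://github.com/ceph/ceph/pull/37156.

    # Override the volume selection policy for a single OSD, then restart it:
    ceph config set osd.6 bluestore_volume_selection_policy fit_to_fast
    ceph orch daemon restart osd.6      # or: systemctl restart ceph-osd@6

    # After the restart, confirm the value the daemon is actually running with:
    ceph config show osd.6 | grep bluestore_volume_selection_policy

    # Bake it for a while, then repeat for the rest of the node and,
    # eventually, for the remaining hosts.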
>>>
>>> On Mon, 16 Oct 2023 at 14:03, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>>>
>>>> Hi Zakhar,
>>>>
>>>> Please see my reply to the post on the similar issue at:
>>>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/
>>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>> On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
>>>> > Hi,
>>>> >
>>>> > After upgrading to Ceph 16.2.14 we had several OSD crashes
>>>> > in the bstore_kv_sync thread:
>>>> >
>>>> > 1. "assert_thread_name": "bstore_kv_sync",
>>>> > 2. "backtrace": [
>>>> > 3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>>>> > 4. "gsignal()",
>>>> > 5. "abort()",
>>>> > 6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x564dc5f87d0b]",
>>>> > 7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
>>>> > 8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x15e) [0x564dc6604a9e]",
>>>> > 9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x77d) [0x564dc66951cd]",
>>>> > 10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90) [0x564dc6695670]",
>>>> > 11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
>>>> > 12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
>>>> > 13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
>>>> > 14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402) [0x564dc6c761c2]",
>>>> > 15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x564dc6c77808]",
>>>> > 16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x309) [0x564dc6b780c9]",
>>>> > 17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
>>>> > 18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
>>>> > 19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x564dc6b1f644]",
>>>> > 20. "(RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x564dc6b2004a]",
>>>> > 21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
>>>> > 22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
>>>> > 23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
>>>> > 24. "clone()"
>>>> > 25. ],
>>>> >
>>>> > I am attaching two instances of crash info for further reference:
>>>> > https://pastebin.com/E6myaHNU
>>>> >
>>>> > The OSD configuration is rather simple and close to default:
>>>> >
>>>> > osd.6  dev       bluestore_cache_size_hdd   4294967296
>>>> > osd.6  dev       bluestore_cache_size_ssd   4294967296
>>>> > osd    advanced  debug_rocksdb              1/5
>>>> > osd    advanced  osd_max_backfills          2
>>>> > osd    basic     osd_memory_target          17179869184
>>>> > osd    advanced  osd_recovery_max_active    2
>>>> > osd    advanced  osd_scrub_sleep            0.100000
>>>> > osd    advanced  rbd_balance_parent_reads   false
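The listing above appears to be the set of config-store overrides for the
OSDs. For anyone comparing their own cluster against this report, roughly the
following standard commands should reproduce both the per-daemon configuration
view and the kind of crash metadata referenced via the pastebin link - a
sketch only, again using osd.6 as the example daemon:

    # Show the configuration a given OSD is running with, and how it
    # deviates from the defaults:
    ceph config show osd.6
    ceph daemon osd.6 config diff       # run on the host where osd.6 lives

    # List recorded crashes and dump the full report, including the
    # backtrace, for one of them:
    ceph crash ls
    ceph crash info <crash-id>          # <crash-id> taken from the ls output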
>>>> >
>>>> > debug_rocksdb is a recent change; otherwise this configuration has been
>>>> > running without issues for months. The crashes happened on two different
>>>> > hosts with identical hardware; the hosts and storage (NVMe DB/WAL, HDD
>>>> > block) don't exhibit any issues. We have not experienced such crashes
>>>> > with Ceph < 16.2.14.
>>>> >
>>>> > Is this a known issue, or should I open a bug report?
>>>> >
>>>> > Best regards,
>>>> > Zakhar

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx