Thank you, Igor. I was just reading the detailed list of changes for 16.2.14, as I
suspected that we might not be able to go back to the previous minor release :-)

Thanks again for the suggestions, we'll consider our options.

/Z

On Fri, 20 Oct 2023 at 16:08, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

> Zakhar,
>
> My general concern about downgrading to previous versions is that this
> procedure is generally neither assumed nor tested by the dev team, although
> it is possible most of the time. In this specific case, however, it is not
> doable due to (at least) https://github.com/ceph/ceph/pull/52212, which
> enables 4K bluefs allocation unit support - once some daemon gets it, there
> is no way back.
>
> I still think that setting "fit_to_fast" mode without enabling dynamic
> compaction levels is quite safe, but it's definitely better to test it in
> the real environment and under the actual payload first. You might also
> want to apply such a workaround gradually - one daemon first, bake it for a
> while, then apply it to the full node, bake a bit more, and finally go
> forward and update the remaining ones. Or, even better, bake it in a test
> cluster first.
>
> Alternatively, you might consider building the updated code yourself and
> making patched binaries on top of .14...
>
> Thanks,
>
> Igor
>
> On 20/10/2023 15:10, Zakhar Kirpichenko wrote:
>
> Thank you, Igor.
>
> It is somewhat disappointing that fixing this bug in Pacific has such a low
> priority, considering its impact on existing clusters.
>
> The document attached to the PR explicitly says about
> `level_compaction_dynamic_level_bytes` that "enabling it on an existing DB
> requires special caution"; we'd rather not experiment with something that
> has the potential to cause data corruption or loss in a production cluster.
> Perhaps a downgrade to the previous version, 16.2.13, which worked for us
> without any issues, is an option - or would you advise against such a
> downgrade from 16.2.14?
>
> /Z
>
> On Fri, 20 Oct 2023 at 14:46, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>
>> Hi Zakhar,
>>
>> We definitely expect one more (and apparently the last) Pacific minor
>> release. There is no specific date yet, though - the plan is to release
>> the Quincy and Reef minor releases prior to it, hopefully before
>> Christmas/New Year.
>>
>> Meanwhile, you might want to work around the issue by tuning
>> bluestore_volume_selection_policy. Unfortunately, my original proposal to
>> set it to rocksdb_original most likely wouldn't work in this case, so you
>> had better try "fit_to_fast" mode. This should be coupled with enabling
>> 'level_compaction_dynamic_level_bytes' mode in RocksDB - there is a pretty
>> good spec on applying this mode to BlueStore attached to
>> https://github.com/ceph/ceph/pull/37156.
>>
>> Thanks,
>>
>> Igor
>>
>> On 20/10/2023 06:03, Zakhar Kirpichenko wrote:
>>
>> Igor, I noticed that there's no roadmap for the next 16.2.x release. May
>> I ask what time frame we are looking at with regard to a possible fix?
>>
>> We're experiencing several OSD crashes caused by this issue per day.
>>
>> /Z
>>
>> On Mon, 16 Oct 2023 at 14:19, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>>
>>> That's true.
>>>
>>> On 16/10/2023 14:13, Zakhar Kirpichenko wrote:
>>>
>>> Many thanks, Igor. I found previously submitted bug reports and
>>> subscribed to them. My understanding is that the issue is going to be
>>> fixed in the next Pacific minor release.
>>>
>>> /Z
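As an aside on the "fit_to_fast" workaround discussed above: a minimal sketch
of the gradual, one-daemon-first rollout could look like the commands below.
This is only an illustration - osd.6 is an arbitrary example daemon borrowed
from the configuration listing further down, and the restart command depends
on how the cluster is deployed (ceph orch on cephadm clusters, systemctl on
package installs). Enabling level_compaction_dynamic_level_bytes on an
existing DB is deliberately left out; that step needs the special handling
described in the spec attached to https://github.com/ceph/ceph/pull/37156.

    # Override the volume selection policy for a single OSD, then restart it:
    ceph config set osd.6 bluestore_volume_selection_policy fit_to_fast
    ceph orch daemon restart osd.6      # or: systemctl restart ceph-osd@6

    # After the restart, confirm the value the daemon is actually running with:
    ceph config show osd.6 | grep bluestore_volume_selection_policy

    # Bake it for a while, then repeat for the rest of the node and,
    # eventually, for the remaining hosts.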
>>>
>>> On Mon, 16 Oct 2023 at 14:03, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>>>
>>>> Hi Zakhar,
>>>>
>>>> Please see my reply to the post on the similar issue at:
>>>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/
>>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>> On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
>>>> > Hi,
>>>> >
>>>> > After upgrading to Ceph 16.2.14 we had several OSD crashes
>>>> > in the bstore_kv_sync thread:
>>>> >
>>>> > 1. "assert_thread_name": "bstore_kv_sync",
>>>> > 2. "backtrace": [
>>>> > 3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>>>> > 4. "gsignal()",
>>>> > 5. "abort()",
>>>> > 6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x564dc5f87d0b]",
>>>> > 7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
>>>> > 8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x15e) [0x564dc6604a9e]",
>>>> > 9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x77d) [0x564dc66951cd]",
>>>> > 10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90) [0x564dc6695670]",
>>>> > 11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
>>>> > 12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
>>>> > 13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
>>>> > 14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402) [0x564dc6c761c2]",
>>>> > 15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x564dc6c77808]",
>>>> > 16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x309) [0x564dc6b780c9]",
>>>> > 17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
>>>> > 18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
>>>> > 19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x564dc6b1f644]",
>>>> > 20. "(RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x564dc6b2004a]",
>>>> > 21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
>>>> > 22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
>>>> > 23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
>>>> > 24. "clone()"
>>>> > 25. ],
>>>> >
>>>> > I am attaching two instances of crash info for further reference:
>>>> > https://pastebin.com/E6myaHNU
>>>> >
>>>> > The OSD configuration is rather simple and close to default:
>>>> >
>>>> > osd.6  dev       bluestore_cache_size_hdd   4294967296
>>>> > osd.6  dev       bluestore_cache_size_ssd   4294967296
>>>> > osd    advanced  debug_rocksdb              1/5
>>>> > osd    advanced  osd_max_backfills          2
>>>> > osd    basic     osd_memory_target          17179869184
>>>> > osd    advanced  osd_recovery_max_active    2
>>>> > osd    advanced  osd_scrub_sleep            0.100000
>>>> > osd    advanced  rbd_balance_parent_reads   false
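The listing above appears to be the set of config-store overrides for the
OSDs. For anyone comparing their own cluster against this report, roughly the
following standard commands should reproduce both the per-daemon configuration
view and the kind of crash metadata referenced via the pastebin link - a
sketch only, again using osd.6 as the example daemon:

    # Show the configuration a given OSD is running with, and how it
    # deviates from the defaults:
    ceph config show osd.6
    ceph daemon osd.6 config diff       # run on the host where osd.6 lives

    # List recorded crashes and dump the full report, including the
    # backtrace, for one of them:
    ceph crash ls
    ceph crash info <crash-id>          # <crash-id> taken from the ls output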
>>>> >
>>>> > debug_rocksdb is a recent change; otherwise this configuration has been
>>>> > running without issues for months. The crashes happened on two different
>>>> > hosts with identical hardware; the hosts and storage (NVMe DB/WAL, HDD
>>>> > block) don't exhibit any issues. We have not experienced such crashes
>>>> > with Ceph < 16.2.14.
>>>> >
>>>> > Is this a known issue, or should I open a bug report?
>>>> >
>>>> > Best regards,
>>>> > Zakhar

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx