On Fri, Oct 20, 2023, 8:11 AM Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:

> Thank you, Igor.
>
> It is somewhat disappointing that fixing this bug in Pacific has such a
> low priority, considering its impact on existing clusters.
>

Unfortunately, the hard truth here is that Pacific (stable) was released
over 30 months ago. It has had a good run for a freely distributed product,
and there's only so much time you can dedicate to backporting bugfixes --
it claws time away from other forward-thinking initiatives.

Speaking as someone who has been at the helm of production clusters, I know
Ceph upgrades can be an experience, and this is frustrating to hear, but
you have to jump sometime...

Regards,
Tyler

> On Fri, 20 Oct 2023 at 14:46, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>
> > Hi Zakhar,
> >
> > We definitely expect one more (and apparently the last) Pacific minor
> > release. There is no specific date yet, though - the plan is to release
> > Quincy and Reef minor releases prior to it, hopefully before
> > Christmas/New Year.
> >
> > Meanwhile, you might want to work around the issue by tuning
> > bluestore_volume_selection_policy. Unfortunately, my original proposal
> > to set it to rocksdb_original most likely wouldn't work in this case,
> > so you'd better try the "fit_to_fast" mode. This should be coupled with
> > enabling the 'level_compaction_dynamic_level_bytes' mode in RocksDB -
> > there is a pretty good spec on applying this mode to BlueStore attached
> > to https://github.com/ceph/ceph/pull/37156.
> >
> >
> > Thanks,
> >
> > Igor
> >
> > On 20/10/2023 06:03, Zakhar Kirpichenko wrote:
> >
> > Igor, I noticed that there's no roadmap for the next 16.2.x release.
> > May I ask what time frame we are looking at with regard to a possible
> > fix?
> >
> > We're experiencing several OSD crashes caused by this issue per day.
> >
> > /Z
> >
> > On Mon, 16 Oct 2023 at 14:19, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
> >
> >> That's true.
> >>
> >> On 16/10/2023 14:13, Zakhar Kirpichenko wrote:
> >>
> >> Many thanks, Igor. I found previously submitted bug reports and
> >> subscribed to them. My understanding is that the issue is going to be
> >> fixed in the next Pacific minor release.
> >>
> >> /Z
> >>
> >> On Mon, 16 Oct 2023 at 14:03, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
> >>
> >>> Hi Zakhar,
> >>>
> >>> please see my reply to the post on a similar issue at:
> >>>
> >>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Igor
> >>>
> >>> On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
> >>> > Hi,
> >>> >
> >>> > After upgrading to Ceph 16.2.14 we had several OSD crashes
> >>> > in the bstore_kv_sync thread:
> >>> >
> >>> >    1. "assert_thread_name": "bstore_kv_sync",
> >>> >    2. "backtrace": [
> >>> >    3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
> >>> >    4. "gsignal()",
> >>> >    5. "abort()",
> >>> >    6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x564dc5f87d0b]",
> >>> >    7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
> >>> >    8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x15e) [0x564dc6604a9e]",
> >>> >    9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x77d) [0x564dc66951cd]",
> >>> >    10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90) [0x564dc6695670]",
> >>> >    11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
> >>> >    12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
> >>> >    13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
> >>> >    14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402) [0x564dc6c761c2]",
> >>> >    15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x564dc6c77808]",
> >>> >    16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x309) [0x564dc6b780c9]",
> >>> >    17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
> >>> >    18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
> >>> >    19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x564dc6b1f644]",
> >>> >    20. "(RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x564dc6b2004a]",
> >>> >    21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
> >>> >    22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
> >>> >    23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
> >>> >    24. "clone()"
> >>> >    25. ],
> >>> >
> >>> > I am attaching two instances of crash info for further reference:
> >>> > https://pastebin.com/E6myaHNU
> >>> >
> >>> > OSD configuration is rather simple and close to default:
> >>> >
> >>> > osd.6  dev       bluestore_cache_size_hdd    4294967296
> >>> > osd.6  dev       bluestore_cache_size_ssd    4294967296
> >>> > osd    advanced  debug_rocksdb               1/5
> >>> > osd    advanced  osd_max_backfills           2
> >>> > osd    basic     osd_memory_target           17179869184
> >>> > osd    advanced  osd_recovery_max_active     2
> >>> > osd    advanced  osd_scrub_sleep             0.100000
> >>> > osd    advanced  rbd_balance_parent_reads    false
> >>> >
> >>> > debug_rocksdb is a recent change; otherwise this configuration has
> >>> > been running without issues for months. The crashes happened on two
> >>> > different hosts with identical hardware, and the hosts and storage
> >>> > (NVMe DB/WAL, HDD block) don't exhibit any issues. We have not
> >>> > experienced such crashes with Ceph < 16.2.14.
> >>> >
> >>> > Is this a known issue, or should I open a bug report?
> >>> >
> >>> > Best regards,
> >>> > Zakhar
> >>> > _______________________________________________
> >>> > ceph-users mailing list -- ceph-users@xxxxxxx
> >>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>>
> >>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx