Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

That's true.

On 16/10/2023 14:13, Zakhar Kirpichenko wrote:
Many thanks, Igor. I found the previously submitted bug reports and subscribed to them. My understanding is that the issue will be fixed in the next Pacific minor release.

/Z

On Mon, 16 Oct 2023 at 14:03, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

    Hi Zakhar,

    please see my reply to the post on a similar issue at:
    https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/


    Thanks,

    Igor

    On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
    > Hi,
    >
    > After upgrading to Ceph 16.2.14, we had several OSD crashes in the
    > bstore_kv_sync thread:
    >
    >
    >     1. "assert_thread_name": "bstore_kv_sync",
    >     2. "backtrace": [
    >     3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
    >     4. "gsignal()",
    >     5. "abort()",
    >     6. "(ceph::__ceph_assert_fail(char const*, char const*, int,
    char
    >     const*)+0x1a9) [0x564dc5f87d0b]",
    >     7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
    >     8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*,
    bluefs_fnode_t
    >     const&)+0x15e) [0x564dc6604a9e]",
    >     9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned
    long, unsigned
    >     long)+0x77d) [0x564dc66951cd]",
    >     10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
    >     [0x564dc6695670]",
    >     11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b)
    [0x564dc66b1a6b]",
    >     12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
    >     13.
    "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
    >     const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
    >     14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
    >     [0x564dc6c761c2]",
    >     15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88)
    [0x564dc6c77808]",
    >     16.
    "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
    >     const&, rocksdb::log::Writer*, unsigned long*, bool, bool,
    unsigned
    >     long)+0x309) [0x564dc6b780c9]",
    >     17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
    >     rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned
    long*, unsigned
    >     long, bool, unsigned long*, unsigned long,
    >     rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
    >     18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
    >     rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
    >     19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
    >  std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84)
    [0x564dc6b1f644]",
    >     20.
    "(RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a)
    >     [0x564dc6b2004a]",
    >     21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
    >     22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
    >     23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
    >     24. "clone()"
    >     25. ],
    >
    >
    > I am sharing two instances of crash info for further reference:
    > https://pastebin.com/E6myaHNU
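    >
    > The dumps are in the format produced by the Ceph mgr crash module; the
    > same details can be pulled from any cluster with the commands below
    > (the crash ID shown is a placeholder, not from this incident):
    >
    >     # list crashes recorded by the mgr crash module
    >     ceph crash ls
    >     # print the full metadata and backtrace for one crash
    >     ceph crash info 2023-10-16T06:26:00.000000Z_<uuid>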
    >
    > OSD configuration is rather simple and close to default:
    >
    > WHO    LEVEL     OPTION                      VALUE
    > osd.6  dev       bluestore_cache_size_hdd    4294967296
    > osd.6  dev       bluestore_cache_size_ssd    4294967296
    > osd    advanced  debug_rocksdb               1/5
    > osd    advanced  osd_max_backfills           2
    > osd    basic     osd_memory_target           17179869184
    > osd    advanced  osd_recovery_max_active     2
    > osd    advanced  osd_scrub_sleep             0.100000
    > osd    advanced  rbd_balance_parent_reads    false
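    >
    > These values can be inspected or reproduced with the centralized config
    > commands, for example:
    >
    >     ceph config dump                                   # full cluster config
    >     ceph config get osd.6 bluestore_cache_size_hdd     # read one option
    >     ceph config set osd osd_memory_target 17179869184  # set one option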
    >
    > debug_rocksdb is a recent change; otherwise this configuration has been
    > running without issues for months. The crashes happened on two different
    > hosts with identical hardware, and neither the hosts nor the storage
    > (NVMe DB/WAL, HDD block) exhibit any issues. We have not experienced
    > such crashes with Ceph < 16.2.14.
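    >
    > If it helps, the per-OSD device layout can be confirmed with the OSD
    > metadata command, which reports the bluestore/bluefs device paths and
    > types, for example:
    >
    >     ceph osd metadata 6    # device paths/types for osd.6, incl. DB/WAL
    >     ceph health detail     # any outstanding device or daemon warnings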
    >
    > Is this a known issue, or should I open a bug report?
    >
    > Best regards,
    > Zakhar

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



