Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

Not sure how the formatting got mangled; here is the OSD configuration in a
more readable form: https://pastebin.com/mrC6UdzN

/Z

On Mon, 16 Oct 2023 at 09:26, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:

> Hi,
>
> After upgrading to Ceph 16.2.14, we have had several OSD crashes in the
> bstore_kv_sync thread:
>
>
>    1. "assert_thread_name": "bstore_kv_sync",
>    2. "backtrace": [
>    3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>    4. "gsignal()",
>    5. "abort()",
>    6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char
>    const*)+0x1a9) [0x564dc5f87d0b]",
>    7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
>    8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t
>    const&)+0x15e) [0x564dc6604a9e]",
>    9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long,
>    unsigned long)+0x77d) [0x564dc66951cd]",
>    10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
>    [0x564dc6695670]",
>    11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
>    12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
>    13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
>    const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
>    14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
>    [0x564dc6c761c2]",
>    15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x564dc6c77808]",
>    16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
>    const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned
>    long)+0x309) [0x564dc6b780c9]",
>    17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
>    rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned
>    long, bool, unsigned long*, unsigned long,
>    rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
>    18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
>    rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
>    19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
>    std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x564dc6b1f644]",
>    20. "(RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a)
>    [0x564dc6b2004a]",
>    21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
>    22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
>    23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
>    24. "clone()"
>    25. ],
>
>
> I am attaching two instances of crash info for further reference:
> https://pastebin.com/E6myaHNU
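>
> Crash metadata in this format can also be pulled straight from the mgr
> crash module, in case anyone wants to compare signatures on their own
> cluster; a minimal sketch (the crash ID below is a placeholder):
>
>     # list the crashes the cluster has recorded
>     ceph crash ls
>     # print the full metadata and backtrace for one of them
>     ceph crash info <crash_id>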
>
> OSD configuration is rather simple and close to default:
>
> WHO    LEVEL     OPTION                      VALUE
> osd.6  dev       bluestore_cache_size_hdd    4294967296
> osd.6  dev       bluestore_cache_size_ssd    4294967296
> osd    advanced  debug_rocksdb               1/5
> osd    advanced  osd_max_backfills           2
> osd    basic     osd_memory_target           17179869184
> osd    advanced  osd_recovery_max_active     2
> osd    advanced  osd_scrub_sleep             0.100000
> osd    advanced  rbd_balance_parent_reads    false
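>
> The listing above matches the format of "ceph config dump"; a minimal
> sketch of how to pull the same view, assuming the overrides live in the
> mon config database:
>
>     # dump all non-default options and keep the OSD-scoped entries
>     ceph config dump | grep -E '^osd'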
>
> debug_rocksdb is a recent change; otherwise this configuration has been
> running without issues for months. The crashes happened on two different
> hosts with identical hardware; neither the hosts nor the storage (NVMe
> DB/WAL, HDD block) exhibit any issues. We have not experienced such
> crashes with Ceph < 16.2.14.
>
> Is this a known issue, or should I open a bug report?
>
> Best regards,
> Zakhar
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


