Hi Reed,
there's not much sense in attaching the logs to the tickets you mentioned:
the problem with the assertion is well known and has already been fixed.
Your current issue is the odd config update behavior, which prevents you
from applying the workaround. Feel free to open a ticket about that, but I
don't think it's an efficient route: IIUC the problem isn't common and is
likely caused by something specific to your setup, which means a fix
probably wouldn't arrive soon enough. Unfortunately, that's not my area of
expertise either, so I'm of little help here as well.
Nevertheless, if I were troubleshooting this config update issue, I'd start
by trying different parameters/daemons/hosts. Are you able to tune any
parameter at all? Can you do it on a different host or OSD?
Not to mention that you might just try restarting the monitors first ;)
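To make the "can you tune any parameter at all" check concrete, a small script can compare what the monitor config database holds for an option with what each running OSD reports. This is a minimal sketch assuming the standard ceph CLI with admin credentials; the function name compare_opt and the OPT variable are made up for illustration. Values set only in a local ceph.conf don't show up in `ceph config get`, so a difference is a hint rather than proof of a bug. `ceph tell osd.N config get <option>` is another way to ask a daemon directly.

```shell
#!/usr/bin/env bash
# Sketch only: for each OSD id given, print the option value stored in the
# monitor config database next to the value the running daemon reports.
# Assumes the ceph CLI is installed and has admin credentials.
# compare_opt and OPT are hypothetical names used for illustration.
# Values set only in a local ceph.conf do not appear in `ceph config get`,
# so "differs" is a clue, not proof of a bug.
opt="${OPT:-bluestore_volume_selection_policy}"

compare_opt() {
  local id stored running
  for id in "$@"; do
    # Value from the centralized (monitor) config database.
    stored=$(ceph config get "osd.${id}" "$opt")
    # Value the running daemon reports, whatever its source.
    running=$(ceph config show "osd.${id}" "$opt")
    if [ "$stored" = "$running" ]; then
      echo "osd.${id} ${opt}: ${running} (same)"
    else
      echo "osd.${id} ${opt}: config db=${stored}, running=${running} (differs)"
    fi
  done
}
```

For example, `compare_opt 12 13` checks osd.12 and osd.13; setting OPT to another option name lets you repeat the test with a different parameter.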
Thanks,
Igor
On 10/01/2024 21:38, Reed Dier wrote:
Hi Igor,
That’s correct (shown below).
Would it be helpful for me to add logs/uploaded crash UUIDs to 53906
<https://tracker.ceph.com/issues/53906>, 53907
<https://tracker.ceph.com/issues/53907>, 54209
<https://tracker.ceph.com/issues/54209>, 62928
<https://tracker.ceph.com/issues/62928>, 63110
<https://tracker.ceph.com/issues/63110>, 63161
<https://tracker.ceph.com/issues/63161>, 63352
<https://tracker.ceph.com/issues/63352>?
Or should I open a new tracker issue for the parameter change apparently
not being persisted, or whatever is actually happening?
Thanks,
Reed
/build/ceph-16.2.14/src/os/bluestore/BlueStore.h: 3870: FAILED ceph_assert(cur >= p.length)
ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55d51970a987]
2: /usr/bin/ceph-osd(+0xad3b8f) [0x55d51970ab8f]
3: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x112) [0x55d519e040f2]
4: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x69d) [0x55d519ea0fad]
5: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xaa) [0x55d519ea14ea]
6: (BlueFS::fsync(BlueFS::FileWriter*)+0x7d) [0x55d519ec61ed]
7: (BlueRocksWritableFile::Sync()+0x19) [0x55d519ed5a59]
8: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x52) [0x55d51a3e37ce]
9: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x216) [0x55d51a5eddac]
10: (rocksdb::WritableFileWriter::Sync(bool)+0x17b) [0x55d51a5ed785]
11: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x39a) [0x55d51a441bf8]
12: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x135e) [0x55d51a43d96c]
13: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x5d) [0x55d51a43c56f]
14: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x85) [0x55d51a388635]
15: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9b) [0x55d51a38904b]
16: (BlueStore::_kv_sync_thread()+0x22bc) [0x55d519e016dc]
17: (BlueStore::KVSyncThread::entry()+0x11) [0x55d519e2de71]
18: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f490cf23609]
19: clone()
0> 2024-01-10T11:39:05.922-0500 7f48f978d700 -1 *** Caught signal (Aborted) **
in thread 7f48f978d700 thread_name:bstore_kv_sync
ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable)
1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f490cf2f420]
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ad) [0x55d51970a9e2]
5: /usr/bin/ceph-osd(+0xad3b8f) [0x55d51970ab8f]
6: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x112) [0x55d519e040f2]
7: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x69d) [0x55d519ea0fad]
8: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xaa) [0x55d519ea14ea]
9: (BlueFS::fsync(BlueFS::FileWriter*)+0x7d) [0x55d519ec61ed]
10: (BlueRocksWritableFile::Sync()+0x19) [0x55d519ed5a59]
11: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x52) [0x55d51a3e37ce]
12: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x216) [0x55d51a5eddac]
13: (rocksdb::WritableFileWriter::Sync(bool)+0x17b) [0x55d51a5ed785]
14: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x39a) [0x55d51a441bf8]
15: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x135e) [0x55d51a43d96c]
16: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x5d) [0x55d51a43c56f]
17: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x85) [0x55d51a388635]
18: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9b) [0x55d51a38904b]
19: (BlueStore::_kv_sync_thread()+0x22bc) [0x55d519e016dc]
20: (BlueStore::KVSyncThread::entry()+0x11) [0x55d519e2de71]
21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f490cf23609]
22: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
On Jan 10, 2024, at 12:06 PM, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
Hi Reed,
it looks to me like your settings aren't taking effect. You might want to
check the OSD log rather than the crash info and look at the assertion's
backtrace. Does it mention RocksDBBlueFSVolumeSelector, as the backtrace in
https://tracker.ceph.com/issues/53906 does:
ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
1: /lib64/libpthread.so.0(+0x12c20) [0x7f2beb318c20]
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x56347eb33bec]
5: /usr/bin/ceph-osd(+0x5d5daf) [0x56347eb33daf]
6: (RocksDBBlueFSVolumeSelector::add_usage(void*, bluefs_fnode_t const&)+0) [0x56347f1f7d00]
7: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x735) [0x56347f295b45]
If so, then the parameter change still isn't being applied properly.
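Checking the OSD log for that frame can be scripted as well. A rough sketch, assuming the log is a plain file you pass in: the function name find_selector_frames is hypothetical, and the log location depends on the deployment (/var/log/ceph/ceph-osd.<id>.log is common, but containerized setups may log elsewhere). It lists each *VolumeSelector frame it finds with a count, so a hit on RocksDBBlueFSVolumeSelector would match the trace quoted above.

```shell
#!/usr/bin/env bash
# Sketch only: count which BlueFS volume-selector frames appear in the
# backtraces of an OSD log, e.g. RocksDBBlueFSVolumeSelector::sub_usage.
# find_selector_frames is a hypothetical helper name; the log path is
# passed as the first argument because its location varies by deployment.
find_selector_frames() {
  local log="$1"
  # Extract "<Something>VolumeSelector::<method>" tokens from frame lines,
  # then print each distinct frame followed by how often it appears.
  grep -oE '[A-Za-z]+VolumeSelector::[a-z_]+' "$log" \
    | sort | uniq -c | awk '{print $2, $1}'
}
```

Empty output means no volume-selector frame was found in that log.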
Thanks
Igor
On 10/01/2024 20:13, Reed Dier wrote:
Well, sadly, that setting doesn’t seem to resolve the issue.
I set the value in ceph.conf for the OSDs with small WAL/DB devices that keep running into the issue:
$ ceph tell osd.12 config show | grep bluestore_volume_selection_policy
"bluestore_volume_selection_policy": "rocksdb_original",
$ ceph crash info 2024-01-10T16:39:05.925534Z_f0c57ca3-b7e6-4511-b7ae-5834541d6c67 | egrep "(assert_condition|entity_name)"
"assert_condition": "cur >= p.length",
"entity_name": "osd.12",
So I guess that configuration item doesn’t in fact prevent the crash as was claimed.
Looks like I may need to fast-track the move to Quincy…
Reed
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx