Hi Reed,
it looks to me like your setting isn't taking effect. You might want to
check the OSD log rather than the crash info and look at the assertion's
backtrace. Does it mention RocksDBBlueFSVolumeSelector, as in
https://tracker.ceph.com/issues/53906:
ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
1: /lib64/libpthread.so.0(+0x12c20) [0x7f2beb318c20]
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x56347eb33bec]
5: /usr/bin/ceph-osd(+0x5d5daf) [0x56347eb33daf]
6: (RocksDBBlueFSVolumeSelector::add_usage(void*, bluefs_fnode_t const&)+0) [0x56347f1f7d00]
7: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x735) [0x56347f295b45]
If so, then the parameter change still hasn't properly taken effect.
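In case it helps, something along these lines should surface the assertion backtrace in the OSD log (osd.12, the log path and the systemd unit name are just examples for a non-containerized deployment; adjust to your environment):
$ grep -B 5 -A 20 'FAILED ceph_assert' /var/log/ceph/ceph-osd.12.log
$ journalctl -u ceph-osd@12 | grep -A 20 'FAILED ceph_assert'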
Thanks
Igor
On 10/01/2024 20:13, Reed Dier wrote:
Well, sadly, that setting doesn’t seem to resolve the issue.
I set the value in ceph.conf for the OSDs with small WAL/DB devices that keep running into the issue:
$ ceph tell osd.12 config show | grep bluestore_volume_selection_policy
"bluestore_volume_selection_policy": "rocksdb_original",
$ ceph crash info 2024-01-10T16:39:05.925534Z_f0c57ca3-b7e6-4511-b7ae-5834541d6c67 | egrep "(assert_condition|entity_name)"
"assert_condition": "cur >= p.length",
"entity_name": "osd.12",
So, I guess that configuration item doesn’t in fact prevent the crash as was purported.
Looks like I may need to fast track moving to quincy…
Reed
On Jan 8, 2024, at 9:47 AM, Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
I ended up setting it in ceph.conf which appears to have worked (as far as I can tell).
[osd]
bluestore_volume_selection_policy = rocksdb_original
$ ceph config show osd.0 | grep bluestore_volume_selection_policy
bluestore_volume_selection_policy rocksdb_original file (mon[rocksdb_original])
So far so good…
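(Note that ceph.conf is only read at daemon startup, so the OSDs need a restart to pick up the file-based value; on a systemd-managed, non-cephadm host that is roughly:
$ sudo systemctl restart ceph-osd@12
with the OSD id adjusted per host.)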
Reed
On Jan 8, 2024, at 2:04 AM, Eugen Block <eblock@xxxxxx> wrote:
Hi,
I just did the same in my lab environment and the config got applied to the daemon after a restart:
pacific:~ # ceph tell osd.0 config show | grep bluestore_volume_selection_policy
"bluestore_volume_selection_policy": "rocksdb_original",
This is also a (tiny single-node) cluster running 16.2.14. Maybe there was a typo or something while doing the loop? Have you tried setting it for one OSD only and seeing whether it starts with the config applied?
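Roughly, with osd.0 as an example (the restart command assumes a systemd-managed OSD, run on the respective host):
$ ceph config set osd.0 bluestore_volume_selection_policy rocksdb_original
$ sudo systemctl restart ceph-osd@0
$ ceph tell osd.0 config show | grep bluestore_volume_selection_policy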
Quoting Reed Dier <reed.dier@xxxxxxxxxxx>:
After ~3 uneventful weeks since upgrading from 15.2.17 to 16.2.14, I've started seeing OSD crashes with "cur >= fnode.size" and "cur >= p.length", which appear to be fixed in the next pacific point release later this month, but until then I'd love to keep the OSDs from flapping.
$ for crash in $(ceph crash ls | grep osd | awk '{print $1}') ; do ceph crash info $crash | egrep "(assert_condition|crash_id)" ; done
"assert_condition": "cur >= fnode.size",
"crash_id": "2024-01-03T09:07:55.698213Z_348af2d3-d4a7-4c27-9f71-70e6dc7c1af7",
"assert_condition": "cur >= p.length",
"crash_id": "2024-01-03T14:21:55.794692Z_4557c416-ffca-4165-aa91-d63698d41454",
"assert_condition": "cur >= fnode.size",
"crash_id": "2024-01-03T22:53:43.010010Z_15dc2b2a-30fb-4355-84b9-2f9560f08ea7",
"assert_condition": "cur >= p.length",
"crash_id": "2024-01-04T02:34:34.408976Z_2954a2c2-25d2-478e-92ad-d79c42d3ba43",
"assert_condition": "cur2 >= p.length",
"crash_id": "2024-01-04T21:57:07.100877Z_12f89c2c-4209-4f5a-b243-f0445ba629d2",
"assert_condition": "cur >= p.length",
"crash_id": "2024-01-05T00:35:08.561753Z_a189d967-ab02-4c61-bf68-1229222fd259",
"assert_condition": "cur >= fnode.size",
"crash_id": "2024-01-05T04:11:48.625086Z_a598cbaf-2c4f-4824-9939-1271eeba13ea",
"assert_condition": "cur >= p.length",
"crash_id": "2024-01-05T13:49:34.911210Z_953e38b9-8ae4-4cfe-8f22-d4b7cdf65cea",
"assert_condition": "cur >= p.length",
"crash_id": "2024-01-05T13:54:25.732770Z_4924b1c0-309c-4471-8c5d-c3aaea49166c",
"assert_condition": "cur >= p.length",
"crash_id": "2024-01-05T16:35:16.485416Z_0bca3d2a-2451-4275-a049-a65c58c1aff1”,
As noted in https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/
You can apparently work around the issue by setting
'bluestore_volume_selection_policy' config parameter to rocksdb_original.
However, after trying to set that parameter with `ceph config set osd.$osd bluestore_volume_selection_policy rocksdb_original`, it doesn't appear to take effect:
$ ceph config show-with-defaults osd.0 | grep bluestore_volume_selection_policy
bluestore_volume_selection_policy use_some_extra
$ ceph config set osd.0 bluestore_volume_selection_policy rocksdb_original
$ ceph config show osd.0 | grep bluestore_volume_selection_policy
bluestore_volume_selection_policy use_some_extra default mon
This, I assume, should reflect the new setting; however, it still shows the default "use_some_extra" value.
But then this seems to imply that the config is set?
$ ceph config dump | grep bluestore_volume_selection_policy
osd.0 dev bluestore_volume_selection_policy rocksdb_original *
[snip]
osd.9 dev bluestore_volume_selection_policy rocksdb_original *
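(Another way to check what is stored in the mon config database for a single daemon is `ceph config get`, e.g.:
$ ceph config get osd.0 bluestore_volume_selection_policy
whereas `ceph config show` reports what the running daemon is actually using.)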
Does this need to be set in ceph.conf or is there another setting that also needs to be set?
Even after bouncing the OSD daemon, `ceph config show` still reports "use_some_extra".
I'd appreciate any help anyone can offer to point me in the right direction to bridge the gap between now and the next point release.
Thanks,
Reed
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx