Pacific bluestore_volume_selection_policy

Reed Dier <reed.dier@xxxxxxxxxxx> · Fri, 5 Jan 2024 11:04:20 -0600

About 3 uneventful weeks after upgrading from 15.2.17 to 16.2.14, I’ve started seeing OSD crashes with "cur >= fnode.size" and "cur >= p.length". This seems to be resolved in the next Pacific point release, due later this month, but until then I’d love to keep the OSDs from flapping.

> $ for crash in $(ceph crash ls | grep osd | awk '{print $1}') ; do ceph crash info $crash | egrep "(assert_condition|crash_id)" ; done
>     "assert_condition": "cur >= fnode.size",
>     "crash_id": "2024-01-03T09:07:55.698213Z_348af2d3-d4a7-4c27-9f71-70e6dc7c1af7",
>     "assert_condition": "cur >= p.length",
>     "crash_id": "2024-01-03T14:21:55.794692Z_4557c416-ffca-4165-aa91-d63698d41454",
>     "assert_condition": "cur >= fnode.size",
>     "crash_id": "2024-01-03T22:53:43.010010Z_15dc2b2a-30fb-4355-84b9-2f9560f08ea7",
>     "assert_condition": "cur >= p.length",
>     "crash_id": "2024-01-04T02:34:34.408976Z_2954a2c2-25d2-478e-92ad-d79c42d3ba43",
>     "assert_condition": "cur2 >= p.length",
>     "crash_id": "2024-01-04T21:57:07.100877Z_12f89c2c-4209-4f5a-b243-f0445ba629d2",
>     "assert_condition": "cur >= p.length",
>     "crash_id": "2024-01-05T00:35:08.561753Z_a189d967-ab02-4c61-bf68-1229222fd259",
>     "assert_condition": "cur >= fnode.size",
>     "crash_id": "2024-01-05T04:11:48.625086Z_a598cbaf-2c4f-4824-9939-1271eeba13ea",
>     "assert_condition": "cur >= p.length",
>     "crash_id": "2024-01-05T13:49:34.911210Z_953e38b9-8ae4-4cfe-8f22-d4b7cdf65cea",
>     "assert_condition": "cur >= p.length",
>     "crash_id": "2024-01-05T13:54:25.732770Z_4924b1c0-309c-4471-8c5d-c3aaea49166c",
>     "assert_condition": "cur >= p.length",
>     "crash_id": "2024-01-05T16:35:16.485416Z_0bca3d2a-2451-4275-a049-a65c58c1aff1",
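To tally the assertions rather than eyeball them, the same `ceph crash info` output could be piped through a small helper like the one below. This is only a sketch: `count_asserts` is a made-up function name, not a ceph command, and it assumes each `assert_condition` sits on its own line in the format shown above.

```shell
# Tally crashes by assert_condition.
# Reads `ceph crash info` output on stdin; prints "<count> <condition>" lines,
# highest count first. count_asserts is a hypothetical helper name.
count_asserts() {
    # Split on double quotes: field 2 is the key, field 4 is the value.
    awk -F'"' '/"assert_condition"/ { n[$4]++ }
               END { for (k in n) print n[k], k }' | sort -rn
}

# Possible usage (needs a working ceph CLI):
# for crash in $(ceph crash ls | grep osd | awk '{print $1}'); do
#     ceph crash info "$crash"
# done | count_asserts
```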

As noted in https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/

> You can apparently work around the issue by setting 
> 'bluestore_volume_selection_policy' config parameter to rocksdb_original.

However, after setting that parameter with `ceph config set osd.$osd bluestore_volume_selection_policy rocksdb_original`, it doesn’t appear to take effect:

> $ ceph config show-with-defaults osd.0  | grep bluestore_volume_selection_policy
> bluestore_volume_selection_policy                           use_some_extra

> $ ceph config set osd.0 bluestore_volume_selection_policy rocksdb_original
> $ ceph config show osd.0  | grep bluestore_volume_selection_policy
> bluestore_volume_selection_policy   use_some_extra                    default                 mon

I assume this should reflect the new setting, but it still shows the default "use_some_extra" value.

But then this seems to imply that the config is set?
> $ ceph config dump | grep bluestore_volume_selection_policy
>     osd.0                dev       bluestore_volume_selection_policy       rocksdb_original                                              *
> [snip]
>     osd.9                dev       bluestore_volume_selection_policy       rocksdb_original                                              *

Does this need to be set in ceph.conf or is there another setting that also needs to be set?
Even after bouncing the OSD daemon, `ceph config show` still reports "use_some_extra".
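For anyone wanting to reproduce the comparison, here is a rough sketch that puts the value stored in the mon config database next to the value the running OSD reports. `compare_policy` is a made-up helper name, and the parsing assumes `ceph tell osd.N config get <option>` returns JSON with the option name as the key, so treat it as illustrative rather than definitive.

```shell
# Compare the stored (mon config db) value with the running daemon's value.
# compare_policy is a hypothetical helper; the argument is the numeric OSD id.
compare_policy() {
    osd="$1"
    opt="bluestore_volume_selection_policy"
    # Value recorded centrally by `ceph config set`.
    stored=$(ceph config get "osd.$osd" "$opt")
    # Value the daemon itself reports; assumed to be JSON like {"<opt>": "<value>"}.
    running=$(ceph tell "osd.$osd" config get "$opt" |
        awk -F'"' -v k="$opt" '$2 == k { print $4 }')
    if [ "$stored" = "$running" ]; then
        status="ok"
    else
        status="MISMATCH"
    fi
    echo "osd.$osd stored=$stored running=$running $status"
}

# Possible usage (needs a working ceph CLI):
# compare_policy 0
```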

I’d appreciate any help anyone can offer to bridge the gap between now and the next point release.

Thanks,
Reed