allocate_bluefs_freespace failed to allocate

mhnx <morphinwithyou@xxxxxxxxx> · Mon, 8 Nov 2021 16:31:01 +0300

Hello.

I'm using Nautilus 14.2.16.
I have 30 SSDs in my cluster and I use them as BlueStore OSDs for the RGW index.
Almost every week an OSD goes down, and when I check the OSD log I see:

    -6> 2021-11-06 19:01:10.854 7fa799989c40  1 bluefs _allocate failed to allocate 0xf4f04 on bdev 1, free 0xb0000; fallback to bdev 2
    -5> 2021-11-06 19:01:10.854 7fa799989c40  1 bluefs _allocate unable to allocate 0xf4f04 on bdev 2, free 0xffffffffffffffff; fallback to slow device expander
    -4> 2021-11-06 19:01:10.854 7fa799989c40 -1 bluestore(/var/lib/ceph/osd/ceph-218) allocate_bluefs_freespace failed to allocate on 0x80000000 min_size 0x100000 > allocated total 0x0 bluefs_shared_alloc_size 0x10000 allocated 0x0 available 0xa497aab000
    -3> 2021-11-06 19:01:10.854 7fa799989c40 -1 bluefs _allocate failed to expand slow device to fit +0xf4f04
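
To make the hex figures in the log excerpt easier to compare, here is a small sketch that converts them to byte counts. The values are copied from the log lines; the labels are only shorthand taken from the words next to each number and are not official Ceph terminology.

```python
# Hex sizes copied from the log excerpt; the labels are informal shorthand.
LOG_VALUES = {
    "bluefs request (0xf4f04)": 0xF4F04,
    "free on bdev 1": 0xB0000,
    "failed to allocate on": 0x80000000,
    "min_size": 0x100000,
    "bluefs_shared_alloc_size": 0x10000,
    "available": 0xA497AAB000,
}


def human(n: int) -> str:
    """Format a byte count using binary units (KiB, MiB, ...)."""
    value = float(n)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit that keeps the number below 1024,
        # or at TiB, the largest unit handled here.
        if value < 1024 or unit == "TiB":
            return f"{value:.1f} {unit}"
        value /= 1024


for label, size in LOG_VALUES.items():
    print(f"{label:28s} {size:>14d} bytes  ({human(size)})")
```

Read this way, BlueFS asked for just under 1 MiB while bdev 1 had 704 KiB free, and the BlueStore line reports roughly 658 GiB as available with a min_size of 1 MiB. That combination may hint at fragmented free space rather than a full device, but the log excerpt alone does not prove it.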

Full log: https://paste.ubuntu.com/p/MpJfVjMh7V/plain/

The OSD does not start again without an offline compaction.
Offline compaction log: https://paste.ubuntu.com/p/vFZcYnxQWh/plain/
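
The post does not say which tool was used for the offline compaction. As an assumption, the sketch below builds the usual `ceph-kvstore-tool bluestore-kv <path> compact` invocation for a stopped OSD. The OSD path comes from the log line; the helper name `compaction_command` is made up for illustration. The command is only printed, never executed.

```python
import shlex
from typing import List


def compaction_command(osd_id: int, data_root: str = "/var/lib/ceph/osd") -> List[str]:
    """Build (but do not run) an offline RocksDB compaction command for one OSD."""
    osd_path = f"{data_root}/ceph-{osd_id}"
    # ceph-kvstore-tool opens the BlueStore key-value store directly,
    # so the OSD daemon has to be stopped before this is run.
    return ["ceph-kvstore-tool", "bluestore-kv", osd_path, "compact"]


# osd.218 is the OSD named in the log excerpt.
print(shlex.join(compaction_command(218)))
```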

After the offline compaction I tried to start the OSD with the bitmap
allocator, but it does not come up because of "FAILED ceph_assert(available >=
allocated)".
Log: https://paste.ubuntu.com/p/2Bbx983494/plain/

I then started the OSD with the hybrid allocator and let it recover.
When the recovery was done, I stopped the OSD and started it with the bitmap
allocator.
This time it came up, but I got "80 slow ops, oldest one blocked for 116
sec, osd.218 has slow ops". I raised osd_recovery_sleep to 10 to give the
cluster some breathing room, and the cluster marked the OSD down even though
it was still running. After a while the OSD was marked up again and the
cluster returned to normal. While it was recovering, however, other OSDs
started to report slow ops, so I kept adjusting osd_recovery_sleep between
0.1 and 10 to keep the cluster stable until recovery finished.
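
The post does not show how osd_recovery_sleep was changed. One possible way, assuming the central config database is in use, is `ceph config set osd osd_recovery_sleep <seconds>`. The sketch below only builds and prints that command for the two ends of the range mentioned; the helper name `recovery_sleep_command` is hypothetical.

```python
import shlex
from typing import List


def recovery_sleep_command(seconds: float) -> List[str]:
    """Build (but do not run) a command that sets osd_recovery_sleep for all OSDs."""
    if seconds < 0:
        raise ValueError("osd_recovery_sleep cannot be negative")
    # "ceph config set osd ..." stores the value in the cluster's central
    # config database under the "osd" section, so it applies to every OSD.
    return ["ceph", "config", "set", "osd", "osd_recovery_sleep", f"{seconds:g}"]


# The two ends of the range mentioned in the post.
for value in (10, 0.1):
    print(shlex.join(recovery_sleep_command(value)))
```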

Ceph osd df tree before: https://paste.ubuntu.com/p/4K7JXcZ8FJ/plain/
Ceph osd df tree after osd.218 = bitmap:
https://paste.ubuntu.com/p/5SKbhrbgVM/plain/

If I want to change the allocator on all the other OSDs to bitmap, I need to
repeat this process 29 times, which will take a lot of time.
I don't want to keep reviving OSDs with offline compaction, so I will do it
if that is the solution, but I want to be sure before doing that much work.
In the meantime, while the issue still occurs, I can provide logs and other
information that may help the developers.
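
If switching the allocator does turn out to be the recommended fix (the post leaves that question open), repeating it for the remaining OSDs could be scripted as a one-at-a-time plan. This is a rough sketch under several assumptions: OSDs run as `ceph-osd@<id>` systemd units, the allocator is selected with the `bluestore_allocator` option, and the OSD IDs used below are placeholders. The helper `allocator_switch_plan` is hypothetical and only prints commands; waiting for recovery between OSDs is left to the operator.

```python
import shlex
from typing import Iterable, List


def allocator_switch_plan(osd_ids: Iterable[int], allocator: str = "bitmap") -> List[List[str]]:
    """Return the commands for a rolling allocator change, one OSD at a time."""
    # Avoid rebalancing while individual OSDs are briefly restarted.
    plan = [["ceph", "osd", "set", "noout"]]
    for osd_id in osd_ids:
        # Per-OSD override in the central config database.
        plan.append(["ceph", "config", "set", f"osd.{osd_id}", "bluestore_allocator", allocator])
        # The allocator is chosen at startup, so the daemon must be restarted.
        plan.append(["systemctl", "restart", f"ceph-osd@{osd_id}"])
        # Check cluster health before moving on to the next OSD.
        plan.append(["ceph", "health"])
    plan.append(["ceph", "osd", "unset", "noout"])
    return plan


# Placeholder IDs; the post only names osd.218.
for command in allocator_switch_plan([200, 201, 202]):
    print(shlex.join(command))
```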

Have a nice day.
Thanks.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx