Are those problematic OSDs getting almost full? I do not have an Ubuntu account to check their pastebin.

Sent from a Galaxy device

-------- Original message --------
From: mhnx <morphinwithyou@xxxxxxxxx>
Date: 08.11.21 15:31 (GMT+02:00)
To: Ceph Users <ceph-users@xxxxxxx>
Subject: allocate_bluefs_freespace failed to allocate

Hello.

I'm using Nautilus 14.2.16.

I have 30 SSDs in my cluster and I use them as BlueStore OSDs for the RGW index. Almost every week I lose (down) an OSD, and when I check the OSD log I see:

    -6> 2021-11-06 19:01:10.854 7fa799989c40  1 bluefs _allocate failed to allocate 0xf4f04 on bdev 1, free 0xb0000; fallback to bdev 2
    -5> 2021-11-06 19:01:10.854 7fa799989c40  1 bluefs _allocate unable to allocate 0xf4f04 on bdev 2, free 0xffffffffffffffff; fallback to slow device expander
    -4> 2021-11-06 19:01:10.854 7fa799989c40 -1 bluestore(/var/lib/ceph/osd/ceph-218) allocate_bluefs_freespace failed to allocate on 0x80000000 min_size 0x100000 > allocated total 0x0 bluefs_shared_alloc_size 0x10000 allocated 0x0 available 0xa497aab000
    -3> 2021-11-06 19:01:10.854 7fa799989c40 -1 bluefs _allocate failed to expand slow device to fit +0xf4f04

Full log: https://paste.ubuntu.com/p/MpJfVjMh7V/plain/

The OSD does not start without an offline compaction.
Offline compaction log: https://paste.ubuntu.com/p/vFZcYnxQWh/plain/

After the offline compaction I tried to start the OSD with the bitmap allocator, but it did not come up because of "FAILED ceph_assert(available >= allocated)".
Log: https://paste.ubuntu.com/p/2Bbx983494/plain/

Then I started the OSD with the hybrid allocator and let it recover. When the recovery was done I stopped the OSD and started it with the bitmap allocator. This time it came up, but I got "80 slow ops, oldest one blocked for 116 sec, osd.218 has slow ops", so I increased osd_recovery_sleep to 10 to give the cluster a breather, and the cluster marked the OSD down (it was still working); after a while the OSD was marked up again and the cluster became normal. But while it was recovering, other OSDs started to report slow ops, and I played with osd_recovery_sleep between 0.1 and 10 to keep the cluster stable until the recovery finished.

Ceph osd df tree before: https://paste.ubuntu.com/p/4K7JXcZ8FJ/plain/
Ceph osd df tree after osd.218 = bitmap: https://paste.ubuntu.com/p/5SKbhrbgVM/plain/

If I want to switch all the other OSDs to the bitmap allocator, I have to repeat this process 29 times, and that will take too much time. I don't want to keep healing OSDs with offline compaction; I will do it if that is the solution, but I want to be sure before doing a lot of work, and maybe with this issue I can provide helpful logs and information for the developers.

Have a nice day.
Thanks.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
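
For reference, a minimal sketch of the per-OSD sequence described in the message above (offline compaction, then switching one OSD to the bitmap allocator, then throttling recovery). It assumes systemd-managed OSDs, the default /var/lib/ceph/osd/ceph-<id> data path, and that bluestore_allocator is the option being toggled; these exact commands are not from the original thread.

    # Stop the OSD before touching its store (osd.218 used as the example ID).
    systemctl stop ceph-osd@218

    # Offline compaction of the OSD's RocksDB, which the poster needed before the OSD would start.
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-218 compact

    # Switch this OSD's allocator to bitmap; takes effect on the next start.
    # (bluefs_allocator is a separate option and may also be relevant to BlueFS allocation failures.)
    ceph config set osd.218 bluestore_allocator bitmap

    # Start the OSD again and throttle recovery if slow ops appear, relaxing it once they clear.
    systemctl start ceph-osd@218
    ceph tell osd.218 injectargs '--osd_recovery_sleep 0.1'

Adjusting osd_recovery_sleep up and then back down by hand, as the poster describes, is the manual equivalent of the last step.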