Hello Igor. Thanks for the answer. There are a lot of changes for me to
read and test, but I will plan an upgrade to Octopus when I'm available.
Is there any problem upgrading from 14.2.16 -> 15.2.15?

On Wed, 10 Nov 2021 at 17:50, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

> I would encourage you to upgrade to at least the latest Nautilus (and
> preferably to Octopus).
>
> There were a bunch of allocator bugs fixed since 14.2.16. I'm not even
> sure all of them landed in N since it's EOL.
>
> A couple of examples (both are present in the latest Nautilus):
>
> https://github.com/ceph/ceph/pull/41673
>
> https://github.com/ceph/ceph/pull/38475
>
>
> Thanks,
>
> Igor
>
>
> On 11/8/2021 4:31 PM, mhnx wrote:
> > Hello.
> >
> > I'm using Nautilus 14.2.16.
> > I have 30 SSDs in my cluster and I use them as BlueStore OSDs for the
> > RGW index. Almost every week I'm losing an OSD (it goes down), and
> > when I check the OSD log I see:
> >
> >    -6> 2021-11-06 19:01:10.854 7fa799989c40  1 bluefs _allocate
> > failed to allocate 0xf4f04 on bdev 1, free 0xb0000; fallback to bdev 2
> >    -5> 2021-11-06 19:01:10.854 7fa799989c40  1 bluefs _allocate
> > unable to allocate 0xf4f04 on bdev 2, free 0xffffffffffffffff;
> > fallback to slow device expander
> >    -4> 2021-11-06 19:01:10.854 7fa799989c40 -1
> > bluestore(/var/lib/ceph/osd/ceph-218) allocate_bluefs_freespace
> > failed to allocate on 0x80000000 min_size 0x100000 > allocated total
> > 0x0 bluefs_shared_alloc_size 0x10000 allocated 0x0 available
> > 0xa497aab000
> >    -3> 2021-11-06 19:01:10.854 7fa799989c40 -1 bluefs _allocate
> > failed to expand slow device to fit +0xf4f04
> >
> > Full log: https://paste.ubuntu.com/p/MpJfVjMh7V/plain/
> >
> > And the OSD does not start without offline compaction.
> > Offline compaction log: https://paste.ubuntu.com/p/vFZcYnxQWh/plain/
> >
> > After the offline compaction I tried to start the OSD with the bitmap
> > allocator, but it does not come up because of "FAILED
> > ceph_assert(available >= allocated)".
> > Log: https://paste.ubuntu.com/p/2Bbx983494/plain/
> >
> > Then I started the OSD with the hybrid allocator and let it recover.
> > When the recovery was done I stopped the OSD and started it with the
> > bitmap allocator. This time it came up, but I got "80 slow ops,
> > oldest one blocked for 116 sec, osd.218 has slow ops", so I increased
> > osd_recovery_sleep to 10 to give the cluster a breather, and the
> > cluster marked the OSD down (it was still working); after a while the
> > OSD was marked up again and the cluster became normal. But while it
> > was recovering, other OSDs started to report slow ops, and I played
> > around with osd_recovery_sleep between 0.1 and 10 to keep the cluster
> > stable until recovery finished.
> >
> > Ceph osd df tree before: https://paste.ubuntu.com/p/4K7JXcZ8FJ/plain/
> > Ceph osd df tree after osd.218 = bitmap:
> > https://paste.ubuntu.com/p/5SKbhrbgVM/plain/
> >
> > If I want to change all the other OSDs' allocator to bitmap, I need
> > to repeat the process 29 times and it will take too much time.
> > I don't want to heal OSDs with offline compaction anymore, so I will
> > do that if it's the solution, but I want to be sure before doing a
> > lot of work, and maybe with this issue I can provide helpful logs and
> > information for the developers.
> >
> > Have a nice day.
> > Thanks.
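
For anyone else hitting the same bluefs _allocate failures: the OSD admin
socket can show how much space bluefs thinks it has on each device before
things get to the point of crashing. A minimal sketch, assuming osd.218
from the log above and that it is run on the host carrying that OSD:

    # bluefs perf counters: bytes used/free on the db and slow devices
    ceph daemon osd.218 perf dump bluefs

    # on Nautilus there should also be a dedicated admin socket command
    # reporting free space as bluefs sees it (name may vary per release)
    ceph daemon osd.218 bluestore bluefs available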
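
For reference, the offline compaction and the switch to the bitmap
allocator described above can be done roughly like this. This is only a
sketch, assuming osd.218 with data path /var/lib/ceph/osd/ceph-218 (from
the log), systemd-managed OSDs and the centralized config database;
adjust ids and paths per node:

    # stop the OSD and compact its RocksDB offline
    systemctl stop ceph-osd@218
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-218 compact

    # switch the allocator to bitmap; the options are read at OSD start.
    # On 14.2.x bluefs has its own allocator option as well.
    ceph config set osd.218 bluestore_allocator bitmap
    ceph config set osd.218 bluefs_allocator bitmap

    systemctl start ceph-osd@218

Whether an OSD comes up straight on bitmap after compaction clearly
varies (see the ceph_assert above), so the hybrid-recover-then-switch
sequence may still be needed on some OSDs.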
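
On the slow-ops side, osd_recovery_sleep can be changed at runtime, which
makes the back-and-forth tuning between 0.1 and 10 a bit less painful.
A sketch with example values only:

    # slow recovery down cluster-wide while the OSD catches up...
    ceph tell osd.* injectargs '--osd_recovery_sleep 10'

    # ...and speed it up again once the slow ops clear
    ceph tell osd.* injectargs '--osd_recovery_sleep 0.1'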