Hi Igor,

I think so because:

1) Space usage increases after each rebalance, even when the same PG is
moved a second time (!).

2) I have used a 4K min_alloc_size from the beginning.

One crazy hypothesis is that maybe Ceph allocates space for the
uncompressed objects first, then compresses them and leaks the
(uncompressed - compressed) difference. A really crazy idea, but who
knows o_O.

I already did a deep fsck and it didn't help. What else could I check?
(Sketches of the counter comparison you suggest, and of the parsing I
did, are in the P.S. below.)

On March 26, 2020 1:40:52 GMT+03:00, Igor Fedotov <ifedotov@xxxxxxx> wrote:
>Bluestore fsck/repair detect and fix leaks at the Bluestore level, but
>I doubt your issue is there.
>
>To be honest, I don't understand from the overview why you think there
>are any leaks at all...
>
>Not sure whether this is relevant, but in my experience space "leaks"
>are sometimes caused by a 64K allocation unit combined with tons of
>small files or massive small EC overwrites.
>
>To verify whether this applies, you might want to inspect the
>Bluestore performance counters (bluestore_stored vs.
>bluestore_allocated) to estimate your losses due to the high
>allocation unit.
>
>A significant difference on multiple OSDs would indicate that the
>overhead is caused by high allocation granularity. Compression might
>make this analysis less straightforward, though...
>
>
>Thanks,
>
>Igor
>
>
>On 3/26/2020 1:19 AM, vitalif@xxxxxxxxxx wrote:
>> I have a question regarding this problem - is it possible to rebuild
>> the Bluestore allocation metadata? I could try that to test whether
>> it's an allocator problem...
>>
>>> Hi.
>>>
>>> I'm experiencing some kind of space leak in Bluestore. I use EC,
>>> compression and snapshots. At first I thought the leak was caused
>>> by "virtual clones" (issue #38184), but then I got rid of most of
>>> the snapshots and continued to experience the problem.
>>>
>>> I suspected something when I added a new disk to the cluster and
>>> the free space in the cluster didn't increase (!).
>>>
>>> So to track down the issue I moved one PG (34.1a) using upmaps from
>>> osd11,6,0 to osd6,0,7 and then back to osd11,6,0.
>>>
>>> It ate +59 GB after the first move and +51 GB after the second. As
>>> I understand it, this proves that it's not #38184: devirtualization
>>> of virtual clones couldn't eat additional space after the SECOND
>>> rebalance of the same PG.
>>>
>>> The PG has ~39000 objects, it is EC 2+1 and compression is enabled.
>>> The compression ratio is about ~2.7 in my setup, so the PG should
>>> use ~90 GB of raw space.
>>>
>>> Before and after moving the PG I stopped osd0, mounted it with
>>> ceph-objectstore-tool with debug bluestore = 20/20 and opened the
>>> 34.1a***/all directory, which seems to dump all object extents into
>>> the log. So now I have two logs with all allocated extents for osd0
>>> (I hope all extents are there). I parsed both logs and added all
>>> compressed blob sizes together ("get_ref Blob ... 0x20000 -> 0x...
>>> compressed"). They add up to ~39 GB before the first rebalance
>>> (34.1as2), ~22 GB after it (34.1as1) and ~41 GB again after the
>>> second move (34.1as2), which doesn't indicate a leak.
>>>
>>> But the raw space usage still exceeds the initial value by a lot,
>>> so it's clear that there's a leak somewhere.
>>>
>>> What additional details can I provide for you to identify the bug?
>>>
>>> I posted the same message in the issue tracker:
>>> https://tracker.ceph.com/issues/44731

--
With best regards,
Vitaliy Filippov
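P.S. For the bluestore_stored vs. bluestore_allocated comparison you
suggest, I'd use something like this. A rough sketch only: it assumes
the OSD admin socket is reachable via "ceph daemon" on the local host
and that the counters carry these Nautilus-era names:

    #!/usr/bin/env python3
    # Compare BlueStore's stored vs. allocated bytes for one local OSD.
    import json
    import subprocess
    import sys

    osd_id = sys.argv[1]  # e.g. "0"
    out = subprocess.check_output(
        ["ceph", "daemon", "osd." + osd_id, "perf", "dump"])
    bs = json.loads(out)["bluestore"]

    stored = bs["bluestore_stored"]        # bytes of user data
    allocated = bs["bluestore_allocated"]  # bytes allocated on disk

    # With a large allocation unit, allocated greatly exceeds stored;
    # with compression the difference can even go negative.
    diff = allocated - stored
    print("osd.%s: stored=%d allocated=%d diff=%d (%.1f%%)"
          % (osd_id, stored, allocated, diff,
             100.0 * diff / max(stored, 1)))

Running it against every OSD and comparing the ratios should show
whether the losses correlate with allocation granularity.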
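P.P.S. In case anyone wants to reproduce my numbers: this is roughly
the parsing I did over the objectstore-tool logs. The regex is only my
guess at the shape of the "get_ref Blob ... 0x20000 -> 0x... compressed"
lines, so it may need adjusting for other Ceph versions:

    #!/usr/bin/env python3
    # Sum the compressed blob sizes dumped into an OSD log by
    # ceph-objectstore-tool run with debug bluestore = 20/20.
    import re
    import sys

    # e.g. "... get_ref Blob ... 0x20000 -> 0x7f00 compressed ..."
    pattern = re.compile(
        r"get_ref Blob.*0x[0-9a-f]+ -> 0x([0-9a-f]+).*compressed")

    total = 0
    with open(sys.argv[1]) as log:
        for line in log:
            m = pattern.search(line)
            if m:
                total += int(m.group(1), 16)  # compressed (on-disk) size

    print("compressed blobs: %d bytes (~%.1f GiB)"
          % (total, total / 2 ** 30))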