Re: Space leak in Bluestore

Hi Steve,

Thanks, it's an interesting discussion. However, I don't think it's the same problem, because in my case bluestore eats additional space during rebalance, and Ceph doesn't seem to do small overwrites during rebalance. As I understand it, it does the opposite: it reads and writes the whole object... Also, I have had bluestore_min_alloc_size set to 4K from the beginning, and Igor says that this works around that bug... bug-o-feature. :D

Hi Vitaliy,

You may be running into the EC space amplification issue:
https://tracker.ceph.com/issues/44213

I am not aware of any recent updates to resolve this issue.

Sincerely,

On Tue, Mar 24, 2020 at 12:53 PM <vitalif@xxxxxxxxxx> wrote:

Hi.

I'm experiencing some kind of a space leak in Bluestore. I use EC,
compression and snapshots. At first I thought that the leak was caused
by "virtual clones" (issue #38184). However, I then got rid of most of
the snapshots, but continued to experience the problem.

I suspected something when I added a new disk to the cluster and the
free space in the cluster didn't increase (!).

So to track down the issue I moved one PG (34.1a) using upmaps from
osd11,6,0 to osd6,0,7 and then back to osd11,6,0.

It ate +59 GB after the first move and +51 GB after the second. As I
understand it, this proves that it's not #38184: devirtualization of
virtual clones couldn't eat additional space after a SECOND rebalance
of the same PG.

The PG has ~39000 objects, it is EC 2+1 and compression is enabled.
The compression ratio is about 2.7 in my setup, so the PG should use
~90 GB of raw space.
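
For readers who want to check the ~90 GB figure, here is a rough sketch of the arithmetic. The object size is NOT stated in the post; 4 MiB per object is an assumption made purely for illustration, and the function name is hypothetical. The object count, the EC 2+1 profile and the compression ratio come from the message above.

```python
# Back-of-the-envelope estimate of the raw space one PG should occupy.
# ASSUMPTION: every object is 4 MiB; the post does not state the object size.
OBJECT_SIZE_BYTES = 4 * 1024**2


def expected_raw_bytes(num_objects, object_size, k, m, compression_ratio):
    """Logical data, scaled by the EC overhead (k+m)/k, divided by the compression ratio."""
    logical = num_objects * object_size
    ec_overhead = (k + m) / k
    return logical * ec_overhead / compression_ratio


# ~39000 objects, EC 2+1, compression ratio ~2.7 (figures from the post).
estimate = expected_raw_bytes(39_000, OBJECT_SIZE_BYTES, k=2, m=1, compression_ratio=2.7)
print(f"expected raw usage: {estimate / 1e9:.1f} GB")  # expected raw usage: 90.9 GB
```

Under that assumption the estimate agrees with the ~90 GB quoted above. The two moves reportedly added 59 + 51 = 110 GB, which is more than the whole estimated footprint of the PG.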

Before and after moving the PG I stopped osd0, mounted it with
ceph-objectstore-tool with debug bluestore = 20/20 and opened the
34.1a***/all directory. It seems to dump all object extents into the
log in that case. So now I have two logs with all allocated extents
for osd0 (I hope all extents are there). I parsed both logs and added
all compressed blob sizes together ("get_ref Blob ... 0x20000 -> 0x...
compressed"). They add up to ~39 GB before the first rebalance
(34.1as2), ~22 GB after it (34.1as1) and ~41 GB again after the second
move (34.1as2), which doesn't indicate a leak.
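
The summation described above could be sketched roughly as follows. This is not the script actually used. The line format is inferred only from the fragment quoted in the message ("get_ref Blob ... 0x20000 -> 0x... compressed"), so the regular expression and the function name are assumptions, and real debug output may need a different pattern.

```python
import re

# ASSUMPTION: a compressed blob is logged as "<logical hex> -> <compressed hex>",
# as in the quoted fragment "get_ref Blob ... 0x20000 -> 0x... compressed".
SIZE_PAIR = re.compile(r"0x([0-9a-fA-F]+)\s*->\s*0x([0-9a-fA-F]+)")


def sum_compressed_blobs(lines):
    """Return (logical_bytes, compressed_bytes) summed over matching log lines."""
    logical_total = 0
    compressed_total = 0
    for line in lines:
        # Only consider lines that look like the quoted fragment.
        if "get_ref" not in line or "compressed" not in line:
            continue
        match = SIZE_PAIR.search(line)
        if match is None:
            continue
        logical_total += int(match.group(1), 16)
        compressed_total += int(match.group(2), 16)
    return logical_total, compressed_total
```

The function accepts any iterable of lines, so an open log file can be passed directly. A blob that is logged more than once would be counted each time, and uncompressed blobs are ignored entirely, so the totals are only a rough cross-check against the raw usage reported by the cluster.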

But the raw space usage still exceeds the initial figure by a lot, so
it's clear that there's a leak somewhere.

What additional details can I provide for you to identify the bug?

I posted the same message in the issue tracker,
https://tracker.ceph.com/issues/44731

--
Vitaliy Filippov
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

--

Steven Pine

webair.com
P  516.938.4100 x
E  steven.pine@xxxxxxxxxx


