Re: Ceph with Cache pool - disk usage / cleanup

Hi,

On 29.09.2016 at 02:44, Christian Balzer wrote:
> I don't think the LOG is keeping the 0-byte files alive, though.
Yeah, don't think so either. The difference did stay at around the same
level.

> In general these are objects that have been evicted from the cache and if
> it's very busy you will wind up with each object that's on the backing
> pool also being present (if just as 0-byte file) in your cache tier.
We have a huge number of short-lived VMs which are deleted before they
are even flushed to the backing pool. Might that be something Ceph
doesn't handle well? E.g. when deleting an object / RBD image that has
never been flushed, perhaps the "deletion mechanism" only removes what
is in the backing pool, and if there is nothing there, it skips deleting
the 0-byte marker files in the cache pool?

> Similar to objects that get created on the cache-tier (writes) and have
> not been flushed, they will have 0-byte file on the backing pool.
> 
> So that is going to eat up space in a fashion. 
> 
> In your particular case, I'd expect objects that are deleted to be gone,
> maybe with some delay.
> 
> Can you check/verify that the deleted objects are actually gone on the
> backing pool?
I did a rough estimate (no real object-by-object check): I searched for
0-byte files on the HDD OSDs and found only around 200-500 such objects
there.

On the other hand, the SSD OSDs I checked each contained between 13 and
14 million 0-byte files. If I count 13,000,000 files x 4 KiB per inode,
I get around 50 GB per OSD; across 16 OSDs that comes to roughly 800 GB,
which is pretty close to the amount of space we are missing.
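The estimate above can be reproduced along these lines. This is a
hedged sketch: the OSD data path would normally be something like
/var/lib/ceph/osd/ceph-N/current (FileStore layout), but here a scratch
directory with a few empty files stands in for it so the arithmetic is
easy to follow:

```shell
#!/bin/sh
# Scratch directory standing in for an OSD's FileStore "current" dir.
osd_dir=$(mktemp -d)
for i in 1 2 3; do : > "$osd_dir/obj$i"; done   # three empty marker files

# Count 0-byte files, as one would against the real OSD directory.
count=$(find "$osd_dir" -type f -size 0 | wc -l)
count=$((count))   # normalize any whitespace padding from wc
echo "zero-byte files: $count"

# Extrapolate: ~13,000,000 such files at ~4 KiB consumed per inode.
bytes=$((13000000 * 4096))
echo "per-OSD estimate: $((bytes / 1024 / 1024 / 1024)) GB"
echo "across 16 OSDs:   $((bytes * 16 / 1024 / 1024 / 1024)) GB"

rm -rf "$osd_dir"
```

Run against a real OSD, only the `find` path changes; the 4 KiB per
inode is an approximation that depends on the filesystem.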

Now the question, where do they come from and which of them are safe for
deletion?

>> Anyway, already thanks for the hint about the log file. We'll keep an
>> eye on that one and try to upgrade to Hammer soon!
>>
> Well you're already on Hammer. ^o^
> Just don't upgrade to 0.94.6, whatever you do (lethal cache tier bug).
Thanks for the hint :) I mixed it up with Jewel and meant that we plan
to upgrade to Jewel soon. But see next paragraph...

> If you don't have too many OSDs (see the various threads here), upgrading
> to 0.94.9 or a to be released .10 which addresses the encoding storms
> should be fine.
We have 20 HDD OSDs and 16 SSD OSDs (8 disks x 2 partitions on the
SSDs), but we have another 4 SSDs (so another 8 OSDs) ready to be
installed, plus a complete storage node with 5 HDD OSDs and 6 SSD OSDs,
bringing the total to 25 + 30 = 55 OSDs.

> At this point in time I think Jewel still has too many rough edges, but
> that's me.
> Take note (search the ML archives) that Jewel massively changes the cache
> tiering behavior (not promoting things as readily as Hammer), so make sure
> you don't get surprised there.
Yes, we saw that there are now two thresholds: one at which Ceph starts
flushing/evicting gently in the background, and a higher one at which it
flushes/evicts aggressively, as it does today.
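For reference, those thresholds are per-pool options. A hedged sketch
(the pool name "cache-pool" and the ratio values are placeholders, not
our actual configuration):

```shell
# Below the dirty ratio, Ceph flushes dirty objects at a low background
# rate; above the high ratio it flushes at full speed (new in Jewel).
ceph osd pool set cache-pool cache_target_dirty_ratio 0.4
ceph osd pool set cache-pool cache_target_dirty_high_ratio 0.6
# Eviction of clean objects still kicks in at the full ratio:
ceph osd pool set cache-pool cache_target_full_ratio 0.8
```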

I will search the archives regarding Jewel and cache tiering, and I
think it would be good to get rid of the stale 0-byte files before
upgrading.

Greetings
-Sascha-


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
