Re: Ceph with Cache pool - disk usage / cleanup

On 30.09.2016 05:18, Christian Balzer wrote:
> On Thu, 29 Sep 2016 20:15:12 +0200 Sascha Vogt wrote:
>> On 29/09/16 15:08, Burkhard Linke wrote:
>>> AFAIK evicting an object also flushes it to the backing storage, so
>>> evicting a live object should be ok. It will be promoted again at the
>>> next access (or whatever triggers promotion in the caching mechanism).
>>>>
>>>> For the dead 0-byte files: Should I open a bug report?
>>> Not sure whether this is a bug at all. The objects should be evicted and
>>> removed if the cache pool hits the max object thresholds.
>> d'oh, Ceph and its hidden gems ;) That was it.
> 
> That's what I was alluding to when I wrote "maybe with some delay".
Unfortunately it wasn't a time-based delay, so I didn't immediately make
the connection ;)
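
(Side note, mostly for the archive: flushing and evicting can also be
triggered by hand via the rados tool, roughly like this, where "ssd" is
our cache pool and <object-name> is just a placeholder:

  # flush a single dirty object down to the backing pool
  rados -p ssd cache-flush <object-name>
  # evict it from the cache pool once it is clean
  rados -p ssd cache-evict <object-name>
  # or, heavy-handed: flush and evict everything in the cache pool
  rados -p ssd cache-flush-evict-all
)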

>> Yes, we currently have
>> no hard object limit (target_max_objects), as we have target_max_bytes
>> set and thought that would be enough. After setting target_max_objects
>> (even to a ridiculously high number, I used 200 million, so double the
>> amount we have), Ceph immediately started dropping objects (and
>> blocking all client IO :( )
>>
> Please refer to this page as a reminder:
> http://docs.ceph.com/docs/master/rados/operations/cache-tiering/
> 
> So, firstly, what are your ratios (dirty, full) set to?
> If it's at the defaults of 0.4 and 0.6 and you REALLY have only 100
> million objects, it should have started to flush stuff (which is likely a
> NOOP with these leftovers) and not evict stuff.
> What does "ceph df detail" tell you?

The current cache parameters we use are:
target_max_bytes: 4913442586624 (4576 GB; x 2 replicas = 9152 GB raw)
target_max_objects: 0
cache_target_full_ratio: 0.9
cache_target_dirty_ratio: 0.8
cache_min_flush_age: 10800 (3 hours)
cache_min_evict_age: 86400 (24 hours)

As I mentioned earlier, we have lots of very short-lived VMs. Those are
automated tests and are usually destroyed within 2-3 hours (hence the
min flush age). A few VMs are longer-lived and used for manual tests,
so we set the target dirty ratio quite high. 0.8 might be a bit much;
initially we thought 0.6 would be a good value, so we're still playing
with that setting.
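
For reference, we set these on the cache pool ("ssd", see the df output
below) roughly like this:

  ceph osd pool set ssd target_max_bytes 4913442586624
  ceph osd pool set ssd target_max_objects 0
  ceph osd pool set ssd cache_target_full_ratio 0.9
  ceph osd pool set ssd cache_target_dirty_ratio 0.8
  ceph osd pool set ssd cache_min_flush_age 10800
  ceph osd pool set ssd cache_min_evict_age 86400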

> Are you sure the blocking of client I/O is due to the object removal and
> your OSDs being too busy and not actually because Ceph thinks that the
> cache is full (object wise)?
> As in:
> "Note All client requests will be blocked only when target_max_bytes or
> target_max_objects reached"

ceph df detail output from this morning:
> Every 2.0s: ceph df detail                                                                    Fri Sep 30 08:38:34 2016
> 
> GLOBAL:
>     SIZE       AVAIL      RAW USED     %RAW USED     OBJECTS
>     85818G     48966G       36851G         42.94        103M
> POOLS:
>     NAME               ID     CATEGORY     USED      %USED     MAX AVAIL     OBJECTS       DIRTY      READ      WRITE
>     cinder-volumes     1      -            3216G      3.75        13376G        828187       808k      407M      1297M
>     ephemeral-vms      2      -            4401G      5.13        20064G        910880       889k     7981M     20914M
>     glance-images      3      -            3573G      4.16        13376G        491346       479k      917M      3913k
>     swift              4      -                0         0        13376G             0          0         0          0
>     ssd                11     -            3401G      3.96         1425G     106329471     24543k     8369M     14248M

So you can see we have around 900k objects in the ephemeral-vms pool and
106 million in the ssd pool (the cache in front of ephemeral-vms).
Replication size is 2, no erasure coding.
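
(The replication factor is simply the pool's size setting and can be
double-checked with e.g. "ceph osd pool get ssd size" and
"ceph osd pool get ephemeral-vms size".)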

Current disk usage of the OSDs:
> --- Fri Sep 30 08:38:15 CEST 2016 ---------------------------------------
> /dev/nvme1n1p1  715G  510G  205G  72% /var/lib/ceph/osd/ceph-20          
> /dev/nvme1n1p3  715G  465G  251G  65% /var/lib/ceph/osd/ceph-21          
> /dev/nvme2n1p1  715G  431G  285G  61% /var/lib/ceph/osd/ceph-22          
> /dev/nvme2n1p3  715G  501G  214G  71% /var/lib/ceph/osd/ceph-23          
> /dev/nvme1n1p1  715G  463G  253G  65% /var/lib/ceph/osd/ceph-24          
> /dev/nvme1n1p3  715G  455G  261G  64% /var/lib/ceph/osd/ceph-25          
> /dev/nvme2n1p1  715G  478G  238G  67% /var/lib/ceph/osd/ceph-26          
> /dev/nvme2n1p3  715G  518G  198G  73% /var/lib/ceph/osd/ceph-27          
> /dev/nvme1n1p1  715G  484G  231G  68% /var/lib/ceph/osd/ceph-28          
> /dev/nvme1n1p3  715G  472G  244G  66% /var/lib/ceph/osd/ceph-29          
> /dev/nvme2n1p1  715G  406G  310G  57% /var/lib/ceph/osd/ceph-30          
> /dev/nvme2n1p3  715G  537G  179G  76% /var/lib/ceph/osd/ceph-31          
> /dev/nvme1n1p1  715G  479G  237G  67% /var/lib/ceph/osd/ceph-32          
> /dev/nvme1n1p3  715G  482G  234G  68% /var/lib/ceph/osd/ceph-33          
> /dev/nvme2n1p1  715G  519G  197G  73% /var/lib/ceph/osd/ceph-34          
> /dev/nvme2n1p3  715G  471G  244G  66% /var/lib/ceph/osd/ceph-35          
> Sum:          11440G 7671G 3781G                              

When setting target_max_objects to 150000000 (150 million), the number
of objects in ceph df detail starts dropping immediately (yesterday it
dropped 5 million objects in around 10-15 minutes), even though 150
million is way above our current object count, so the
cache_target_full_ratio should not have been reached. After about 30
seconds the cluster status went to WARN because client requests were
blocked > 30s, and the number of blocked requests kept increasing until
I set target_max_objects back to 0, at which point the object count
stopped dropping.
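
For the record, that was nothing more than roughly:

  ceph osd pool set ssd target_max_objects 150000000
  # and, once requests started piling up, back to "unlimited":
  ceph osd pool set ssd target_max_objects 0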

>> Is this behavior documented somewhere? 
> 
> Not that I'm aware of. 
> OTOH, I'd expect even those 0-byte files/objects to eventually be
> subject to removal once the space/size limits are reached and they are
> eligible (old enough).
> If that is NOT the case, then this is both a bug and at the very least
> it needs to be put into the documentation.
See above

> [...]
> This isn't helped by cache-tiering basing things on PGs, not pools or
> OSDs or anything else that would help with making sizing guesses.
> See my old "Cache tier operation clarifications" thread here.
I will look that one up now :)

Thank you all (especially Christian and Burkhard) for all the help and
clarifications! Very much appreciated.

Greetings
-Sascha-

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


