Re: Ceph with Cache pool - disk usage / cleanup

Christian Balzer <chibi@xxxxxxx> · Fri, 30 Sep 2016 16:45:42 +0900

I just love the sound of my own typing...

See inline, below.

On Fri, 30 Sep 2016 12:18:48 +0900 Christian Balzer wrote:

> 
> Hello,
> 
> On Thu, 29 Sep 2016 20:15:12 +0200 Sascha Vogt wrote:
> 
> > Hi Burkhard,
> > 
> > On 29/09/16 15:08, Burkhard Linke wrote:
> > > AFAIK evicting an object also flushes it to the backing storage, so
> > > evicting a live object should be ok. It will be promoted again at the
> > > next access (or whatever triggers promotion in the caching mechanism).
> > >>
> > >> For the dead 0-byte files: Should I open a bug report?
> > > Not sure whether this is a bug at all. The objects should be evicted and
> > > removed if the cache pool hits the max object thresholds.
> > d'oh, Ceph and its hidden gems ;) That was it.
> 
> That's what I was alluding to when I wrote "maybe with some delay".
> 
> > Yes, we currently have 
> > no hard object limit (target_max_objects) as we have target_max_bytes 
> > set and thought that would be enough. After setting target_max_objects 
> > (even to a ridiculously high number, I used 200 million, so double the 
> > amount we have), Ceph immediately started dropping objects (and 
> > blocking all client IO :( )
> > 
> Please refer to this page for the reminder:
> http://docs.ceph.com/docs/master/rados/operations/cache-tiering/
> 
> So, firstly, what are your ratios (dirty, full) set to?
> If it's at the defaults of 0.4 and 0.6 and you REALLY have only 100
> million objects, it should have started to flush stuff (which is likely a
> NOOP with these leftovers) and not evict stuff.
> What does "ceph df detail" tell you?
> 
> Are you sure the blocking of client I/O is due to the object removal and
> your OSDs being too busy and not actually because Ceph thinks that the
> cache is full (object wise)?
> As in:
> "Note All client requests will be blocked only when target_max_bytes or
> target_max_objects reached"
> 
> 
> > Is this behavior documented somewhere? 
> 
> Not that I'm aware of. 
> OTOH, I'd expect even those 0-byte files/objects to be eventually the
> subject of removal when the space/size limits are reached and they are
> eligible (old enough).
> If that is NOT the case, then this is both a bug and at the very least
> needs to be put into the documentation.
>
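The reasoning in the quoted part above can be sketched in a few lines of Python for anyone who wants to plug in their own numbers. This is a simplified, pool-wide model and not Ceph's actual tiering agent logic (which works per PG); the function name agent_mode is made up for illustration, and the 0.6 full ratio is simply the value assumed above.

```python
def agent_mode(objects, target_max_objects, full_ratio):
    """Classify a cache pool by object count alone (simplified, pool-wide)."""
    if target_max_objects <= 0:
        return "no object limit set"
    if objects >= target_max_objects:
        # Matches the doc note: requests block once the target is reached.
        return "full: client I/O blocked"
    if objects > full_ratio * target_max_objects:
        return "evicting"
    return "below evict threshold"


# Numbers from this thread: ~100 million objects, limit set to 200 million.
print(agent_mode(100_000_000, 200_000_000, 0.6))
```

With those inputs the pool sits at half of its object limit, below the assumed evict threshold, which is why immediate eviction (let alone blocking) would be unexpected.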

Gotta love having (only a few years late) a test and staging cluster that
is actually usable and comparable to my real ones.

So I did create a 500GB image and filled it up. 
The cache pool is set to 500GB as well and will flush at 60% and evict
at 80%.
Afterwards I rm'ed the image and had plenty of those orphan objects left
in the cache pool.
Both the ones created initially AND the ones moved back up to it from the
base pool during the removal (all activity happens on the cache tier after
all). 

Repeated that 2 more times; with the flush and evict timers set to 10
and 20 minutes respectively, it should have removed those, but it didn't.
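
For reference, the test setup described above roughly corresponds to the pool settings below. This is only a sketch: it prints the "ceph osd pool set" command lines rather than running them. The pool name "cache" is taken from the listings in this mail. Mapping "flush at 60%" to cache_target_dirty_ratio, and the timers to cache_min_flush_age and cache_min_evict_age, is an assumption, as is reading 500GB as binary gigabytes.

```python
GIB = 1024 ** 3


def pool_set_commands(pool, settings):
    """Return one 'ceph osd pool set' command line per setting (nothing is executed)."""
    return [f"ceph osd pool set {pool} {key} {value}" for key, value in settings.items()]


settings = {
    "target_max_bytes": 500 * GIB,     # "set to 500GB" (binary GB assumed)
    "cache_target_dirty_ratio": 0.6,   # "flush at 60%" (assumed mapping)
    "cache_target_full_ratio": 0.8,    # "evict at 80%"
    "cache_min_flush_age": 10 * 60,    # 10 minutes, in seconds
    "cache_min_evict_age": 20 * 60,    # 20 minutes, in seconds
}

for line in pool_set_commands("cache", settings):
    print(line)
```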

Started like this:
---
    NAME      ID     CATEGORY     USED       %USED     MAX AVAIL     OBJECTS     DIRTY     READ      WRITE     RAW USED 
    rbd       0      -            23445M      0.52         3579G        5951      5951      478k     3841k       46890M 
    cache     2      -            11587M      0.26          661G       15778      2821     77955     2522k       23174M 
---

and ended up like that:
---
    NAME      ID     CATEGORY     USED      %USED     MAX AVAIL     OBJECTS     DIRTY     READ      WRITE     RAW USED 
    rbd       0      -             245G      5.61         3328G       63015     63015      505k     3953k         490G 
    cache     2      -            3498M      0.08          669G      291626      213k     80552     7995k        6996M 
---
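
One quick way to see the orphans in these numbers is the average object size per pool. The sketch below only does arithmetic on the USED and OBJECTS columns of the two listings above; the helper names are made up, and the size suffixes are assumed to be binary (M = MiB, G = GiB).

```python
UNITS = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text):
    """Convert a 'ceph df' style size such as '3498M' to bytes."""
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def avg_object_size(used, objects):
    """Average bytes per object for one pool row."""
    return parse_size(used) / objects


# USED and OBJECTS columns copied from the two listings above.
rows = {
    "rbd before": ("23445M", 5951),
    "cache before": ("11587M", 15778),
    "rbd after": ("245G", 63015),
    "cache after": ("3498M", 291626),
}

for name, (used, objects) in rows.items():
    print(f"{name:13s} {avg_object_size(used, objects) / 1024:10.1f} KiB/object")
```

The rbd pool stays close to 4 MiB per object in both listings, while the cache pool ends up at roughly 12 KiB per object, which is what one would expect if most of its objects are empty leftovers.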

Set max objects to 200k and that got rid of many (no particular death
throes were caused by this), but still left 150k floating around.
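
A back-of-the-envelope check on that leftover count, assuming the 80% evict ratio mentioned earlier also applies to target_max_objects and ignoring that the agent really works per PG (the function name is hypothetical):

```python
def evict_threshold(target_max_objects, full_ratio):
    """Object count above which eviction is expected to kick in (pool-wide approximation)."""
    return round(target_max_objects * full_ratio)


threshold = evict_threshold(200_000, 0.8)
remaining = 150_000  # "still left 150k floating around"

print(threshold, remaining <= threshold)
```

So the agent stopping somewhere around 150k objects is consistent with it simply getting back under the evict threshold of about 160k, rather than with it expiring the orphans because of their age.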

To remove the remaining ones (and of course clean out the cache entirely)
a "rados -p cache cache-try-flush-evict-all" did the trick.
Which is of course impractical in a production environment.

So yeah, it's definitely a bug as these orphans will never expire it
seems.
And at the very least the documentation would need to reflect this.

Christian

> > From the cache tiering doc it 
> > looked like you either set target_max_bytes OR target_max_objects and 
> > not both (although I always wondered what sense it makes to talk 
> > about objects on a cache layer, as its nature is that it is space bound 
> > and it is less than the backing pool. I even wondered why 
> > target_max_bytes is even necessary, as Ceph knows how much space is 
> > available. 
> 
> As it says on that page, it doesn't.
> Ceph is notoriously bad (due to the potential complexity of setups) at
> figuring out what space is actually available and used.
> 
> This isn't helped by cache-tiering basing things on PGs not pools or
> OSDs or anything else that would help to make sizing guesses.
> See my old "Cache tier operation clarifications" thread here.
> 
> Christian
> 
> > I mean optionally restricting it further is ok, in case you 
> > want to have two Cache pools on the same set of fast disks / SSDs, but 
> > IMHO it could be optional and in case of just one pool use what's there)
> > 
> > Anyway, thanks a lot for the help. We will see how we can get some 
> > downtime in order to set a limit and clean up the backlog of stale 
> > objects from the cache.
> > 
> > Greetings
> > -Sascha-
> > 
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> 
> 

-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/