Re: Ceph with Cache pool - disk usage / cleanup

Hi,

On 30.09.2016 at 09:45, Christian Balzer wrote:
> [...] 
> Gotta love having (only a few years late) a test and staging cluster that
> is actually usable and comparable to my real ones.
> 
> So I did create a 500GB image and filled it up. 
> The cache pool is set to 500GB as well and will flush at 60% and evict
> at 80%.
> Afterwards I rm'ed the image and had plenty of those orphan objects left
> in the cache pool.
> Both the ones created initially AND the ones moved back up to it from the
> base pool during the removal (all activity happens on the cache tier after
> all). 
> 
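(For anyone following along, I assume the thresholds above map to something
like this, with the cache pool simply named "cache" as in the df output
further down:)
---
ceph osd pool set cache target_max_bytes 536870912000   # ~500 GB cache size
ceph osd pool set cache cache_target_dirty_ratio 0.6    # start flushing at 60% dirty
ceph osd pool set cache cache_target_full_ratio 0.8     # start evicting at 80% full
---
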
> Repeated that 2 more times and with the flush and evict timers set to 10
> and 20 minutes respectively it should have removed those, but it didn't.
> 
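(I take "flush and evict timers" to mean cache_min_flush_age and
cache_min_evict_age, i.e. roughly:)
---
ceph osd pool set cache cache_min_flush_age 600    # 10 minutes
ceph osd pool set cache cache_min_evict_age 1200   # 20 minutes
---
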
> Started like this:
> ---
>     NAME      ID     CATEGORY     USED       %USED     MAX AVAIL     OBJECTS     DIRTY     READ      WRITE     RAW USED 
>     rbd       0      -            23445M      0.52         3579G        5951      5951      478k     3841k       46890M 
>     cache     2      -            11587M      0.26          661G       15778      2821     77955     2522k       23174M 
> ---
> 
> and ended up like that:
> ---
>     NAME      ID     CATEGORY     USED      %USED     MAX AVAIL     OBJECTS     DIRTY     READ      WRITE     RAW USED 
>     rbd       0      -             245G      5.61         3328G       63015     63015      505k     3953k         490G 
>     cache     2      -            3498M      0.08          669G      291626      213k     80552     7995k        6996M 
> ---
> 
> Set max objects to 200k and that got rid of many (no particular death
> throes were caused by this), but still left 150k floating around.
> 
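(Something like the following, I presume:)
---
ceph osd pool set cache target_max_objects 200000   # cap the cache at 200k objects
---
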
> To remove the remaining ones (and of course clean out the cache entirely)
> a "rados -p cache cache-try-flush-evict-all" did the trick.
> Which is of course impractical in a production environment.
>  
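(For the record, if someone wants to reproduce the cleanup, that would be
roughly:)
---
rados -p cache cache-try-flush-evict-all   # flush dirty objects, evict clean ones (non-blocking)
rados -p cache ls | wc -l                  # count whatever is still left in the cache pool
---
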
> So yeah, it's definitely a bug as these orphans will never expire it
> seems.
> And at the very least the documentation would need to reflect this.
> 

Thanks a lot - yes, I desperately need to set up a test / staging cluster
as well (maybe on OpenStack with Ceph ;) ).

Did you wait for a scrub to happen? Or does scrubbing not have anything
to do with that? I thought that maybe, once the images are
unreferenced, a scrub might actually do the freeing?
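
If scrubbing is involved at all, one could probably rule it out by kicking
off deep scrubs on the cache pool's PGs manually; an untested sketch,
assuming the PG id is the first column of "ceph pg ls-by-pool":
---
for pg in $(ceph pg ls-by-pool cache | awk 'NR>1 {print $1}'); do
    ceph pg deep-scrub "$pg"
done
---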

Greetings
-Sascha-

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


