Hi Christian, great work!
Before this, I didn't even know there was the possibility of building
separate caching pools. Thank you very much for your community contribution!

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive
mailto:info@xxxxxxxxxxxxxxxxx

Address:
IP Interactive UG (haftungsbeschraenkt)
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402, Amtsgericht Hanau (district court)
Managing director: Oliver Dzombic

Tax no.: 35 236 3622 1
VAT ID: DE274086107

On 04.03.2016 at 09:17, Christian Balzer wrote:
>
> Hello,
>
> Unlike the subject may suggest, I'm mostly going to try and explain how
> things work with cache tiers, as far as I understand them.
> Something of a reference to point to.
> Of course, if you spot something that's wrong or have additional
> information, by all means please do comment.
>
> While the documentation in master now correctly warns that you HAVE to
> set target_max_bytes (the size of your cache pool) for any of the
> relative sizing bits to work, let's repeat it here, since it wasn't
> mentioned there previously.
> Without that value being set, none of the flushing or eviction will
> happen, resulting in blocked IOs once the cache pool fills up.
>
> The other thing to remember about target_max_bytes (documented nowhere)
> is that the space calculation is done per PG.
> So if you have a 1024GB cache pool and target_max_bytes set accordingly
> (one of the most annoying things about Ceph is having to specify full
> bytes in most places instead of human-friendly shortcuts like "1TB"),
> Ceph (the cache tiering agent, to be precise) will think the cache is
> 50% full as soon as a single PG has reached 50% of its per-PG share,
> e.g. 512MB if that 1024GB is spread over 1024 PGs.
>
> In short, expect things to happen quite a bit before you reach the
> usage you think you specified in cache_target_dirty_ratio and
> cache_target_full_ratio.
> Annoying, but at least it fails safe.
>
> I'm ignoring target_max_objects here, as it behaves the same way for
> object counts instead of space.
> min_read_recency_for_promote and min_write_recency_for_promote I shall
> ignore for now as well, since I have no cluster to test them with.
>
> Flush
> Either way, once Ceph thinks you've reached the cache_target_dirty_ratio
> you specified, it copies dirty objects to the backing storage.
> If they never existed there before, they will be created (so keep that
> in mind if you see an increase in object counts).
> This (additional object) is similar to tier promotion, when an existing
> object is copied from the base pool to the cache pool the first time
> it's accessed.
>
> In versions after Hammer there is also cache_target_dirty_high_ratio,
> which specifies the point at which more aggressive flushing starts.
>
> Note that flushing keeps objects in the cache.
> So that object you wrote to some days ago and have kept reading
> frequently ever since isn't just going away to the slower base pool.
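
For reference, all of the thresholds above are ordinary pool settings on the
cache pool. A minimal sketch of setting them, assuming a cache pool simply
named "cache" (the pool name and the values are examples only, not
recommendations):

    # cache pool size in bytes; MUST be set, or no flushing/eviction happens
    ceph osd pool set cache target_max_bytes 1099511627776   # 1TiB
    # start flushing dirty objects to the base pool at 40% of target_max_bytes
    ceph osd pool set cache cache_target_dirty_ratio 0.4
    # post-Hammer: flush more aggressively from 60%
    ceph osd pool set cache cache_target_dirty_high_ratio 0.6
    # start evicting clean objects at 80%
    ceph osd pool set cache cache_target_full_ratio 0.8
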
> Evict
> Next is eviction. This is where things became a bit more muddled for me
> and I had to do some testing and staring at objects in PGs.
> So your cache pool is now hitting the cache_target_full_ratio (or so the
> wonky space-per-PG algorithm thinks).
> Remember that all IO will stop once the cache pool gets 100% full, so
> you want this to happen at some safe, sane point before that.
> What that point is depends, of course, on the maximum write speed to
> your pool, how fast your cache can flush to the base pool, and so on.
>
> Now here is the fun part: clean objects (ones that have not been
> modified since they were promoted from the base pool or last flushed)
> are eligible for eviction.
> When reading about this for the first time, I thought it involved more
> moving of data from the cache pool to the base pool.
> However, what happens is that since the object is "clean" (a copy exists
> on the base pool), it is simply zeroed (after demotion), leaving an
> empty rados object in the cache pool and consequently releasing space.
>
> So as far as IO and network traffic are concerned, your enemy is
> flushing, not eviction.
>
> In clusters that have a clear usage pattern and idle times, a command to
> trigger flushes down to a specified ratio, with settable IO limits,
> would be most welcome. (hint-hint)
> Lacking this for now, I've been pondering a cron job that sets
> cache_target_dirty_ratio from .7 (my current value) down to .6 (or,
> more likely, a smaller step such as .65) for a few hours during the
> night and then back up again (a rough sketch follows below).
> This is based on our cache typically not growing by more than 2% per day.
>
> Lastly, we come to cache_min_flush_age and cache_min_evict_age.
> It is my understanding that in Hammer and later a truly full cache pool
> will cause these to be ignored, to prevent IO deadlocks. Correct?
>
> The largest source of cache pollution for us is VM reboots (all those
> objects holding the kernel and other things only read at startup, never
> to be needed again for months), while on the other hand we have about
> 10k truly hot objects that are constantly being read and written.
> Lacking min_write_recency_for_promote for now, I've been thinking of
> setting cache_min_evict_age to several hours.
> Truly cold objects will still be subject to eviction, while even
> lukewarm ones get to stay.
> Note that for the objects that more or less belong in the cache, we're
> using less than 15% of its capacity.
>
> Christian
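
A rough sketch of the night-time flush idea above, in /etc/crontab syntax;
the pool name "cache", the ratios and the times of day are assumptions, not
tested values:

    # lower the dirty ratio during quiet hours so flushing happens at night
    0 2 * * *  root  ceph osd pool set cache cache_target_dirty_ratio 0.65
    # restore the normal ratio before the working day starts
    0 6 * * *  root  ceph osd pool set cache cache_target_dirty_ratio 0.7

The cache_min_evict_age idea from the last paragraph would be set the same
way, e.g. "ceph osd pool set cache cache_min_evict_age 14400" for four hours
(the value is in seconds).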