Hi Christian, great work!
Before this, I didn't even know there was the possibility of building
separate caching pools. Thank you very much for your community contribution!

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive
mailto:info@xxxxxxxxxxxxxxxxx

Address:
IP Interactive UG (haftungsbeschraenkt)
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402, Amtsgericht Hanau (district court)
Managing director: Oliver Dzombic

Tax no.: 35 236 3622 1
VAT ID: DE274086107

On 04.03.2016 at 09:17, Christian Balzer wrote:
>
> Hello,
>
> Unlike the subject may suggest, I'm mostly going to try and explain how
> things work with cache tiers, as far as I understand them.
> Something of a reference to point to.
> Of course, if you spot something that's wrong or have additional
> information, by all means please do comment.
>
> While the documentation in master now correctly warns that you HAVE to
> set target_max_bytes (the size of your cache pool) for any of the
> relative sizing bits to work, let's repeat it here, since it wasn't
> mentioned there previously.
> Without that value being set, none of the flushing or eviction will
> happen, resulting in blocked IOs once the cache pool fills up.
>
> The other thing to remember about target_max_bytes (documented nowhere)
> is that the space calculation is done per PG.
> So if you have a 1024GB cache pool and target_max_bytes set accordingly
> (one of the most annoying things about Ceph is having to specify full
> bytes in most places instead of human-friendly shortcuts like "1TB"),
> Ceph (the cache tiering agent, to be precise) will think the cache is
> 50% full as soon as a single PG has reached 50% of its per-PG share,
> e.g. 512MB if that 1024GB is spread over 1024 PGs.
>
> In short, expect things to happen quite a bit before you reach the
> usage you think you specified in cache_target_dirty_ratio and
> cache_target_full_ratio.
> Annoying, but at least it fails safe.
>
> I'm ignoring target_max_objects here, as it behaves the same way for
> object counts instead of space.
> min_read_recency_for_promote and min_write_recency_for_promote I shall
> ignore for now as well, since I have no cluster to test them with.
>
> Flush
> Either way, once Ceph thinks you've reached the cache_target_dirty_ratio
> you specified, it copies dirty objects to the backing storage.
> If they never existed there before, they will be created (so keep that
> in mind if you see an increase in object counts).
> This (additional object) is similar to tier promotion, when an existing
> object is copied from the base pool to the cache pool the first time
> it's accessed.
>
> In versions after Hammer there is also cache_target_dirty_high_ratio,
> which specifies the point at which more aggressive flushing starts.
>
> Note that flushing keeps objects in the cache.
> So that object you wrote to some days ago and have kept reading
> frequently ever since isn't just going away to the slower base pool.
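
For reference, all of the thresholds above are ordinary pool settings on the
cache pool. A minimal sketch of setting them, assuming a cache pool simply
named "cache" (the pool name and the values are examples only, not
recommendations):

    # cache pool size in bytes; MUST be set, or no flushing/eviction happens
    ceph osd pool set cache target_max_bytes 1099511627776   # 1TiB
    # start flushing dirty objects to the base pool at 40% of target_max_bytes
    ceph osd pool set cache cache_target_dirty_ratio 0.4
    # post-Hammer: flush more aggressively from 60%
    ceph osd pool set cache cache_target_dirty_high_ratio 0.6
    # start evicting clean objects at 80%
    ceph osd pool set cache cache_target_full_ratio 0.8
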
> Evict
> Next is eviction. This is where things became a bit more muddled for me
> and I had to do some testing and staring at objects in PGs.
> So your cache pool is now hitting the cache_target_full_ratio (or so the
> wonky space-per-PG algorithm thinks).
> Remember that all IO will stop once the cache pool gets 100% full, so
> you want this to happen at some safe, sane point before that.
> What that point is depends, of course, on the maximum write speed to
> your pool, how fast your cache can flush to the base pool, and so on.
>
> Now here is the fun part: clean objects (ones that have not been
> modified since they were promoted from the base pool or last flushed)
> are eligible for eviction.
> When reading about this for the first time, I thought it involved more
> moving of data from the cache pool to the base pool.
> However, what happens is that since the object is "clean" (a copy exists
> on the base pool), it is simply zeroed (after demotion), leaving an
> empty rados object in the cache pool and consequently releasing space.
>
> So as far as IO and network traffic are concerned, your enemy is
> flushing, not eviction.
>
> In clusters that have a clear usage pattern and idle times, a command to
> trigger flushes down to a specified ratio, with settable IO limits,
> would be most welcome. (hint-hint)
> Lacking this for now, I've been pondering a cron job that sets
> cache_target_dirty_ratio from .7 (my current value) down to .6 (or,
> more likely, a smaller step such as .65) for a few hours during the
> night and then back up again (a rough sketch follows below).
> This is based on our cache typically not growing by more than 2% per day.
>
> Lastly, we come to cache_min_flush_age and cache_min_evict_age.
> It is my understanding that in Hammer and later a truly full cache pool
> will cause these to be ignored, to prevent IO deadlocks. Correct?
>
> The largest source of cache pollution for us is VM reboots (all those
> objects holding the kernel and other things only read at startup, never
> to be needed again for months), while on the other hand we have about
> 10k truly hot objects that are constantly being read and written.
> Lacking min_write_recency_for_promote for now, I've been thinking of
> setting cache_min_evict_age to several hours.
> Truly cold objects will still be subject to eviction, while even
> lukewarm ones get to stay.
> Note that for the objects that more or less belong in the cache, we're
> using less than 15% of its capacity.
>
> Christian
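
A rough sketch of the night-time flush idea above, in /etc/crontab syntax;
the pool name "cache", the ratios and the times of day are assumptions, not
tested values:

    # lower the dirty ratio during quiet hours so flushing happens at night
    0 2 * * *  root  ceph osd pool set cache cache_target_dirty_ratio 0.65
    # restore the normal ratio before the working day starts
    0 6 * * *  root  ceph osd pool set cache cache_target_dirty_ratio 0.7

The cache_min_evict_age idea from the last paragraph would be set the same
way, e.g. "ceph osd pool set cache cache_min_evict_age 14400" for four hours
(the value is in seconds).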