Hi Christian,

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Christian Balzer
> Sent: 07 March 2016 02:22
> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: Cache tier operation clarifications
>
> Hello,
>
> I'd like to get some insights or confirmations from people here who are
> either familiar with the code or have tested this more empirically than me
> (the VM/client node of my test cluster is currently pining for the fjords).
>
> When it comes to flushing/evicting, we already established that this
> triggers based on PG utilization, not pool-wide utilization.
> So for example, in a pool with 1024GB capacity (set via target_max_bytes),
> 1024 PGs and a cache_target_dirty_ratio of 0.5, flushing will start when
> the first PG reaches 512MB utilization.
>
> However, while the documentation states that the least recently used
> objects are evicted when things hit the cache_target_full_ratio, it is less
> than clear (understatement of the year) where flushing is concerned.
> To quote:
> "When the cache pool consists of a certain percentage of modified (or
> dirty) objects, the cache tiering agent will flush them to the storage
> pool."
>
> How do we read this?
> When hitting 50% (as in the example above), will all of the dirty objects
> get flushed?
> That doesn't match what I'm seeing, nor would it be a sensible course of
> action to unleash such a potentially huge torrent of writes.
>
> If we interpret this as "get the dirty objects below the threshold" (which
> is what seems to happen), there are two possible courses of action here:
>
> 1. Flush dirty object(s) from the PG that has reached the threshold.
> A sensible course of action in terms of reducing I/Os, but it may keep
> flushing the same objects over and over again if they happen to be on the
> "full" PG.

I think this is how it works. The agents/hit sets operate at the per-PG level
and the flushing code is very closely tied to them. I can't be 100% sure, but
I'm 90%+ sure.

https://github.com/ceph/ceph/blob/master/src/osd/ReplicatedPG.cc#L11967

It uses the cache_min_flush_age variable to check whether an object is old
enough to be flushed, but I can't see any logic for how it selects objects in
the first place. It almost looks like it just cycles through all the objects
in order; it would be nice to have this confirmed.

In releases after Hammer there are two thresholds that flush at different
speeds. This helps because:

1. At the low threshold less IO is used for flushing.
2. Between the low and high thresholds the cache effectively cleans itself
down to the low threshold during idle periods, so it's ready to absorb bursts
of writes when your workloads get busy.

You need to play around with the osd_agent_max_ops and osd_agent_max_low_ops
variables, which control how many concurrent flushes can occur at each
threshold, so that during normal behaviour the % dirty sits somewhere between
the low and high thresholds. I've put a quick back-of-the-envelope sketch of
the per-PG numbers below the quoted text. None of this is available to you at
the moment, though, until you upgrade to Jewel.

> 2. Flush dirty objects from all PGs (most likely in a least recently used
> fashion) and stop when we're eventually under the threshold by having
> finally hit the "full" PG.
> Results in a lot more IO but will of course create more clean objects
> available for eviction if needed.
> This is what I think is happening.
>
> So, is there any "least recently used" consideration in effect here, or is
> setting "cache_min_flush_age" accordingly the only way to avoid (pointless)
> flushes?
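For what it's worth, going back to your 1024GB / 1024 PG example, this is
roughly how I picture the per-PG numbers working out, assuming the per-PG
behaviour I described above. It's a back-of-the-envelope sketch only, and the
0.4/0.6 low/high values are purely illustrative, not recommendations:

# Back-of-the-envelope sketch: per-PG flush thresholds, assuming the agent
# really does work per PG as described above.
def per_pg_flush_threshold(target_max_bytes, pg_num, dirty_ratio):
    # target_max_bytes is spread evenly across the PGs, so a PG starts
    # flushing once its share of dirty bytes crosses the ratio.
    return target_max_bytes / pg_num * dirty_ratio

GiB = 1024 ** 3
MiB = 1024 ** 2
target_max_bytes = 1024 * GiB   # your example: 1024GB set via target_max_bytes
pg_num = 1024

# Hammer and before: a single threshold, cache_target_dirty_ratio = 0.5
print("%.0f MB per PG" % (per_pg_flush_threshold(target_max_bytes, pg_num, 0.5) / MiB))

# Jewel onwards: low and high thresholds flushed at different speeds
# (cache_target_dirty_ratio = 0.4, cache_target_dirty_high_ratio = 0.6),
# throttled by osd_agent_max_low_ops / osd_agent_max_ops respectively.
print("%.0f MB per PG before slow flushing" % (per_pg_flush_threshold(target_max_bytes, pg_num, 0.4) / MiB))
print("%.0f MB per PG before fast flushing" % (per_pg_flush_threshold(target_max_bytes, pg_num, 0.6) / MiB))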
> Unlike for flushes above, eviction clearly states that it's going by "least
> recently used".
> Which in the case of per-PG operation would violate that promise, as people
> of course expect this to be pool-wide.
> And if it is indeed pool-wide, the same effect as above will occur:
> evictions will continue until the "full" PG gets hit, evicting far more
> than would have been needed.
>
> Something to maybe consider would be a target value, for example with
> "cache_target_full_ratio" at 0.80 and "cache_target_full_ratio_target" at
> 0.78, evicting things until it reaches the target ratio.

How is that any different from target_max_bytes (which is effectively 1.0)
and cache_target_full_ratio = 0.8?

> Lastly, while we have perf counters like "tier_dirty", a gauge for dirty
> and clean objects/bytes would be quite useful, to me at least.

I agree it would be nice to have these exposed as counters; I had to write a
diamond collector to scrape the figures out of "ceph df detail" (a rough
sketch of the sort of thing is at the bottom of this mail).

> And clearly the cache tier agent already has those numbers.
> Right now I'm guesstimating that most of my cache objects are actually
> clean (from VM reboots, only read, never written to), but I have no way to
> tell for sure.
>
> Regards,
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
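For reference, here is roughly the sort of thing the collector does. The
field names are from memory and may well differ between releases, so check
"ceph df detail -f json-pretty" on your own version before relying on it:

#!/usr/bin/env python
# Rough, untested sketch: pull the per-pool dirty/object counts out of
# "ceph df detail" in JSON form, i.e. the figures behind the DIRTY column
# of the plain-text output.
import json
import subprocess

raw = subprocess.check_output(['ceph', 'df', 'detail', '-f', 'json'])
df = json.loads(raw.decode('utf-8'))

for pool in df.get('pools', []):
    stats = pool.get('stats', {})
    # 'dirty' should be the count behind the DIRTY column; objects - dirty
    # then gives a rough clean-object figure, which is what you're after.
    dirty = stats.get('dirty', 0)
    objects = stats.get('objects', 0)
    print('%s: %d dirty / %d objects' % (pool.get('name', '?'), dirty, objects))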