RE: Cache tiering read-proxy mode

Sage Weil wrote:

> [Adding ceph-devel]
> 
> On Mon, 21 Jul 2014, Wang, Zhiqiang wrote:
>> Sage,
>> 
>> I agree with you that promotion on the 2nd read could improve cache
>> tiering's performance for some kinds of workloads. The general idea here
>> is to implement some kind of policy in the cache tier to measure how warm
>> the data is. If the cache tier is aware of the data's warmth, it could
>> even initiate data movement between the cache tier and the base tier.
>> This means data could be prefetched into the cache tier before it is read
>> or written. But I think this is something we could do in the future.
> 
> Yeah. I suspect it will be challenging to put this sort of prefetching
> intelligence directly into the OSDs, though.  It could possibly be done by
> an external agent, maybe, or could be driven by explicit hints from
> clients ("I will probably access this data soon").
> 
>> The 'promotion on 2nd read' policy is straightforward. Sure, it will
>> benefit some kinds of workloads, but not all. If it is implemented as a
>> cache tier option, the user needs to decide whether to turn it on. But
>> I'm afraid most users won't know how to make that call, which makes
>> cache tiering harder to use.
> 
> I suspect the 2nd read behavior will be something we'll want to do by
> default...  but yeah, there will be a new pool option (or options) that
> controls the behavior.
> 
>> One question for the implementation of 'promotion on 2nd read': what do
>> we do for the 1st read? Does the cache tier read the object from the base
>> tier without replicating it into the cache, or just redirect the client?
> 
> For the first read, we just redirect the client.  Then on the second read,
> we call promote_object().  See maybe_handle_cache() in ReplicatedPG.cc.
> We can pretty easily tell the difference by checking the in-memory HitSet
> for a match.
> 
> Perhaps the option in the pool would be something like
> min_read_recency_for_promote?  If we measure "recency" as "(avg) seconds
> since last access" (loosely), 0 would mean promote on the first read,
> and anything <= the HitSet interval would mean promote if the object is in
> the current HitSet.  Anything greater than that would mean we'd need to
> keep additional previous HitSets in RAM.
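
If I read that right, the decision could be roughly the following (just a
sketch; the option name is the one proposed above, and std::unordered_set
stands in for the real HitSet type):

#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Sketch: interpret min_read_recency_for_promote in seconds, where each
// HitSet covers hit_set_period (T) seconds.  hit_sets is ordered newest
// first.
bool recent_enough_to_promote(
    const std::string& oid,
    uint32_t min_read_recency_for_promote,                        // seconds
    uint32_t hit_set_period,                                      // T, seconds
    const std::vector<std::unordered_set<std::string>>& hit_sets)
{
  if (min_read_recency_for_promote == 0)
    return true;                              // promote on first read
  // How many HitSets are needed to cover the requested window?
  uint32_t n =
      (min_read_recency_for_promote + hit_set_period - 1) / hit_set_period;
  for (uint32_t i = 0; i < n && i < hit_sets.size(); ++i)
    if (hit_sets[i].count(oid))
      return true;                            // accessed within the window
  return false;                               // treat as a first read: redirect
}

As you say, anything beyond the current HitSet means keeping the older ones
in RAM.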
> 
> ...which leads us to a separate question of how to describe access
> frequency vs recency.  We keep N HitSets, each covering a time period of T
> seconds.  Normally we only keep the most recent HitSet in memory, unless
> the agent is active (flushing data).  So what I described above is
> checking how recently the last access was (within how many multiples of T
> seconds).  Additionally, though, we could describe the frequency of
> access: was the object accessed at least once in each of the N intervals
> of T seconds?  Or some fraction of them?  That is probably best described
> as "temperature"?  I'm not too fond of the term "recency," though I can't
> think of anything better right now.
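
A crude way to put a number on "temperature" would be the fraction of the
last N HitSets that contain the object (again only a sketch, with
std::unordered_set standing in for HitSet):

#include <string>
#include <unordered_set>
#include <vector>

// Sketch: temperature as the fraction of the retained HitSets (each
// covering T seconds) that contain the object; 1.0 means it was accessed
// in every interval, 0.0 in none.
double object_temperature(
    const std::string& oid,
    const std::vector<std::unordered_set<std::string>>& hit_sets)
{
  if (hit_sets.empty())
    return 0.0;
  std::size_t hits = 0;
  for (const auto& hs : hit_sets)
    if (hs.count(oid))
      ++hits;
  return double(hits) / hit_sets.size();
}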
> 
> Anyway, for the read promote behavior, recency is probably sufficient, but
> for the tiering agent flush/evict behavior temperature might be a good
> thing to consider...
> 
> sage

It might be worth looking at the MQ (Multi-Queue) caching policy[1], which
was explicitly designed for second-level caches, and that applies here: the
client is very likely already caching, whether through CephFS (FSCache), RBD
(client-side caching), or RADOS (application-level caching), and that changes
the statistical behavior the second-level cache sees in interesting ways.

[1] 
https://www.usenix.org/legacy/event/usenix01/full_papers/zhou/zhou_html/node9.html
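
The core of MQ is small: blocks live in one of m LRU queues indexed by log2
of their reference count, plus a ghost queue (Qout) that remembers recently
evicted blocks so a block that returns can resume its old frequency. A tiny
sketch of the queue-selection rule (my paraphrase of the paper, not Ceph
code):

#include <algorithm>
#include <cmath>

// MQ queue selection: a block with reference count f sits in LRU queue
// floor(log2(f)), capped at the highest queue (see [1]).
int mq_queue_index(unsigned ref_count, int num_queues)
{
  int q = ref_count ? int(std::log2(double(ref_count))) : 0;
  return std::min(q, num_queues - 1);
}

That logarithmic bucketing is what lets it behave well when the first-level
(client) cache has already absorbed most of the short-term reuse.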
