Sage Weil wrote:
> [Adding ceph-devel]
>
> On Mon, 21 Jul 2014, Wang, Zhiqiang wrote:
>> Sage,
>>
>> I agree with you that promotion on the 2nd read could improve cache
>> tiering's performance for some kinds of workloads. The general idea here
>> is to implement some kind of policy in the cache tier to measure the
>> warmness of the data. If the cache tier is aware of the data warmness,
>> it could even initiate data movement between the cache tier and the base
>> tier. This means data could be prefetched into the cache tier before
>> reading or writing. But I think this is something we could do in the
>> future.
>
> Yeah. I suspect it will be challenging to put this sort of prefetching
> intelligence directly into the OSDs, though. It could possibly be done by
> an external agent, maybe, or could be driven by explicit hints from
> clients ("I will probably access this data soon").
>
>> The 'promotion on 2nd read' policy is straightforward. Sure it will
>> benefit some kinds of workloads, but not all. If it is implemented as a
>> cache tier option, the user needs to decide whether to turn it on or not,
>> but I'm afraid most users won't have a clear idea of which to choose.
>> This increases the difficulty of using cache tiering.
>
> I suspect the 2nd read behavior will be something we'll want to do by
> default... but yeah, there will be a new pool option (or options) that
> controls the behavior.
>
>> One question for the implementation of 'promotion on 2nd read': what do
>> we do for the 1st read? Does the cache tier read the object from the base
>> tier without replicating it, or just redirect the client?
>
> For the first read, we just redirect the client. Then on the second read,
> we call promote_object(). See maybe_handle_cache() in ReplicatedPG.cc.
> We can pretty easily tell the difference by checking the in-memory HitSet
> for a match.
>
> Perhaps the option in the pool would be something like
> min_read_recency_for_promote? If we measure "recency" as "(avg) seconds
> since last access" (loosely), 0 would mean promote on the first read,
> and anything <= the HitSet interval would mean promote if the object is in
> the current HitSet. Greater than that would mean we'd need to keep
> additional previous HitSets in RAM.
>
> ...which leads us to a separate question of how to describe access
> frequency vs recency. We keep N HitSets, each covering a time period of T
> seconds. Normally we only keep the most recent HitSet in memory, unless
> the agent is active (flushing data). So what I described above is
> checking how recently the last access was (within how many multiples of T
> seconds). Additionally, though, we could describe the frequency of
> access: was the object accessed at least once in every one of the N
> intervals of T seconds? Or in some fraction of them? That is probably
> best described as "temperature". I'm not too fond of the term "recency",
> though I can't think of anything better right now.
>
> Anyway, for the read promote behavior, recency is probably sufficient, but
> for the tiering agent flush/evict behavior temperature might be a good
> thing to consider...
>
> sage

It might be worth looking at the MQ (Multi-Queue) caching policy [1], which
was explicitly designed for second-level caches, and that is exactly the
situation here: the client is very likely to be doing caching of its own,
whether it uses CephFS (FSCache), RBD (client-side caching), or RADOS
directly (application-level caching). That changes the statistical behavior
seen by the second-level cache in interesting ways.
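To make the min_read_recency_for_promote semantics above concrete, here is a
toy sketch (standalone C++ with made-up names; HitSets are modelled as plain
string sets and recency is counted in whole HitSet intervals rather than
seconds for simplicity; the real check would live next to maybe_handle_cache()
in ReplicatedPG.cc and use the actual HitSet machinery):

  // Toy model of the "promote on Nth read" decision described above.
  // Hypothetical standalone code, not the actual OSD implementation.
  #include <deque>
  #include <iostream>
  #include <set>
  #include <string>

  using HitSet = std::set<std::string>;

  // recency == 0           -> promote on the first read
  // recency == 1 interval  -> promote if the object is in the current HitSet
  // recency == k intervals -> promote if the object appears in any of the k
  //                           most recent HitSets (requires keeping k-1
  //                           previous HitSets in RAM)
  bool should_promote(const std::string& oid,
                      const std::deque<HitSet>& hitsets,   // newest first
                      unsigned min_read_recency_for_promote)
  {
    if (min_read_recency_for_promote == 0)
      return true;                     // promote on first read
    unsigned checked = 0;
    for (const HitSet& hs : hitsets) {
      if (checked++ >= min_read_recency_for_promote)
        break;
      if (hs.count(oid))
        return true;                   // hit within the last 'checked' intervals
    }
    return false;                      // redirect the client to the base tier
  }

  // "Temperature": in how many of the retained intervals was the object hit?
  unsigned temperature(const std::string& oid, const std::deque<HitSet>& hitsets)
  {
    unsigned hits = 0;
    for (const HitSet& hs : hitsets)
      if (hs.count(oid))
        ++hits;
    return hits;
  }

  int main()
  {
    std::deque<HitSet> hitsets = {
      {"obj.A"},                       // current interval
      {"obj.A", "obj.B"},              // one interval ago
      {"obj.B"},                       // two intervals ago
    };
    std::cout << should_promote("obj.B", hitsets, 1) << "\n";  // 0: redirect
    std::cout << should_promote("obj.B", hitsets, 2) << "\n";  // 1: promote
    std::cout << temperature("obj.B", hitsets) << "\n";        // hit in 2 of 3
  }

Anything beyond one interval means the OSD has to keep that many older HitSets
pinned in memory, which is exactly the RAM cost mentioned above.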
[1] https://www.usenix.org/legacy/event/usenix01/full_papers/zhou/zhou_html/node9.html
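For reference, the MQ structure in [1] boils down to m LRU queues, where a
block whose reference count is f sits in queue min(floor(log2 f), m-1); blocks
that go unreferenced past an expireTime are demoted one queue, and a small
"ghost" list (Qout) remembers the reference counts of recently evicted blocks
so a re-fetched block regains its old priority. A loose sketch follows
(standalone C++ with made-up names, the expireTime demotion omitted for
brevity; this is not Ceph's tiering agent):

  // Loose sketch of the MQ idea from [1]: frequency-aware eviction across
  // several LRU queues, plus a ghost list of evicted blocks' refcounts.
  #include <algorithm>
  #include <cmath>
  #include <iostream>
  #include <list>
  #include <string>
  #include <unordered_map>
  #include <vector>

  class MQSketch {
  public:
    MQSketch(size_t capacity, size_t num_queues = 4, size_t ghost_size = 16)
      : capacity_(capacity), ghost_size_(ghost_size), queues_(num_queues) {}

    // Record an access; returns true on a cache hit.
    bool access(const std::string& id) {
      auto it = meta_.find(id);
      if (it != meta_.end()) {                  // hit: bump refcount, maybe promote
        unlink(it->second);
        ++it->second.refcount;
        link(id, it->second);
        return true;
      }
      Meta m;                                   // miss: restore refcount from ghost list
      auto g = ghost_.find(id);
      m.refcount = (g != ghost_.end()) ? g->second + 1 : 1;
      if (g != ghost_.end())
        ghost_.erase(g);
      if (meta_.size() >= capacity_)
        evict();
      link(id, m);
      meta_[id] = m;
      return false;
    }

  private:
    struct Meta {
      size_t refcount = 0;
      size_t queue = 0;
      std::list<std::string>::iterator pos;
    };

    size_t queue_for(size_t refcount) const {
      size_t q = static_cast<size_t>(std::log2(static_cast<double>(refcount)));
      return std::min(q, queues_.size() - 1);
    }

    void link(const std::string& id, Meta& m) {
      m.queue = queue_for(m.refcount);
      queues_[m.queue].push_back(id);           // MRU end
      m.pos = std::prev(queues_[m.queue].end());
    }

    void unlink(Meta& m) {
      queues_[m.queue].erase(m.pos);
    }

    void evict() {
      for (auto& q : queues_) {                 // lowest non-empty queue first
        if (q.empty())
          continue;
        std::string victim = q.front();         // LRU end
        q.pop_front();
        ghost_[victim] = meta_[victim].refcount; // remember its refcount
        if (ghost_.size() > ghost_size_)
          ghost_.erase(ghost_.begin());          // crude bound on ghost size
        meta_.erase(victim);
        return;
      }
    }

    size_t capacity_, ghost_size_;
    std::vector<std::list<std::string>> queues_;
    std::unordered_map<std::string, Meta> meta_;
    std::unordered_map<std::string, size_t> ghost_;
  };

  int main() {
    MQSketch cache(2);
    for (const char* id : {"a", "b", "a", "c", "a"})
      std::cout << id << (cache.access(id) ? ": hit\n" : ": miss\n");
  }

The frequency-aware eviction is what makes MQ interesting here: when the
client-side cache has already absorbed most of the short-term locality, plain
recency (LRU-style) decisions at the second level tend to perform poorly.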