On Mon, 7 Jul 2014, Mark Nelson wrote:
> On 07/07/2014 02:29 PM, Sage Weil wrote:
> > On Mon, 7 Jul 2014, Luis Pabon wrote:
> > > Hi all,
> > >   I am working on OSDMonitor.cc:5325 and wanted to confirm the
> > > following read_forward cache tier transition:
> > >
> > > readforward -> forward || writeback || (any && num_objects_dirty == 0)
> > > forward -> writeback || readforward || (any && num_objects_dirty == 0)
> > > writeback -> readforward || forward
> > >
> > > Is this the correct cache tier state transition?
> >
> > That looks right to me.
> >
> > By the way, I had a thought after we spoke that we probably want something
> > that is somewhere inbetween the current writeback behavior (promote on
> > first read) and the read_forward behavior (never promote on read). I
> > suspect a good all-around policy is something like promote on second read?
> > This should probably be rolled into the writeback mode as a tunable...
>
> That would be a good start I think. What about some kind of scheme that also
> favours promoting small objects over larger ones? It could be as simple as
> increasing the number of reads necessary to do a promotion based on the object
> size.
>
> ie something like:
>
> <= 64k object = 1 read
> <= 512k object = 2 read
> else 3 read
>
> That would make the behaviour for default RBD object sizes always 3 read, but
> could keep big objects out of the cache tier for RGW.

We don't have enough information to do that right now, since on a miss we
redirect the client instead of proxying them and never learn what the
actual object size is. If/after we start doing proxying for the reads,
then lots of other stuff becomes possible... but I think we'll need to be
careful about choosing where to add complexity.

sage
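For reference, the two rules discussed in this thread can be written down as a small sketch. This is illustrative Python, not the code in OSDMonitor.cc: the function names, the use of "none" as an example of an "any" target, and reading "64k"/"512k" as KiB are all assumptions. The size-based rule also assumes the object size is known at promotion time, which, as noted above, is not the case today on a miss.

```python
# Hypothetical sketch of the rules quoted in this thread; names are
# illustrative and do not come from the Ceph source tree.

# Targets that are allowed regardless of how many dirty objects exist.
ALWAYS_ALLOWED = {
    "readforward": {"forward", "writeback"},
    "forward": {"writeback", "readforward"},
    "writeback": {"readforward", "forward"},
}

# Modes from which any target is allowed once nothing is dirty.
ANY_WHEN_CLEAN = {"readforward", "forward"}


def transition_allowed(current, target, num_objects_dirty):
    """Return True if the cache-mode change matches the quoted table."""
    if target in ALWAYS_ALLOWED.get(current, set()):
        return True
    return current in ANY_WHEN_CLEAN and num_objects_dirty == 0


KIB = 1024  # assumption: "64k" and "512k" mean KiB


def reads_needed_for_promotion(object_size_bytes):
    """Reads required before promotion under the proposed size tiers."""
    if object_size_bytes <= 64 * KIB:
        return 1
    if object_size_bytes <= 512 * KIB:
        return 2
    return 3


if __name__ == "__main__":
    print(transition_allowed("forward", "none", 0))
    print(transition_allowed("writeback", "none", 0))
    print(reads_needed_for_promotion(4 * 1024 * KIB))  # 4 MiB object
```

With these tiers, a 4 MiB object (the default RBD object size mentioned above) falls into the "else" branch and needs three reads.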