Re: Cache tier READ_FORWARD transition

Sage Weil <sweil@xxxxxxxxxx> · Mon, 7 Jul 2014 12:45:19 -0700 (PDT)

On Mon, 7 Jul 2014, Mark Nelson wrote:
> On 07/07/2014 02:29 PM, Sage Weil wrote:
> > On Mon, 7 Jul 2014, Luis Pabon wrote:
> > > Hi all,
> > >      I am working on OSDMonitor.cc:5325 and wanted to confirm the following
> > > read_forward cache tier transition:
> > > 
> > >      readforward -> forward || writeback || (any && num_objects_dirty == 0)
> > >      forward -> writeback || readforward || (any && num_objects_dirty == 0)
> > >      writeback -> readforward || forward
> > > 
> > > Is this the correct cache tier state transition?
> > 
> > That looks right to me.
> > 
> > By the way, I had a thought after we spoke that we probably want something
> > that is somewhere in between the current writeback behavior (promote on
> > first read) and the read_forward behavior (never promote on read).  I
> > suspect a good all-around policy is something like promote on second read?
> > This should probably be rolled into the writeback mode as a tunable...
> 
> That would be a good start I think.  What about some kind of scheme that also
> favours promoting small objects over larger ones?  It could be as simple as
> increasing the number of reads necessary to do a promotion based on the object
> size.
> 
> ie something like:
> 
> <= 64k object = 1 read
> <= 512k object = 2 reads
> else 3 reads
> 
> That would make the behaviour for default RBD object sizes always 3 reads, but
> could keep big objects out of the cache tier for RGW.

Hmm, FWIW in the RBD vs RGW case those are different pools, so we can set
different policies.  I think the small vs big object distinction might make
sense in other contexts, though!
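
To make the size-based idea quoted above concrete, here is a minimal
sketch, not anything that exists in the tree.  The function names are
hypothetical, "64k" and "512k" are assumed to mean KiB, and the thresholds
are hard-coded here although in practice they would presumably be per-pool
tunables.

```python
KIB = 1024  # assumption: "64k" / "512k" in the thread mean KiB


def reads_before_promotion(object_size_bytes):
    """Reads needed before promotion, per the thresholds quoted above.

    Hypothetical helper for illustration only; not a Ceph API.
    """
    if object_size_bytes <= 64 * KIB:
        return 1  # small objects: promote on first read
    if object_size_bytes <= 512 * KIB:
        return 2  # medium objects: promote on second read
    return 3      # everything larger: promote on third read


def should_promote(object_size_bytes, read_count):
    """True once an object has been read often enough for its size."""
    return read_count >= reads_before_promotion(object_size_bytes)
```

With these thresholds a 4 MiB object (the usual default RBD object size)
needs three reads, e.g. should_promote(4 * 1024 * KIB, 2) is False and
should_promote(4 * 1024 * KIB, 3) is True.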

sage
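
For reference, the cache-mode transition rule quoted at the top of the
thread can be written as a small check.  This is only a sketch of the rule
as stated in the mail, not the actual OSDMonitor.cc code.  The enum and
function names are hypothetical, and reading "any" as "any cache mode,
including none and readonly" is an assumption.  Source modes the rule does
not mention are simply rejected here.

```python
from enum import Enum


class CacheMode(Enum):
    # Illustrative names; "none" and "readonly" are assumed members of "any".
    NONE = "none"
    WRITEBACK = "writeback"
    FORWARD = "forward"
    READONLY = "readonly"
    READFORWARD = "readforward"


# Transitions allowed regardless of how many dirty objects remain.
ALWAYS_ALLOWED = {
    CacheMode.READFORWARD: {CacheMode.FORWARD, CacheMode.WRITEBACK},
    CacheMode.FORWARD: {CacheMode.WRITEBACK, CacheMode.READFORWARD},
    CacheMode.WRITEBACK: {CacheMode.READFORWARD, CacheMode.FORWARD},
}

# Source modes that may move to any mode once num_objects_dirty == 0.
ANY_WHEN_CLEAN = {CacheMode.READFORWARD, CacheMode.FORWARD}


def transition_allowed(current, target, num_objects_dirty):
    """Return True if current -> target is permitted by the quoted rule."""
    if target in ALWAYS_ALLOWED.get(current, set()):
        return True
    # The "(any && num_objects_dirty == 0)" clause.
    return current in ANY_WHEN_CLEAN and num_objects_dirty == 0
```

So writeback -> forward is allowed even with dirty objects, while
forward -> none is allowed only once num_objects_dirty has dropped to 0.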