Re: Cache tier READ_FORWARD transition

On Mon, 7 Jul 2014, Luis Pabón wrote:
> What about the following use case (please forgive some of my ceph architecture
> ignorance):
> 
> If it were possible to set up an OSD caching tier at the host (if the host had a
> dedicated SSD for accelerating I/O), then caching pools could be created to
> cache VM rbds, since they are inherently exclusive to a single host.  Using a
> write-through (or a read-only, depending on the workload) policy would yield a
> major increase in VM IOPS.  Using a write-through or read-only policy would also
> ensure any writes are first written to the back-end storage tier.  Enabling
> hosts to service most of their VM I/O reads would also increase the overall
> IOPS of the back-end storage tier.

This could be accomplished by creating a rados cache pool per client host.  The 
rados caching only works as a writeback cache, though, not 
write-through, so you really need to replicate it for it to be usable in 
practice.  So although it's possible, this isn't a particularly attractive 
approach.

What you're describing is really a client-side write-through cache, either 
for librbd or librados.  We've discussed this in the past (mostly in the 
context of a shared host-wide read-only cache, not a write-through one), but 
in both cases the caching would plug into the client libraries.  There are 
some CDS notes from emperor:

	http://wiki.ceph.com/Planning/Sideboard/rbd%3A_shared_read_cache
	http://pad.ceph.com/p/rbd-shared-read-cache
	http://www.youtube.com/watch?v=SVgBdUv_Lv4&t=70m11s

Note that you can also accomplish this with the kernel rbd driver by 
layering dm-cache or bcache or something similar on top and running it in 
write-through mode.  Most clients are (KVM+)librbd, though, so eventually 
a userspace implementation for librbd (or maybe librados) makes sense.
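
Roughly, such a client-side cache would sit between the application and the 
backing pool, something like the sketch below (purely illustrative; the 
ImageBackend interface and the extent-keyed map are made-up stand-ins, not 
the librbd/librados API):

#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for whatever the client library (librbd/librados)
// exposes for talking to the backing pool; not a real Ceph interface.
struct ImageBackend {
  virtual std::vector<char> read(uint64_t off, uint64_t len) = 0;
  virtual void write(uint64_t off, const std::vector<char>& data) = 0;
  virtual ~ImageBackend() = default;
};

// Minimal client-side write-through cache: reads are served locally when
// possible, writes always go to the backing tier first.
class WriteThroughCache {
 public:
  explicit WriteThroughCache(ImageBackend& backend) : backend_(backend) {}

  std::vector<char> read(uint64_t off, uint64_t len) {
    auto key = std::make_pair(off, len);
    auto it = cache_.find(key);
    if (it != cache_.end())
      return it->second;                  // cache hit: no trip to the cluster
    std::vector<char> data = backend_.read(off, len);
    cache_[key] = data;                   // populate on miss
    return data;
  }

  void write(uint64_t off, const std::vector<char>& data) {
    backend_.write(off, data);            // backing tier is updated first
    // Simplistic invalidation: only the exact extent is dropped; a real
    // cache would handle overlapping extents.
    cache_.erase(std::make_pair(off, static_cast<uint64_t>(data.size())));
  }

 private:
  ImageBackend& backend_;
  std::map<std::pair<uint64_t, uint64_t>, std::vector<char>> cache_;
};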

sage


> Does this make sense?
> 
> - Luis
> 
> On 07/07/2014 03:29 PM, Sage Weil wrote:
> > On Mon, 7 Jul 2014, Luis Pabon wrote:
> > > Hi all,
> > >      I am working on OSDMonitor.cc:5325 and wanted to confirm the following
> > > read_forward cache tier transition:
> > > 
> > >      readforward -> forward || writeback || (any && num_objects_dirty == 0)
> > >      forward -> writeback || readforward || (any && num_objects_dirty == 0)
> > >      writeback -> readforward || forward
> > > 
> > > Is this the correct cache tier state transition?
> > That looks right to me.
> > 
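Spelled out as a standalone check, those rules would look roughly like the 
following (illustrative only, not the actual OSDMonitor.cc code; the enum and 
function name are made up):

#include <cstdint>

// The cache modes that exist at this point.
enum class CacheMode { NONE, WRITEBACK, FORWARD, READONLY, READFORWARD };

// Encodes the three rules quoted above: readforward and forward may switch
// to any mode once the cache pool holds no dirty objects; writeback may
// only move to readforward or forward.
bool cache_mode_transition_ok(CacheMode from, CacheMode to,
                              uint64_t num_objects_dirty) {
  switch (from) {
  case CacheMode::READFORWARD:
    return to == CacheMode::FORWARD || to == CacheMode::WRITEBACK ||
           num_objects_dirty == 0;
  case CacheMode::FORWARD:
    return to == CacheMode::WRITEBACK || to == CacheMode::READFORWARD ||
           num_objects_dirty == 0;
  case CacheMode::WRITEBACK:
    return to == CacheMode::READFORWARD || to == CacheMode::FORWARD;
  default:
    return false;  // modes not covered by the rules quoted above
  }
}
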
> > By the way, I had a thought after we spoke that we probably want something
> > that is somewhere in between the current writeback behavior (promote on
> > first read) and the read_forward behavior (never promote on read).  I
> > suspect a good all-around policy is something like promote on second read?
> > This should probably be rolled into the writeback mode as a tunable...
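The tunable could be as simple as a per-object read counter, along these 
lines (a rough sketch with made-up names, not how the OSD actually tracks 
object hotness):

#include <cstdint>
#include <string>
#include <unordered_map>

class PromotionPolicy {
 public:
  // promote_after == 1 reproduces today's writeback behavior (promote on
  // first read); 2 gives "promote on second read".
  explicit PromotionPolicy(uint32_t promote_after)
    : promote_after_(promote_after) {}

  // Called on each read that misses the cache tier; returns true once the
  // object has been read often enough to be worth promoting.
  bool should_promote(const std::string& object_id) {
    return ++read_counts_[object_id] >= promote_after_;
  }

 private:
  uint32_t promote_after_;
  std::unordered_map<std::string, uint32_t> read_counts_;  // unbounded here
};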
> > 
> > sage