Re: Cache tier READ_FORWARD transition


 



>>Note that you can also accomplish this with the kernel rbd driver by 
>>layering dm-cache or bcache or something similar on top and running it in 
>>write-through mode.  Most clients are (KVM+)librbd, though, so eventually 
>>a userspace implementation for librbd (or maybe librados) makes sense.

I vote for this;
it would be wonderful to have a client-side cache at the librbd level!


----- Original Message ----- 

From: "Sage Weil" <sweil@xxxxxxxxxx> 
To: "Luis Pabón" <lpabon@xxxxxxxxxx> 
Cc: "Mark Nelson" <mnelson@xxxxxxxxxx>, ceph-devel@xxxxxxxxxxxxxxx 
Sent: Tuesday, July 8, 2014 18:01:46 
Subject: Re: Cache tier READ_FORWARD transition 

On Mon, 7 Jul 2014, Luis Pabón wrote: 
> What about the following use case (please forgive some of my ceph 
> architecture ignorance): 
> 
> If it were possible to set up an OSD caching tier at the host (if the host 
> had a dedicated SSD for accelerating I/O), then caching pools could be 
> created to cache VM rbds, since they are inherently exclusive to a single 
> host. Using a write-through (or a read-only, depending on the workload) 
> policy would yield a major increase in VM IOPS. Using a write-through or 
> read-only policy would also ensure any writes are first written to the 
> back-end storage tier. Enabling hosts to service most of their VM I/O reads 
> would also increase the overall IOPS of the back-end storage tier. 

This could be accomplished by creating a rados pool per client host. The 
rados cache tier only works as a writeback cache, though, not 
write-through, so you really need to replicate it for it to be usable in 
practice. So although it's possible, this isn't a particularly attractive 
approach. 
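
For reference, that per-host variant would be wired up with the existing 
cache tiering commands; a minimal sketch, where the pool names, PG counts, 
and driving the CLI from Python are purely illustrative: 

    import subprocess

    def ceph(args):
        # shell out to the ceph CLI; assumes an admin keyring is available
        subprocess.check_call(["ceph"] + args.split())

    # hypothetical cache pool dedicated to this host, layered over the
    # shared "rbd" pool; pool names and PG counts are made up
    ceph("osd pool create cache-host1 128 128")
    ceph("osd tier add rbd cache-host1")
    ceph("osd tier cache-mode cache-host1 writeback")
    ceph("osd tier set-overlay rbd cache-host1")

The catch, per the above, is that with a writeback tier the per-host cache 
pool itself still needs to be replicated before it is safe in practice. 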

What you're describing is really a client-side write-through cache, either 
for librbd or librados. We've discussed this in the past (mostly in the 
context of a shared host-wide read-only data cache, not a write-through 
cache), but in both cases the caching would plug into the client 
libraries. There are some CDS notes from emperor: 

http://wiki.ceph.com/Planning/Sideboard/rbd%3A_shared_read_cache 
http://pad.ceph.com/p/rbd-shared-read-cache 
http://www.youtube.com/watch?v=SVgBdUv_Lv4&t=70m11s 
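
As a rough illustration of where such a cache would sit, here is a sketch 
against the python-rbd bindings; the CachedImage wrapper and its 
invalidation strategy are hypothetical, nothing like this exists in librbd 
today: 

    import rados, rbd

    class CachedImage(object):
        """Illustrative client-side cache around a python-rbd Image: reads
        are served from a local dict, writes hit the cluster first
        (write-through) and then invalidate the cached data."""

        def __init__(self, ioctx, name):
            self.image = rbd.Image(ioctx, name)
            self.cache = {}                       # (offset, length) -> data

        def read(self, offset, length):
            key = (offset, length)
            if key not in self.cache:
                self.cache[key] = self.image.read(offset, length)
            return self.cache[key]

        def write(self, data, offset):
            ret = self.image.write(data, offset)  # cluster first
            self.cache.clear()                    # crude invalidation
            return ret

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')
    image = CachedImage(ioctx, 'vm-disk-1')

A real implementation would plug in below the librbd (or librados) API so 
every client benefits, and would want extent-level invalidation rather than 
dropping the whole cache on each write. 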

Note that you can also accomplish this with the kernel rbd driver by 
layering dm-cache or bcache or something similar on top and running it in 
write-through mode. Most clients are (KVM+)librbd, though, so eventually 
a userspace implementation for librbd (or maybe librados) makes sense. 
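
For the kernel-rbd route, a bcache setup in write-through mode looks 
roughly like this; the device names are assumptions, and depending on the 
distro udev may perform the register steps automatically: 

    import subprocess

    def sh(cmd):
        # run a root shell command; purely illustrative
        subprocess.check_call(cmd, shell=True)

    # map the image through the kernel rbd driver (typically -> /dev/rbd0)
    sh("rbd map rbd/vm-disk-1")
    # create a bcache device cached on a local SSD partition; device names
    # are assumptions, and udev may handle the register steps on its own
    sh("make-bcache -C /dev/sdb1 -B /dev/rbd0")
    sh("echo /dev/sdb1 > /sys/fs/bcache/register")
    sh("echo /dev/rbd0 > /sys/fs/bcache/register")
    # write-through, so every write still lands on the rbd image
    sh("echo writethrough > /sys/block/bcache0/bcache/cache_mode")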

sage 


> Does this make sense? 
> 
> - Luis 
> 
> On 07/07/2014 03:29 PM, Sage Weil wrote: 
> > On Mon, 7 Jul 2014, Luis Pabón wrote: 
> > > Hi all, 
> > > I am working on OSDMonitor.cc:5325 and wanted to confirm the following 
> > > read_forward cache tier transitions: 
> > > 
> > > readforward -> forward || writeback || (any && num_objects_dirty == 0) 
> > > forward -> writeback || readforward || (any && num_objects_dirty == 0) 
> > > writeback -> readforward || forward 
> > > 
> > > Is this the correct cache tier state transition? 
> > That looks right to me. 
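
Spelled out, the transition rule quoted above amounts to roughly the 
following; this is a sketch of the intended check, not the actual 
OSDMonitor.cc code: 

    # sketch of the proposed check, not the actual OSDMonitor.cc code
    ALLOWED = {
        'readforward': {'forward', 'writeback'},
        'forward':     {'writeback', 'readforward'},
        'writeback':   {'readforward', 'forward'},
    }

    def cache_mode_change_ok(current, proposed, num_objects_dirty):
        if proposed in ALLOWED.get(current, set()):
            return True
        # readforward and forward additionally allow any target mode once
        # the cache tier holds no dirty objects
        if current in ('readforward', 'forward'):
            return num_objects_dirty == 0
        return False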
> > 
> > By the way, I had a thought after we spoke that we probably want something 
> > that is somewhere in between the current writeback behavior (promote on 
> > first read) and the read_forward behavior (never promote on read). I 
> > suspect a good all-around policy is something like promote on second read? 
> > This should probably be rolled into the writeback mode as a tunable... 
> > 
> > sage 
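
The promote-on-second-read idea could be prototyped with a simple 
per-object read counter; an illustrative sketch only, since the real thing 
would be a pool tunable wired into the OSD promotion path: 

    from collections import defaultdict

    PROMOTE_AFTER_READS = 2          # the proposed tunable

    read_counts = defaultdict(int)   # object name -> reads seen so far

    def should_promote_on_read(obj):
        # only promote into the cache tier once the object has been read
        # PROMOTE_AFTER_READS times, i.e. on the second read
        read_counts[obj] += 1
        return read_counts[obj] >= PROMOTE_AFTER_READS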
> > 
> > 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
the body of a message to majordomo@xxxxxxxxxxxxxxx 
More majordomo info at http://vger.kernel.org/majordomo-info.html 



