>>Note that you can also accomplish this with the kernel rbd driver by
>>layering dm-cache or bcache or something similar on top and running it in
>>write-through mode. Most clients are (KVM+)librbd, though, so eventually
>>a userspace implementation for librbd (or maybe librados) makes sense.

I vote for this; it would be wonderful to have a client cache at the librbd level!

----- Original Message -----
From: "Sage Weil" <sweil@xxxxxxxxxx>
To: "Luis Pabón" <lpabon@xxxxxxxxxx>
Cc: "Mark Nelson" <mnelson@xxxxxxxxxx>, ceph-devel@xxxxxxxxxxxxxxx
Sent: Tuesday, 8 July 2014 18:01:46
Subject: Re: Cache tier READ_FORWARD transition

On Mon, 7 Jul 2014, Luis Pabón wrote:
> What about the following use case (please forgive some of my ceph
> architecture ignorance):
>
> If it were possible to set up an OSD caching tier at the host (if the host
> had a dedicated SSD for accelerating I/O), then caching pools could be
> created to cache VM rbds, since they are inherently exclusive to a single
> host. Using a write-through (or a read-only, depending on the workload)
> policy would yield a major increase in VM IOPs. Using a write-through or
> read-only policy would also ensure any writes are first written to the
> back-end storage tier. Enabling hosts to service most of their VM I/O reads
> would also increase the overall IOPs of the back-end storage tier.

This could be accomplished by creating a rados pool per client host. The
rados caching only works as a writeback cache, though, not write-through,
so you really need to replicate it for it to be usable in practice. So
although it's possible, this isn't a particularly attractive approach.

What you're describing is really a client-side write-through cache, either
for librbd or librados. We've discussed this in the past (mostly in the
context of a shared host-wide read-only cache, not write-through), but in
both cases the caching would plug into the client libraries. There are some
CDS notes from Emperor:

http://wiki.ceph.com/Planning/Sideboard/rbd%3A_shared_read_cache
http://pad.ceph.com/p/rbd-shared-read-cache
http://www.youtube.com/watch?v=SVgBdUv_Lv4&t=70m11s

Note that you can also accomplish this with the kernel rbd driver by
layering dm-cache or bcache or something similar on top and running it in
write-through mode. Most clients are (KVM+)librbd, though, so eventually
a userspace implementation for librbd (or maybe librados) makes sense.

sage

> Does this make sense?
>
> - Luis
>
> On 07/07/2014 03:29 PM, Sage Weil wrote:
> > On Mon, 7 Jul 2014, Luis Pabon wrote:
> > > Hi all,
> > > I am working on OSDMonitor.cc:5325 and wanted to confirm the following
> > > read_forward cache tier transition:
> > >
> > > readforward -> forward || writeback || (any && num_objects_dirty == 0)
> > > forward -> writeback || readforward || (any && num_objects_dirty == 0)
> > > writeback -> readforward || forward
> > >
> > > Is this the correct cache tier state transition?
> >
> > That looks right to me.
> >
> > By the way, I had a thought after we spoke that we probably want
> > something that is somewhere in between the current writeback behavior
> > (promote on first read) and the read_forward behavior (never promote on
> > read). I suspect a good all-around policy is something like promote on
> > second read? This should probably be rolled into the writeback mode as
> > a tunable...
> >
> > sage
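
For anyone following the transition table quoted above, here is a small,
self-contained sketch of the check it describes. This is illustrative only,
not the actual OSDMonitor.cc code; the enum and function names below are
invented for the example:

#include <cstdint>

// Illustrative stand-in for the cache pool modes (not the real pg_pool_t enum).
enum class CacheMode { None, Writeback, Forward, ReadOnly, ReadForward };

// True if changing a cache pool's mode from 'from' to 'to' is permitted,
// following the transition table quoted above.
bool cache_mode_transition_allowed(CacheMode from, CacheMode to,
                                   uint64_t num_objects_dirty)
{
  if (from == to)
    return true;  // no-op
  switch (from) {
  case CacheMode::ReadForward:
    // readforward -> forward || writeback || (any && num_objects_dirty == 0)
    return to == CacheMode::Forward || to == CacheMode::Writeback ||
           num_objects_dirty == 0;
  case CacheMode::Forward:
    // forward -> writeback || readforward || (any && num_objects_dirty == 0)
    return to == CacheMode::Writeback || to == CacheMode::ReadForward ||
           num_objects_dirty == 0;
  case CacheMode::Writeback:
    // writeback -> readforward || forward
    return to == CacheMode::ReadForward || to == CacheMode::Forward;
  default:
    // Starting modes other than the three listed are outside the quoted table.
    return true;
  }
}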
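
Similarly, the "promote on second read" tunable Sage mentions could be made
concrete with a toy policy like the one below. The class and member names are
made up for illustration and are not an existing Ceph interface:

#include <cstdint>
#include <string>
#include <unordered_map>

// Toy read-promotion policy: only promote an object into the cache tier once
// it has been read at least 'promote_after_reads' times. A threshold of 1
// reproduces the current writeback behavior (promote on first read); 2 gives
// "promote on second read".
class ReadPromotionPolicy {
public:
  explicit ReadPromotionPolicy(uint32_t promote_after_reads = 2)
    : promote_after_reads_(promote_after_reads) {}

  // Called on each read of 'oid' that misses the cache tier; returns true
  // once the object has been read often enough to be promoted.
  bool note_read(const std::string& oid) {
    uint32_t n = ++read_counts_[oid];
    if (n >= promote_after_reads_) {
      read_counts_.erase(oid);  // about to be promoted; stop tracking it
      return true;
    }
    return false;
  }

private:
  uint32_t promote_after_reads_;
  // Unbounded for simplicity; a real implementation would age entries out
  // (e.g. via hit sets) rather than track every object forever.
  std::unordered_map<std::string, uint32_t> read_counts_;
};

The only knob is the threshold; everything else is bookkeeping, which is why
folding it into the writeback mode as a tunable seems natural.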