Re: Cache tier READ_FORWARD transition

This is great information.

Thanks, Sage.

- Luis


On 07/08/2014 12:01 PM, Sage Weil wrote:
On Mon, 7 Jul 2014, Luis Pabón wrote:
What about the following use case (please forgive some of my Ceph architecture
ignorance):

If it were possible to set up an OSD caching tier at the host (if the host had a
dedicated SSD for accelerating I/O), then caching pools could be created to
cache VM rbds, since they are inherently exclusive to a single host.  Using a
write-through (or a read-only, depending on the workload) policy would yield a
major increase in VM IOPS.  Using a write-through or read-only policy would also
ensure any writes are first written to the back-end storage tier.  Enabling
hosts to service most of their VM I/O reads would also increase the overall
IOPS of the back-end storage tier.
This could be accomplished by doing a rados pool per client host.  The
rados caching only works as a writeback cache, though, not
write-through, so you really need to replicate it for it to be usable in
practice.  So although it's possible, this isn't a particularly attractive
approach.

What you're describing is really a client-side write-through cache, either
for librbd or librados.  We've discussed this in the past (mostly in the
context of shared host-wide read-only data, not as write-through), but
in both cases the caching would plug into the client libraries.  There are
some CDS notes from emperor:

	http://wiki.ceph.com/Planning/Sideboard/rbd%3A_shared_read_cache
	http://pad.ceph.com/p/rbd-shared-read-cache
	http://www.youtube.com/watch?v=SVgBdUv_Lv4&t=70m11s

Note that you can also accomplish this with the kernel rbd driver by
layering dm-cache or bcache or something similar on top and running it in
write-through mode.  Most clients are (KVM+)librbd, though, so eventually
a userspace implementation for librbd (or maybe librados) makes sense.

sage


Does this make sense?

- Luis

On 07/07/2014 03:29 PM, Sage Weil wrote:
On Mon, 7 Jul 2014, Luis Pabon wrote:
Hi all,
      I am working on OSDMonitor.cc:5325 and wanted to confirm the following
read_forward cache tier transition:

      readforward -> forward || writeback || (any && num_objects_dirty == 0)
      forward -> writeback || readforward || (any && num_objects_dirty == 0)
      writeback -> readforward || forward

Is this the correct cache tier state transition?
That looks right to me.
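
For illustration only, those rules could be written as a predicate along the
lines of the sketch below; the names (cache_mode_t, cache_transition_ok) are
made up for the example and are not the actual types or code in OSDMonitor.cc.

    // Sketch only: cache_mode_t and cache_transition_ok are illustrative
    // names, not the real OSDMonitor.cc implementation.
    #include <cstdint>

    enum class cache_mode_t { NONE, WRITEBACK, FORWARD, READFORWARD };

    // Encodes the transitions listed above: while dirty objects remain,
    // writeback may only step to forward or readforward so they still get
    // flushed; once num_objects_dirty == 0 the forwarding modes may switch
    // to any mode.
    bool cache_transition_ok(cache_mode_t from, cache_mode_t to,
                             uint64_t num_objects_dirty)
    {
      switch (from) {
      case cache_mode_t::READFORWARD:
        return to == cache_mode_t::FORWARD ||
               to == cache_mode_t::WRITEBACK ||
               num_objects_dirty == 0;
      case cache_mode_t::FORWARD:
        return to == cache_mode_t::WRITEBACK ||
               to == cache_mode_t::READFORWARD ||
               num_objects_dirty == 0;
      case cache_mode_t::WRITEBACK:
        return to == cache_mode_t::READFORWARD ||
               to == cache_mode_t::FORWARD;
      default:
        return false;
      }
    }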

By the way, I had a thought after we spoke that we probably want something
that is somewhere in between the current writeback behavior (promote on
first read) and the read_forward behavior (never promote on read).  I
suspect a good all-around policy is something like promote on second read?
This should probably be rolled into the writeback mode as a tunable...
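
A rough sketch of what such a tunable might look like; PromotePolicy,
promote_read_threshold, and should_promote are invented names for the
example, not existing Ceph code, and the real mechanism would presumably
live in the OSD read path rather than a standalone struct:

    // Hypothetical "promote on Nth read" tunable; not actual Ceph code.
    #include <cstdint>
    #include <map>
    #include <string>

    struct PromotePolicy {
      uint32_t promote_read_threshold = 2;          // 2 == promote on second read
      std::map<std::string, uint32_t> read_counts;  // reads seen per object

      // Called for each read served without promotion; returns true once the
      // object has been read often enough to be pulled into the cache tier.
      bool should_promote(const std::string& oid) {
        return ++read_counts[oid] >= promote_read_threshold;
      }
    };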

sage

