Re: rbd layering

Yehuda Sadeh Weinraub <yehudasa@xxxxxxxxx> · Tue, 1 Feb 2011 23:34:45 -0800

On Tue, Feb 1, 2011 at 11:13 PM, Colin McCabe <cmccabe@xxxxxxxxxxxxxx> wrote:
>
> On Mon, Jan 31, 2011 at 10:08 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > One idea we've talked a fair bit about is layering RBD images.  The idea
> > would be to create a new image in O(1) time that mirrors an old image and
> > get copy-on-write type semantics, like a writeable snapshot.
> >
> > We've come up with a few different approaches for doing this, each with
> > somewhat different performance characteristics.  The main consideration is
> > that RBD images do not (currently) have an "allocation table."  Image data
> > is simply striped over objects (that may or may not exist).  You read the
> > object for a given block to see if it exists; if it doesn't (a "hole"),
> > the content is defined to be zero-filled.
>
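
For reference, the read path described above could be sketched roughly
as follows. This is only an illustration in Python, not the rbd code:
the dict standing in for the object store, the name read_image, the
4 MB object size, and the zero-padding of short objects are all
assumptions made for the example.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size; not stated in this thread


def read_image(objects, offset, length, object_size=OBJECT_SIZE):
    """Read `length` bytes at `offset` from an image striped over objects.

    `objects` is a hypothetical dict mapping object index -> bytes that
    stands in for the object store. There is no allocation table: a missing
    object is a hole and reads back as zeros.
    """
    out = bytearray()
    end = offset + length
    while offset < end:
        # Which object holds this offset, and where inside that object.
        index, start = divmod(offset, object_size)
        chunk = min(object_size - start, end - offset)
        data = objects.get(index)
        if data is None:
            # Hole: the object does not exist, so the content is zero-filled.
            out += bytes(chunk)
        else:
            # Assumption: an object shorter than object_size is zero-padded.
            piece = data[start:start + chunk]
            out += piece + bytes(chunk - len(piece))
        offset += chunk
    return bytes(out)
```

With object_size=4 and only object 1 present, a 6-byte read at offset 2
returns two zero bytes from the hole followed by the contents of object 1.
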
> Have we thought about the hash table based approach yet? Where every
> block gets hashed and we only store one copy for each? I guess this is
> basically how git works, except instead of fixed-size blocks, it
> tracks variable-sized blobs. This is also how ZFS dedupe works.
>
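
To make the suggestion concrete, a hash-based block store might look
something like the sketch below. It is a toy in Python under stated
assumptions (fixed-size blocks, SHA-256 as the hash, in-memory dicts,
no garbage collection of unreferenced blocks); the class and method
names are made up for illustration and are not an existing osd or rbd
interface.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store (hypothetical, for illustration)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # digest -> block data, each distinct block stored once
        self.image = {}   # block number -> digest of the block's content

    def write_block(self, block_no, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        digest = hashlib.sha256(data).hexdigest()
        # Store the data only if no block with this content exists yet.
        self.blocks.setdefault(digest, data)
        self.image[block_no] = digest

    def read_block(self, block_no):
        digest = self.image.get(block_no)
        if digest is None:
            # Never-written block: treat it as a zero-filled hole.
            return bytes(self.block_size)
        return self.blocks[digest]
```

Writing the same content to two different block numbers leaves a single
entry in blocks, at the cost of a hash computation and an extra lookup
on every read and write.
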

Long ago there were some plans to introduce content-addressable
storage at the osd level. We will probably want something like that
at some point, but we'd rather introduce it as a proper osd/rados
feature than as a hack tailored specifically to rbd. I don't want to
start digging into the architectural requirements here, but my gut
feeling is that it's going to be far from trivial, and that its
benefits are marginal compared to what we'd lose (simplicity,
performance).

Yehuda