Hello,

Josef has added Offline Deduplication for Btrfs.

http://www.mail-archive.com/linux-btrfs@xxxxxxxxxxxxxxx/msg07777.html

On Wed, Feb 2, 2011 at 12:54 PM, Gregory Farnum <gregf@xxxxxxxxxxxxxxx> wrote:
>
> On Tue, Feb 1, 2011 at 11:13 PM, Colin McCabe <cmccabe@xxxxxxxxxxxxxx> wrote:
> > On Mon, Jan 31, 2011 at 10:08 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> One idea we've talked a fair bit about is layering RBD images. The idea
> >> would be to create a new image in O(1) time that mirrors an old image and
> >> get copy-on-write type semantics, like a writeable snapshot.
> >>
> >> We've come up with a few different approaches for doing this, each with
> >> somewhat different performance characteristics. The main consideration is
> >> that RBD images do not (currently) have an "allocation table." Image data
> >> is simply striped over objects (that may or may not exist). You read the
> >> object for a given block to see if it exists; if it doesn't (a "hole"),
> >> the content is defined to be zero-filled.
> >
> > Have we thought about the hash table based approach yet? Where every
> > block gets hashed and we only store one copy for each? I guess this is
> > basically how git works, except instead of fixed-size blocks, it
> > tracks variable-sized blobs. This is also how ZFS dedupe works.
> >
> > The nice thing about the hash table based approach is that you don't
> > have to track parent-child relationships explicitly. If two users
> > happen to both install CentOS 5.5 with the same settings on the
> > same-sized image, they'll both be deduped automatically.
> How would you place the blocks in a CAS-based block device like this?
> An allocation table might feel ugly, but when you're doing
> cluster-wide block sharing you're going to need the extra metadata
> somewhere. Better to store an allocation table than try and maintain
> the coherency required for dynamic de-dup like that.
>
> I guess I should say that de-dup would be a nice feature to support,
> but I don't think it's appropriate to implement as part of RBD.
> Anything that powerful needs to be a core RADOS feature.
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Kind Regards,
---------------------
Kiran T Patil
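
For readers following the thread, the hash-table approach Colin describes, and the extra metadata Greg says it needs, can be sketched roughly as below. This is a toy in-memory illustration only, not RBD, RADOS, Btrfs, or ZFS code. The names `DedupStore`, `Image`, and `BLOCK_SIZE` are hypothetical, and SHA-256 is an assumed choice of hash. Each image keeps its own table from block index to digest, which is the per-image "allocation table" mentioned above, and a missing entry reads back as zeros, like a hole.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen only for illustration


class DedupStore:
    """Toy content-addressed store: one stored copy per distinct block."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes
        self.refcount = {}  # digest -> number of image slots referencing it

    def put(self, data):
        """Store a block (if not already present) and take one reference."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data
            self.refcount[digest] = 0
        self.refcount[digest] += 1
        return digest

    def release(self, digest):
        """Drop one reference; free the block when nothing points at it."""
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blocks[digest]


class Image:
    """Toy image: a per-image table mapping block index -> digest."""

    def __init__(self, store):
        self.store = store
        self.table = {}  # the extra metadata: block index -> digest

    def write_block(self, index, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        # Take the new reference before dropping the old one, so that
        # rewriting identical data never frees the block in between.
        new = self.store.put(data)
        old = self.table.get(index)
        self.table[index] = new
        if old is not None:
            self.store.release(old)

    def read_block(self, index):
        digest = self.table.get(index)
        if digest is None:
            return bytes(BLOCK_SIZE)  # a hole is defined to read as zeros
        return self.store.blocks[digest]


# Two unrelated images writing identical content share one stored block.
store = DedupStore()
image_a, image_b = Image(store), Image(store)
payload = b"x" * BLOCK_SIZE
image_a.write_block(0, payload)
image_b.write_block(7, payload)
```

In this sketch no parent-child relationship between the two images is tracked, yet they share storage. The cost is the per-image table and the reference counts, which in a cluster would have to be kept coherent across nodes.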