Re: Data deduplication for Linux : lessfs

Les Mikesell <lesmikesell@gmail.com> · Wed, 24 Jun 2009 15:59:37 -0500

Roy Sigurd Karlsbakk wrote:
>>>> I am thinking about starting to work on a data deduplicating
>>>> block device, a kernel module called blockless.
>>> If done smartly, this may perhaps be possible, but the problem is the
>>> filesystem's metadata. Is this going to be dedup'ed? How much will
>>> this take? A simple backup will update atime on all the files backed
>>> up, and although atime isn't always wanted or needed, the problem
>>> occurs elsewhere.

>> Block-level deduplication isn't going to know or care about the
>> difference between file contents and metadata. It is either stored in
>> blocks that match other blocks or not, and the difference should not be
>> visible to the filesystem living on top of the block device.

> My point exactly. If dedup were to be done on the block layer, you'd need
> a flag to say "do not dedup this".

Why? How can it possibly make any difference? It's not likely that
you'd have dupes in the metadata blocks, but if you do, it doesn't matter
that they are transparently mapped into one. You need a copy-on-write
mechanism anyway, since if you write to either, they won't be dups any more.
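To make the point concrete, here is a minimal toy sketch (an in-memory model, not lessfs, blockless, or any real kernel code; the class and method names, the 4096-byte block size, and the use of SHA-256 as the content key are all assumptions for illustration). Logical blocks map to content-addressed physical blocks with a reference count. Nothing in it distinguishes "data" from "metadata", and a write to a shared block never modifies the shared copy in place: the rewritten logical block is simply redirected to a different physical block, which is the copy-on-write behavior described above.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this toy model


class DedupBlockDevice:
    """Hypothetical toy block-level dedup device.

    Logical block numbers map to physical blocks keyed by a content hash;
    identical blocks share one stored copy, tracked by a reference count.
    """

    def __init__(self):
        self.lba_map = {}   # logical block number -> content digest
        self.store = {}     # content digest -> block bytes
        self.refcount = {}  # content digest -> number of logical blocks using it

    def write(self, lba, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        # Simplification: trusts the hash; a real implementation would
        # also compare bytes to rule out collisions.
        digest = hashlib.sha256(data).digest()
        old = self.lba_map.get(lba)
        if old == digest:
            return  # same content rewritten: nothing changes
        # Copy-on-write: never touch a shared physical block in place;
        # remap this logical block to the block holding the new content.
        if digest not in self.store:
            self.store[digest] = bytes(data)
            self.refcount[digest] = 0
        self.refcount[digest] += 1
        self.lba_map[lba] = digest
        if old is not None:
            self._release(old)

    def _release(self, digest):
        # Drop one reference; free the physical block when unused.
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.store[digest]

    def read(self, lba):
        digest = self.lba_map.get(lba)
        if digest is None:
            return bytes(BLOCK_SIZE)  # unwritten blocks read as zeros
        return self.store[digest]

    def physical_blocks(self):
        return len(self.store)


if __name__ == "__main__":
    dev = DedupBlockDevice()
    contents = b"A" * BLOCK_SIZE
    dev.write(0, contents)  # a file data block
    dev.write(7, contents)  # a "metadata" block that happens to match
    assert dev.physical_blocks() == 1  # mapped into one, transparently
    dev.write(7, b"B" * BLOCK_SIZE)    # e.g. an atime update rewrites it
    assert dev.read(0) == contents     # the other copy is unaffected
    assert dev.physical_blocks() == 2
```

The filesystem above only ever sees logical block numbers, so whether a block holds file contents or metadata makes no difference to the mapping, and no "do not dedup this" flag is needed for correctness.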

--
  Les Mikesell
   lesmikesell@gmail.com

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/