Re: Filesystem that doesn't store duplicate data

On Wednesday 05 December 2007, redhat@xxxxxxxxxxx wrote:
> You'd think that using this technology on a live filesystem could incur a
> significant performance penalty due to all those calculations (FUSE module,
> anyone?). Imagine a hardware-optimized data de-duplication disk
> controller, similar to RAID XOR-optimized CPUs. Now that would be cool. All
> it would need to store would be metadata when it had already seen the exact
> same block. I think fundamentally it is similar in result to on-the-fly
> disk compression.

Actually, the impact - if the filesystem is designed correctly - shouldn't be 
that horrible. After all, Sun has managed to integrate checksums into ZFS and 
still get great performance. In addition, ZFS never overwrites data in place; 
it writes every change to a new data block (copy-on-write)...
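Roughly like this, in toy form (a little Python sketch with made-up names - 
not actual ZFS code, just the shape of the idea): every write allocates a 
fresh block, the checksum travels with the block pointer, and reads get 
verified against it.

import hashlib

class CowBlockStore:
    """Toy copy-on-write store: writes never overwrite anything, and
    every block carries a checksum (SHA-256 here, standing in for the
    checksums ZFS keeps in its block pointers)."""

    def __init__(self):
        self.blocks = []  # append-only: old blocks are never touched

    def write_block(self, data: bytes):
        """Allocate a fresh block; return (address, checksum)."""
        checksum = hashlib.sha256(data).hexdigest()
        self.blocks.append(data)
        return len(self.blocks) - 1, checksum

    def read_block(self, addr: int, expected: str) -> bytes:
        """Verify the stored checksum on every read."""
        data = self.blocks[addr]
        if hashlib.sha256(data).hexdigest() != expected:
            raise IOError("checksum mismatch - block is corrupt")
        return data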

What you would have to do then is keep a lookup table of checksums so you can 
find possible matches quickly. When you find one, do a full byte-for-byte 
compare to be 100% sure you didn't hit a checksum collision. If the blocks 
really are identical, you can simply reference the existing data block 
instead of writing a new one.
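That lookup-then-verify flow would look something like this (Python again, 
every name invented for illustration - a sketch of the technique, not any 
real filesystem): the table maps checksum to block address, and a hit 
triggers the full byte compare before we hand out a reference.

import hashlib

class DedupStore:
    """Toy block-level de-duplication: a checksum lookup table plus a
    byte-for-byte verify to rule out collisions."""

    def __init__(self):
        self.blocks = []     # block address -> data
        self.by_sum = {}     # checksum -> list of candidate addresses
        self.refcount = {}   # block address -> reference count

    def store(self, data: bytes) -> int:
        """Return the address of a block holding `data`, reusing an
        existing block when an identical one is already stored."""
        checksum = hashlib.sha256(data).hexdigest()
        # Cheap step first: any block with the same checksum?
        for addr in self.by_sum.get(checksum, []):
            # Expensive but rare: full compare, to be 100% sure this
            # isn't just a checksum collision.
            if self.blocks[addr] == data:
                self.refcount[addr] += 1
                return addr  # de-duplicated: only metadata is added
        # No match: write a brand-new block.
        addr = len(self.blocks)
        self.blocks.append(data)
        self.by_sum.setdefault(checksum, []).append(addr)
        self.refcount[addr] = 1
        return addr

With a strong hash like SHA-256 a real collision is astronomically unlikely, 
but keeping the byte compare means a collision would cost one wasted read 
instead of silently handing back the wrong block.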

It is still a lot of work, but as Sun showed, on-the-fly compares and 
checksums are doable without too much of a hit.

Peter.


_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos
