Peter Arremann wrote:
How about a FUSE file system (userland, like NTFS-3G) that layers
on top of any file system that supports hard links?
That would be easy but I can see a few issues with that approach:
1) Working at file level rather than block level is going to be much less
efficient. I for one have gigabytes of file revisions that differ only
a little from one revision to the next.
That is a problem for the way backuppc stores things - but at least it
can compress the files.
2) You have to write all datablocks to disk and then erase them again if you
find a match. That will slow you down and create some weird behavior. I.e.
you know the FS shouldn't store duplicate data, yet you can't use cp to copy
a 10G file if only 9G are free. If you copy an 8G file, you see the usage
increase until only 1G is free, then when your app closes the file, you
go back to 9G free...
Only using it for backup storage is a special case where this is not so
bad. Backuppc also has a way to rsync against the stored copy so
matching files (or parts) may not need to be transferred at all.
3) Rather than continuously looking for matches on block level, you have to
search for matches on files that can be any size. That is fine if you have a
100K file - but if you have a 100M or larger file, the checksum calculations
will take you forever.
The backuppc scheme is to use a hash of some amount of the uncompressed
file as a pooled filename for the link to quickly weed out most
possibilities and permit the compression level to be changed. The full
check then only has to be done on collisions.
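
To make the idea concrete, here is a rough sketch of that kind of pooling in Python. It is not backuppc's actual code: the function names, the pool file naming ("digest_N"), the choice of SHA-256 and the 1 MiB prefix are all assumptions made for illustration. It hashes the file size plus a leading chunk, uses that as the pool name, and only does a full compare when a pool entry with the same name already exists.

```python
import filecmp
import hashlib
import os

PREFIX_BYTES = 1 << 20  # assumption: hash only the first 1 MiB of each file


def partial_digest(path, limit=PREFIX_BYTES):
    """Hash the file size plus at most `limit` leading bytes (hypothetical scheme)."""
    h = hashlib.sha256()
    h.update(str(os.path.getsize(path)).encode())
    with open(path, "rb") as f:
        h.update(f.read(limit))
    return h.hexdigest()


def store(path, pool_dir, limit=PREFIX_BYTES):
    """Link `path` into the pool, reusing an existing entry if the content matches.

    Returns True if an existing pool entry was reused, False if a new one was made.
    """
    os.makedirs(pool_dir, exist_ok=True)
    base = partial_digest(path, limit)
    n = 0
    while True:
        candidate = os.path.join(pool_dir, f"{base}_{n}")
        if not os.path.exists(candidate):
            # No entry with this name: the file becomes a new pool entry.
            os.link(path, candidate)
            return False
        if os.path.samefile(path, candidate):
            # Already a hard link to this pool entry, nothing to do.
            return True
        # Same partial hash: only now pay for the full byte-by-byte compare.
        if filecmp.cmp(path, candidate, shallow=False):
            tmp = path + ".dedup-tmp"
            os.link(candidate, tmp)   # new name for the pooled data
            os.replace(tmp, path)     # atomically swap it in for the duplicate
            return True
        n += 1  # genuine collision: try the next slot in the chain
```

Calling store("/backups/host1/etc/passwd", "/backups/pool") on each incoming file (paths made up here) would leave one copy of the data on disk per distinct content, with the cost of the full compare paid only when the cheap partial hash already matched.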
This means rather than adding a specific, small
penalty to every write call, you add an unknown penalty, proportional to file
size, when closing the file. Also, the fact that most C coders don't check the
return code of close doesn't make me happy there...
In backuppc, the writer understands the scheme - and the linking is
somewhat decoupled from the transfers. But, even in a normal filesystem,
writes are buffered and if you don't fsync there is a lot that can go
wrong after a close() reports success.
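
As an illustration of checking both steps, here is a minimal Python sketch (the function name write_durably is made up for this example). Python's os.fsync() and os.close() raise OSError on failure instead of returning a code, so the point is simply not to swallow those exceptions.

```python
import os


def write_durably(path, data):
    """Write `data` to `path`, surfacing errors from write, fsync and close."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # ask the kernel to flush buffered data; raises OSError on failure
    finally:
        os.close(fd)  # also raises OSError on failure, so it is not silently ignored
```

Even this only covers the file's own data. Whether a newly created directory entry survives a crash is a separate question that depends on the filesystem.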
--
Les Mikesell
lesmikesell@xxxxxxxxx
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos