Re: Deduplicated archives via hardlinks [Was: XFS or EXT3 ?]




On 12/3/2010 3:14 PM, Adam Tauno Williams wrote:
>
> I know nothing about backuppc;  I don't use it.  But we use rsync with
> the same concept for a deduplicated archive.

Backuppc is a couple of Perl scripts, one of which happens to 
re-implement rsync in a way that lets it use stock rsync on the remote 
end while transparently accessing a compressed copy on the server side.  
It can also use tar or Samba to copy files in, and then does the same 
compression/dedup operation.

>> (for deduplication) with versioning, I'd have to assume the archive
>> volume gets really messy after awhile, and further, something like that
>> is pretty darn hard to make a replica of it.
>
> I don't see why;  only the archive is deduplicated in this manner, and
> it certainly isn't "messy".  One simply makes a backup [for us that
> means to tape - a disk is not a backup] of the most current snapshot.

It does get messy because backuppc archives typically have millions of 
hardlinked files.  It doesn't just hardlink between subsequent runs of 
the same machine; it hardlinks all files with identical content, whether 
they come from the same machine or another, using a pool directory of 
hashed filenames as a common link to match them up quickly.
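
As a rough illustration of the pool idea (a sketch only, not backuppc's 
actual code or hashing scheme), the function below names each pool entry 
after a hash of the file's content and hardlinks matching files to it.  
The name pool_link, the pool layout and the use of sha256sum are all 
assumptions made for the example; it also assumes the pool and the files 
live on the same filesystem and ignores hash collisions and compression.

```shell
# Hypothetical helper: pool_link POOL_DIR FILE
# Links FILE into a content-addressed pool so that identical content
# ends up as a single inode with several names.
pool_link() {
    pool=$1
    file=$2
    # Identical content hashes to the same pool file name.
    sum=$(sha256sum "$file" | cut -d' ' -f1)
    if [ -e "$pool/$sum" ]; then
        # Already the same inode as the pool entry: nothing to do.
        if [ "$pool/$sum" -ef "$file" ]; then
            return 0
        fi
        # Content already pooled: replace FILE with a link to the pooled copy.
        ln -f "$pool/$sum" "$file"
    else
        # First time this content is seen: the pool entry becomes
        # a second name for FILE's inode.
        ln "$file" "$pool/$sum"
    fi
}
```

Running it over every file from every machine's tree would leave one 
inode per distinct content, no matter where or how often the file 
appears.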

> The script just looks like -
>
> export ROOT="/srv/cifs/Arabis-Red"
> export STAMP=`date +%Y%m%d%H`
> export LASTSTAMP=`cat $ROOT/LAST.STAMP`
> mkdir $ROOT/$STAMP
> mkdir $ROOT/$STAMP/home
>
> nice rsync --verbose --archive --delete --acls \
>        --link-dest $ROOT/$LASTSTAMP/home/ \
>        --numeric-ids \
>        -e ssh \
>          archivist@arabis-red:/home/ \
>            $ROOT/$STAMP/home/ \
>            2>&1 > $ROOT/$STAMP/home.log
>
> echo $STAMP > $ROOT/LAST.STAMP

But that won't match up multiple copies of the same file in different 
locations, or help with many machines with mostly-duplicate content. The 
backuppc scheme works pretty well in normal usage, but most 
file-oriented approaches to copying the whole backuppc archive have 
scaling problems because they have to track all the inodes and names to 
match up the hard links.
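
To get a feel for how much a file-oriented copy has to keep track of, 
something like the following sketch counts file names against distinct 
inodes under a directory.  The function name link_stats is made up for 
the example, and it relies on GNU find's -printf, which is an assumption 
about the platform.

```shell
# Hypothetical helper: link_stats DIR
# Prints "<names> <inodes>": how many regular-file names exist under DIR
# and how many distinct device:inode pairs they refer to.
link_stats() {
    # One output line per file name, tagged with device and inode number.
    names=$(find "$1" -type f -printf '%D:%i\n' | wc -l)
    # Collapse names that share an inode to count the actual files.
    inodes=$(find "$1" -type f -printf '%D:%i\n' | sort -u | wc -l)
    echo "$names $inodes"
}
```

A tool that preserves hard links while copying (rsync with 
--hard-links, for instance) has to remember which inodes it has already 
seen so that later names can be linked rather than copied again, so the 
bigger the gap between the two numbers, and the bigger the numbers 
themselves, the more state it carries.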

-- 
    Les Mikesell
     lesmikesell@xxxxxxxxx



_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos

