Roberto Ragusa <mail@xxxxxxxxxxxxxxxx> writes:

> One day I decided to remove some old backups by launching
> an rm command for each snapshot directory in parallel.
> I then realized that there were more than 1000 directories,
> and the total number of files to be deleted was around
> 100 million.
> It took some time, but everything went fine; not a bad
> stress test for the machine (reiserfs/LVM2/nv_sata).
> I had never seen a load average above 1000 until then.
> :-)
>
> There is only one thing I'd like to improve: renamed
> or moved files are seen as new files and are not hardlinked.
> I haven't tried whether "--fuzzy" works for hardlinking too.

I have put my "just another backup solution" at
http://www.kernel.org/~chris/cbackup-1.05.tar.gz
Perhaps someone will find it useful. I've been using it for about a year
without problems.

It is basically a disk-based backup tool that uses SHA-1 hashes to avoid
storing duplicates. I'm using a single NFS-exported disk to back up a set
of machines.

The output files are a single big compressed archive plus a single index
file per backup session. A single "hash list" is kept and updated for the
whole set of backups. Identical files (on different machines and/or in
different places) take up space only once - the SHA-1 check is performed
the first time, and after that only mtime/ctime/inode is checked (or
mtime/ctime/name if inode numbers aren't stable).

There is no concept of an incremental backup here: every index file
contains a complete list of files (but it references actual data stored
in the current and previous *.arc archives).

The index file is plain text; every line contains type, size, inode,
hash, mode, etc., and the name (terminated by 0 rather than \n). That
makes it possible to use normal text utilities on it (comparing
different backups, etc.). The SHA-1 hash is used as an index, similar
to git.

What I usually do is:

# backup -v -v -i hash_list -i last_backup_for_this_machine.idx -a *.arc \
  -oi new_backup_for_this_machine.idx -o new_backup_etc.arc -oh hash_list.new

While currently all archive files (*.arc) have to be accessible during
backup and restore, this restriction could be trivially removed (i.e.,
archives could span multiple media and be processed sequentially while
restoring).

There is one noticeable restriction: the complete hash list and the
per-file data from the previous backup (plus file names, if inode
numbers are ignored) need to be kept in memory for the duration of a
backup session. That means it will use several MB of memory to back up,
say, a million files.

Possible parameters:

Usage: backup [options] [--] file...    Back up
       backup --restore [options]       Restore
       backup --stats [options]         Statistics

options:
  -i index...      Read index from file 'index'
  -a archive...    Read archive from file 'archive'
  -t               Test file SHA-1 hashes
  -v...            Verbose

backup options:
  -1               One filesystem
  -x exclude...    Exclude files and directories
  -oi index        Output index to file 'index'
  -oa archive      Output archive to file 'archive'
  -oh hashes       Output hash list to file 'hashes'
  --ignore-inodes  Ignore inode numbers (for FAT)

restore options:
  -s...            Strip one directory component
  -n...            Do not restore
--
Krzysztof Halasa
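
For illustration only, the dedup described above amounts to roughly the
following (a minimal Python sketch, not cbackup's actual code; the names
and data structures here are made up for the example):

import hashlib
import os

def sha1_of_file(path, bufsize=1 << 20):
    # Hash the file contents in chunks so large files need not fit in memory.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def backup_file(path, hash_list, prev_index, archive):
    # hash_list:  set of SHA-1 hashes already stored in any *.arc archive
    # prev_index: dict mapping (inode, mtime, ctime) -> hash, taken from the
    #             previous backup's index (use the name instead of the inode
    #             when inode numbers aren't stable, cf. --ignore-inodes)
    # archive:    open output file for the new archive (no compression or
    #             framing here; a real archive stores more than raw bytes)
    st = os.stat(path)
    key = (st.st_ino, int(st.st_mtime), int(st.st_ctime))

    if key in prev_index:
        # Unchanged since the previous backup: reuse the recorded hash
        # without re-reading or re-hashing the file.
        return prev_index[key]

    digest = sha1_of_file(path)
    if digest not in hash_list:
        # Content never seen before, on any machine: store it once.
        with open(path, "rb") as f:
            while chunk := f.read(1 << 20):
                archive.write(chunk)
        hash_list.add(digest)
    # Identical content elsewhere gets only a reference to this hash in the
    # index file, not another copy of the data.
    return digest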