Re: Self Introduction / Contributing a package to Fedora

Derek Pressnall <derekp7@xxxxxxxxxxxxxxxx> · Wed, 13 Aug 2014 20:51:29 -0500

> Why? Rsnapshot and AMANDA are already available, stable, and quite robust. They seem to cover just the niches you're aiming at, and the performance and security ramifications have already been worked out. Plus, neither requires an SQL database, which makes them much more robust.
>
> It's nothing personal or criticizing your code. I just wind up cleaning up after a lot of projects that reinvent the wheel. Backup systems are a popular such target.

Good questions.  Amanda and Bacula are both good, full-featured backup
tools, but they take some effort to set up and get working.  I was
previously using something similar to Rsnapshot, but wanted
compression and better file-level deduplication.  Rsnapshot will
de-duplicate files that are the same between backup sets, but not
files that are the same across multiple hosts, or different copies of
a file on the same host.

The dedup method it uses (hard links) can also make for a very large
set of files to analyse on the backup media.  In my case, backing up
20-odd hosts and keeping daily/weekly/monthly backups ended up with
600 total backups on the storage unit and about 300 million files.
That made keeping an rsync'd copy of the backup drive a bit tedious,
not to mention running a "find" command to get stats about the files
(such as the largest ones, or which were unique) to improve the
include/exclude list.

This is where the SQL database comes into play -- I can easily get
the top space-consuming unique files/directories, etc.  BTW, only the
file listings / metadata are stored in the DB -- the contents are
stored as lzop-compressed files in the filesystem.  And since the SHA1
hash is used as the file name, you get full file-level deduplication
across multiple hosts for free.
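
To illustrate why naming stored files by their hash gives dedup "for
free", here is a minimal Python sketch.  It is not Snebu's code: the
function names and the two-character directory fan-out are assumptions
made up for the example, and zlib stands in for lzop only because it
ships with Python.

```python
import hashlib
import os
import tempfile
import zlib


def store_file(vault_dir, data):
    """Store data under its SHA-1 digest; return (digest, was_new).

    Hypothetical helper: the layout (first two hex characters as a
    subdirectory) is an assumption, not Snebu's on-disk format.
    """
    digest = hashlib.sha1(data).hexdigest()
    subdir = os.path.join(vault_dir, digest[:2])
    path = os.path.join(subdir, digest[2:])
    if os.path.exists(path):
        # Identical contents were already stored, from any host or path.
        return digest, False
    os.makedirs(subdir, exist_ok=True)
    with open(path, "wb") as f:
        # zlib is used here as a stand-in for lzop compression.
        f.write(zlib.compress(data))
    return digest, True


def load_file(vault_dir, digest):
    """Read back and decompress the contents stored under a digest."""
    path = os.path.join(vault_dir, digest[:2], digest[2:])
    with open(path, "rb") as f:
        return zlib.decompress(f.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as vault:
        # The same contents arriving from two different hosts.
        first = store_file(vault, b"identical file contents")
        second = store_file(vault, b"identical file contents")
        print(first, second)
```

The second call finds the file already present and writes nothing, so
the storage holds one copy no matter how many hosts or backup sets
reference it.  The catalog only has to record the digest per entry.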

Other systems I looked at were BackupPC, which seemed a bit more
complicated (both in setup and internal workings -- I couldn't find
how to execute pre/post backup scripts for live DB backups, etc.),
Obnam (stores backups in an internal binary format, and has issues
with a daily/weekly/monthly retention schedule), and a few others that
were still based on the rsync/hardlinks method.

BTW, here's a list of what my main design goals were, compared to other tools.

* Keep the backend data format as transparent as possible (files are
stored in lzop-compatible format, and the data catalog is in an SQLite
database). I didn't want the data to be in a big blob, as with some of
the non-rsync based Linux backup tools.

* Use the database for identifying duplication, instead of hard links
in the filesystem (when using an rsync backup with hard links, I
couldn't in turn rsync the entire backup drive to another volume in
any reasonable timeframe).

* Use tools that are already on the clients so there is no agent to
install (find, tar).

* Support advanced features, such as preserving SELinux attributes and
ACLs, and handling sparse files properly.

* Have the metadata in a database that can be queried with SQL syntax
for reporting and diagnostic purposes. I still owe documentation on
the DB schema layout though.

* Easy to get up and running (install, configure a couple lines to
point to the backup media and run), while retaining flexibility in
setting up a client/server installation.
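
As an illustration of the kind of reporting an SQL catalog allows
(the "top space-consuming unique files" case mentioned earlier), here
is a small Python sketch using the standard sqlite3 module.  The table
and column names are hypothetical, invented for this example; the real
schema is not documented here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical catalog table for illustration only; not the real schema.
conn.execute(
    "CREATE TABLE catalog (host TEXT, path TEXT, sha1 TEXT, size INTEGER)"
)
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?, ?)",
    [
        ("web1", "/var/log/big.log", "aaa", 5000),
        ("web2", "/var/log/big.log", "aaa", 5000),  # same contents, other host
        ("web1", "/home/u/video.mkv", "bbb", 9000),
        ("db1", "/etc/hosts", "ccc", 100),
    ],
)


def top_unique_files(conn, limit=10):
    """Return the largest distinct file contents in the catalog.

    Each row is (sha1, bytes, refs, sample_path), where refs counts
    how many catalog entries share that hash.
    """
    return conn.execute(
        """
        SELECT sha1, MAX(size) AS bytes, COUNT(*) AS refs,
               MIN(path) AS sample_path
        FROM catalog
        GROUP BY sha1
        ORDER BY bytes DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


for row in top_unique_files(conn, limit=3):
    print(row)
```

Grouping by hash counts each distinct file once, however many hosts or
backup sets reference it, which is the figure that matters when tuning
an include/exclude list.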

And thank you for your feedback.  If there is anything that the above
doesn't cover, let me know (or if you think it needs more
documentation, or additional features [without adding too much
complexity]). If the SQLite DB is a concern, there is also an
import/export function to dump the metadata to a tab-delimited text
file.

Summary: Snebu fills a gap between simple tools and more
full-featured, flexible systems, without being overly complex or
opaque.
-- 
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct