The biggest difficulty in answering your question is that you asked
about a specific method to solve a general problem, without specifying
any other requirements. In particular, in order to give you good
answers we need to know whether you need only a single near-line
backup, or multiple snapshots, or both of those plus off-site backups
on tape or on replicated systems.
The short answer is that replication can get you redundancy (which is
not a backup), rsnapshot can often do a good job of getting very
space-efficient online snapshots, and that Bacula Enterprise is an
excellent option for backing up data to removable media such as tape. You
can probably get an ideal backup solution using some combination of
those systems. Finding that combination is the complicated part.
Each of those tools solves some portion of the problem that you're
asking about. First, you asked about block-level change tracking.
That's exactly what replication requires, and a replicated filesystem
provides it: it tracks block changes and can efficiently transfer
those blocks to a remote system. Alan
mentioned ceph, and that's probably a great solution. Your production
systems, local and remote, should have their data on a replicated volume
where at least one of the replicas is your backup server. One backup
server can serve as the replica of all of the volumes with data in all
of your production systems. Once that's in place, until the backup
storage array fails, you'll never have to transfer a full backup again.
Whether you back up to rsnapshot or removable media, you'll back up
from the local filesystem and you've eliminated the network as a
bottleneck for backups.
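To make the block-level idea concrete, here is a minimal sketch (illustrative only, not how ceph actually implements replication) of finding which fixed-size blocks changed by comparing hashes. This is the kind of bookkeeping a replicated block layer does for you, and it's why only the changed blocks need to cross the network:

```python
import hashlib

BLOCK_SIZE = 4096  # compare data in fixed-size blocks


def block_hashes(data: bytes) -> list:
    """Hash each fixed-size block of a byte string."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(old: bytes, new: bytes) -> list:
    """Return indices of blocks in `new` that differ from (or extend) `old`."""
    old_hashes = block_hashes(old)
    new_hashes = block_hashes(new)
    return [i for i, h in enumerate(new_hashes)
            if i >= len(old_hashes) or old_hashes[i] != h]


# Only block 1 differs, so only ~4 KB would need to be transferred.
old = b"a" * (2 * BLOCK_SIZE)
new = b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE
print(changed_blocks(old, new))  # -> [1]
```

Compare that to a file-level incremental backup, which would re-send the whole file even if one block changed.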
If you only need online backups, you may be able to get that with
rsnapshot. rsnapshot is fairly good when you're not dealing with very
large files (such as databases). If you combine that with ceph, your
backup system will need one volume to replicate your production data and
a second volume to back it up. At this point, you'll have eliminated
the network bottleneck at a cost of more disk storage (which is fairly
cheap, compared to the cost of increasing the speed of the network).
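For the rsnapshot side, a minimal configuration sketch might look like the following. The paths are placeholders for your own replica and snapshot locations, and note that rsnapshot requires tabs, not spaces, between fields:

```
config_version	1.2
snapshot_root	/backup/snapshots/
interval	daily	7
interval	weekly	4
backup	/replica/samba/	localhost/
```

Because /replica/samba/ here is a local path (the replicated volume), rsnapshot never touches the network, and unchanged files are hard-linked between snapshots, which is where the space efficiency comes from.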
If you need offline backups such as tape, Bacula can also back up from
that locally replicated volume. Bacula Enterprise can provide the other
bits you mentioned wanting: a web dashboard and easier
configuration/management.
A few more notes follow:
On 12/14/2012 04:42 AM, Fernando Lozano wrote:
We already have a few TB on file shares (Samba) and mailboxes (Zimbra)
and just moving those bits around for our weekly full backup is proving
to be too slow for our Gb network and impossible for the hosted machines
we use as contingency and off-site backup. Besides, incremental backups
are taking too long just scanning the file systems searching for
changed files.
If scanning your filesystem takes too long, your storage array is
probably too slow. Consider using RAID10 instead of RAID5/6. Consider
using SSDs instead of hard drives. Consider using a fast additional
drive or array as your ext4 journal.
Sorry for the long story; the question: could I implement block-level
backups using dump, dd, and some LVM or ext utility? Maybe using
inotify? Why does no open source backup tool seem to be doing this?
Mostly because inotify only allows you to track which files have
changed, and only for files that change while the tracking daemon is
running. OS X does something very much like this for Time Machine: a
small daemon logs changed files from a kernel notification. The kernel
keeps a small notification queue (which Linux does not, as far as I
know), so if the daemon stops briefly, or files are modified during the
boot sequence before the tracking daemon starts, the daemon can still
catch up and keep a complete log for Time Machine to back up. If one of
those components detects that the tracking daemon may have missed
kernel notices, the system falls back to a full scan.
It's not a very complicated system, and could be duplicated fairly
simply under Linux, but you'd fall back to full scans much more often
since (again, as far as I know) there's no kernel notification queue, so
a full scan would be required every time the tracking daemon starts.
That scan doesn't have to wait for the start of a backup, however: the
tracking daemon could do the crawl as soon as it starts.
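As a sketch of what such a tracking daemon would build on, here is a minimal Linux-only example that talks to inotify directly through ctypes (constants copied from <sys/inotify.h>; error handling omitted). It only sees the event because it was already watching when the file was created, which is exactly the limitation described above:

```python
import ctypes
import os
import struct
import tempfile

# From <sys/inotify.h>
IN_CREATE = 0x00000100

libc = ctypes.CDLL("libc.so.6", use_errno=True)

# Set up a watch on a scratch directory.
fd = libc.inotify_init()
watch_dir = tempfile.mkdtemp()
wd = libc.inotify_add_watch(fd, watch_dir.encode(), IN_CREATE)

# Create a file so the kernel queues an event for us.
open(os.path.join(watch_dir, "changed.txt"), "w").close()

# struct inotify_event: int wd; uint32_t mask, cookie, len; char name[];
data = os.read(fd, 4096)
wd_out, mask, cookie, name_len = struct.unpack_from("iIII", data)
name = data[16:16 + name_len].split(b"\0", 1)[0].decode()
print(name)  # -> changed.txt
```

A real tracking daemon would loop over os.read(), add watches recursively, and append the names to a log for the backup tool to consume; files created before inotify_add_watch() ran would never appear in that log.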
Would any option allow me to restore an individual file?
Virtually every option does. The only case in which you can't restore
an individual file is when you replicate a volume to a system that
doesn't understand the filesystem/volume contents.
--
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
Have a question? Ask away: http://ask.fedoraproject.org