Jon Nelson wrote:
Please note: I'm having trouble with Gmail's formatting, so please
forgive this if it looks horrible. :-|
On 9/28/07, Bill Davidsen <davidsen@xxxxxxx> wrote:
Dean S. Messing wrote:
It has been some time since I read the rsync man page. I see that
there is (among the bazillion and one switches) a "--link-dest=DIR"
switch which I suppose does what you describe. I'll have to
experiment with this and think things through. Thanks, Michal.
Be aware that rsync is useful for making a *copy* of your files, which
isn't always the best backup. If the goal is to preserve data and be
able to recover in time of disaster, it's probably not optimal, while if
you need frequent access to old or deleted files it's fine.
You are absolutely right when you say it isn't always the best backup. There
IS no 'best' backup.
For example, dump and restore are usually faster than a copy, both to
take and to restore from, and they make incremental backups easy.
If "copy" meant "full data copy" and not "hard link where possible", I'd
agree with you. However...
I use a nightly rsync (with --link-dest) to back up more than 40 GiB to a
drbd-backed drive. I'll explain why I use drbd in just a moment.
Technically, I have a 3-disk RAID 5 (Linux software RAID) which is the
primary store for the data. Then I have a second drive (non-RAID) that is
used as a drbd backing store, which I rsync *to* from filesystems built on
the RAID. I keep *30 days* of nightly backups on the drbd volume. The
average difference between nightly backups is about 45MB, or a bit less than
10%. The total disk usage is (on average) about 10% more than a single
backup. On an AMD x86-64 dual core (3600 de-clocked to run at 1GHz) the
entire process takes between 1 and 2 minutes, from start to finish.
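A rough way to see why 30 hard-linked snapshots cost only a little more than one: compare counting every file name (what the tree would occupy as independent copies) with counting each inode once, which is how du avoids double-counting hard links (du itself reports allocated blocks rather than file sizes). This is a minimal illustration under that simplification; apparent_size, unique_size, and the day1/day2 layout are hypothetical names, not part of the setup described above.

```python
import os
import tempfile
from pathlib import Path


def apparent_size(root: Path) -> int:
    """Bytes if every file name were an independent copy (hard links counted each time)."""
    return sum(p.stat().st_size for p in root.rglob("*")
               if p.is_file() and not p.is_symlink())


def unique_size(root: Path) -> int:
    """Bytes counting each (device, inode) pair once, so hard-linked files count once."""
    seen: set[tuple[int, int]] = set()
    total = 0
    for p in root.rglob("*"):
        if p.is_file() and not p.is_symlink():
            st = p.stat()
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                total += st.st_size
    return total


def demo() -> tuple[int, int]:
    """Two snapshots of two 1000-byte files; only file b changes between them."""
    with tempfile.TemporaryDirectory() as tmp:
        t = Path(tmp)
        day1, day2 = t / "day1", t / "day2"
        day1.mkdir()
        day2.mkdir()
        (day1 / "a").write_bytes(b"a" * 1000)
        (day1 / "b").write_bytes(b"b" * 1000)
        os.link(day1 / "a", day2 / "a")        # unchanged file: hard link, no new data
        (day2 / "b").write_bytes(b"B" * 1000)  # changed file: a fresh copy
        return apparent_size(t), unique_size(t)


print(demo())  # (4000, 3000)
```

Only the changed file adds to the real usage, which is why the total grows with the nightly difference rather than with the number of snapshots.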
Using hard links means I can snapshot ~175,000 files, about 40GiB, in under
2 minutes, something I'd have a hard time doing with dump+restore. I could
easily make incremental or differential copies, maybe even within that time
frame, but I'm not sure I'd see much advantage in that. Furthermore, as you
state, dump+restore does *not* include the removal of files, which in some
scenarios is a huge deal.
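What --link-dest does can be sketched in Python. This is a simplified model, not rsync's implementation: it assumes a file counts as unchanged when its size and whole-second mtime match (similar in spirit to rsync's default quick check), ignores ownership, permissions, and symlink handling, and uses hypothetical names (snapshot, unchanged, day1/day2). Unchanged files become hard links into the previous snapshot, new or changed files are copied, and files deleted from the source are simply absent from the new snapshot.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def unchanged(src: Path, prev: Path) -> bool:
    """Quick check: same size and same (whole-second) modification time."""
    if not prev.is_file():
        return False
    s, p = src.stat(), prev.stat()
    return s.st_size == p.st_size and int(s.st_mtime) == int(p.st_mtime)


def snapshot(source: Path, dest: Path, link_dest: Optional[Path]) -> None:
    """Build dest as a complete tree of source, hard-linking unchanged files to link_dest."""
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (dest / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            out = dest / rel / name
            prev = link_dest / rel / name if link_dest is not None else None
            if prev is not None and unchanged(src, prev):
                os.link(prev, out)       # no data copied, just another name
            else:
                shutil.copy2(src, out)   # copy data, preserving mtime
    # Files removed from source are never visited, so they are absent from dest.


def demo() -> dict:
    """Two 'nightly' runs: b.txt is deleted and c.txt added between them."""
    with tempfile.TemporaryDirectory() as tmp:
        t = Path(tmp)
        src = t / "src"
        src.mkdir()
        (src / "a.txt").write_text("alpha")
        (src / "b.txt").write_text("beta")
        snapshot(src, t / "day1", None)
        (src / "b.txt").unlink()
        (src / "c.txt").write_text("gamma")
        snapshot(src, t / "day2", t / "day1")
        return {
            "a_shared": (t / "day1" / "a.txt").stat().st_ino
                        == (t / "day2" / "a.txt").stat().st_ino,
            "b_in_day1": (t / "day1" / "b.txt").exists(),
            "b_in_day2": (t / "day2" / "b.txt").exists(),
            "c_in_day2": (t / "day2" / "c.txt").read_text(),
        }


print(demo())
# {'a_shared': True, 'b_in_day1': True, 'b_in_day2': False, 'c_in_day2': 'gamma'}
```

Each snapshot directory is a complete tree, so restoring from any night is a plain copy, and deleting the oldest snapshot frees only the data that no other snapshot still links to.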
What I don't understand is how you use hard links, because a hard link
has to be on the same filesystem, and a hard link is just another pointer
to the same inode: it doesn't make a physical copy of the data to another
device, or to anywhere, really.
The long and short of it is this: using hard links (via rsync or cp or
whatever) to do snapshot backups can be really, really fast and has
significant advantages, but, as with all things, there are some downsides.
Those downsides are fairly easily mitigated, however. In my case, I can lose
1 drive of the RAID and I'm OK. If I lose 2, then the other drive (not part
of the RAID) has the data I care about. If I lose the entire machine, the
*other* machine (the other end of the drbd, only woken up every other day or
so) has the data, going back 30 days. And a bare-metal "restore" is as fast
as your I/O. I back my /really/ important stuff up on DLT.
Again thanks to drbd, when the secondary comes up it communicates with the
primary, works out which blocks have changed, and copies only those. On a
nightly basis that is usually a couple of hundred megabytes, and at 12MiB/s
that doesn't take terribly long.
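A loose model of that block-level catch-up, assuming a fixed 4 KiB block size and an in-memory set of dirty block numbers. DRBD itself keeps a persistent bitmap of out-of-sync blocks, and its resync protocol is considerably more involved; Primary, write, and resync are hypothetical names used only for illustration.

```python
BLOCK = 4096  # bytes per tracked block (an assumption for this sketch)


class Primary:
    """Toy primary device that remembers which blocks were written since the last sync."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.dirty: set[int] = set()  # DRBD keeps a bitmap; a set is used here for brevity

    def write(self, offset: int, buf: bytes) -> None:
        """Write buf at offset and mark every block it touches as dirty."""
        self.data[offset:offset + len(buf)] = buf
        if buf:
            first = offset // BLOCK
            last = (offset + len(buf) - 1) // BLOCK
            self.dirty.update(range(first, last + 1))

    def resync(self, secondary: bytearray) -> int:
        """Copy only the dirty blocks to the secondary; return the bytes transferred."""
        sent = 0
        for block in sorted(self.dirty):
            start = block * BLOCK
            chunk = self.data[start:start + BLOCK]
            secondary[start:start + len(chunk)] = chunk
            sent += len(chunk)
        self.dirty.clear()
        return sent


def demo() -> tuple[int, bool]:
    primary = Primary(16 * BLOCK)
    secondary = bytearray(primary.data)  # both sides in sync to start
    primary.write(5000, b"x" * 10)       # touches block 1 only
    primary.write(4090, b"y" * 100)      # straddles blocks 0 and 1
    sent = primary.resync(secondary)
    return sent, secondary == primary.data


print(demo())  # (8192, True)
```

With a couple of hundred megabytes of changed blocks per night, 12 MiB/s works out to roughly 15 to 20 seconds of transfer.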
--
bill davidsen <davidsen@xxxxxxx>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html