Please note: I'm having trouble w/gmail's formatting... so please
forgive this if it looks horrible. :-|

On 9/28/07, Bill Davidsen <davidsen@xxxxxxx> wrote:
>
> Dean S. Messing wrote:
> > It has been some time since I read the rsync man page. I see that
> > there is (among the bazillion and one switches) a "--link-dest=DIR"
> > switch which I suppose does what you describe. I'll have to
> > experiment with this and think things through. Thanks, Michal.
> >
>
> Be aware that rsync is useful for making a *copy* of your files, which
> isn't always the best backup. If the goal is to preserve data and be
> able to recover in time of disaster, it's probably not optimal, while if
> you need frequent access to old or deleted files it's fine.

You are absolutely right when you say it isn't always the best backup.
There IS no 'best' backup.

> For example, full and incremental backup methods such as dump and
> restore are usually faster to take and restore than a copy, and allow
> easy incremental backups.

If "copy" meant "full data copy" and not "hard link where possible", I'd
agree with you. However...

I use a nightly rsync (with --link-dest) to back up more than 40 GiB to a
drbd-backed drive. I'll explain why I use drbd in just a moment.

Technically, I have a 3-disk raid5 (Linux Software Raid) which is the
primary store for the data. Then I have a second drive (non-raid) that
is used as a drbd backing store, which I rsync *to* from filesystems
built off of the raid. I keep *30 days* of nightly backups on the drbd
volume. The average difference between nightly backups is about 45MB,
or a bit less than 10%. The total disk usage is (on average) about 10%
more than a single backup. On an AMD x86-64 dual core (3600 de-clocked
to run at 1GHz) the entire process takes between 1 and 2 minutes, from
start to finish.

Using hard links means I can snapshot ~175,000 files, about 40GiB, in
under 2 minutes - something I'd have a hard time doing with
dump+restore.
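
To illustrate the hard-link technique itself, here is a minimal Python
sketch. It is not rsync's implementation and not the script behind the
numbers above; the function names and the size+mtime comparison are
assumptions made for illustration. A file that looks unchanged since the
previous snapshot gets a hard link (a new directory entry, no new data
blocks), anything else gets a full copy, and files deleted from the
source simply do not appear in the new snapshot.

```python
import os
import shutil
from typing import Optional


def _unchanged(a: str, b: str) -> bool:
    """Assumed 'same file' test: equal size and equal whole-second mtime."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def snapshot(source: str, dest: str, previous: Optional[str] = None) -> None:
    """Build a full tree of `source` in `dest`, hard-linking files that are
    unchanged relative to the `previous` snapshot (hypothetical helper)."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and _unchanged(src, old):
                # Unchanged: share the data blocks with the older snapshot.
                os.link(old, dst)
            else:
                # New or changed: real copy; copy2 preserves the mtime so
                # the next run can compare against this snapshot.
                shutil.copy2(src, dst)
```

Calling snapshot("/data", "/backup/day2", previous="/backup/day1") (paths
hypothetical) yields a complete, browsable tree for day2. Since every
snapshot is a full tree, a restore is a plain copy, and deleting an old
snapshot frees only the blocks that no other snapshot still links to.
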
I could easily make incremental or differential copies, and maybe even
in that time frame, but I'm not sure I see much advantage in that.
Furthermore, as you state, dump+restore does *not* include the removal
of files, which for some scenarios is a huge deal.

The long and short of it is this: using hard links (via rsync or cp or
whatever) to do snapshot backups can be really, really fast and have
significant advantages but there are, as with all things, some
downsides. Those downsides are fairly easily mitigated, however.

In my case, I can lose 1 drive of the raid and I'm OK. If I lose 2,
then the other drive (not part of the raid) has the data I care about.
If I lose the entire machine, the *other* machine (the other end of the
drbd, only woken up every other day or so) has the data. Going back 30
days. And a bare-metal "restore" is as fast as your I/O is. I back my
/really/ important stuff up on DLT.

Thanks again to drbd, when the secondary comes up it communicates with
the primary, figures out which blocks have changed, and copies only
those. On a nightly basis that is usually a couple of hundred
megabytes, and at 12MiB/s that doesn't take terribly long to take care
of.

-- 
Jon