Re: C8 and backup solution




On 02/04/20 21:14, Karl Vogel wrote:
[Replying privately because my messages aren't making it to the list]

In a previous message, Alessandro Baggi said:
A> Bacula works without any problem, well tested, solid but complex to
A> configure. Tested on a single server (with volumes on disk), and a
A> full backup of 810 GB (~150,000 files) took 6.30 hours (too much).

For a full backup, I'd use something like "scp -rp". Anything else
has overhead you don't need for the first copy.

Also, pick a good cipher (-c) for the ssh/scp commands -- it can improve
your speed by an order of magnitude. Here's an example where I copy
my current directory to /tmp/bkup on my backup server:

Running on: Linux x86_64
Thu Apr 2 14:48:45 2020

me% scp -rp -c aes128-gcm@xxxxxxxxxxx -i $HOME/.ssh/bkuphost_ecdsa \
. bkuphost:/tmp/bkup

Authenticated to remote-host ([remote-ip]:22).
ansible-intro 100% 16KB 11.3MB/s 00:00 ETA
nextgov.xml 100% 27KB 21.9MB/s 00:00 ETA
building-VM-images 100% 1087 1.6MB/s 00:00 ETA
sort-array-of-hashes 100% 1660 2.5MB/s 00:00 ETA
...
ex1 100% 910 1.9MB/s 00:00 ETA
sitemap.m4 100% 1241 2.3MB/s 00:00 ETA
contents 100% 3585 5.5MB/s 00:00 ETA
ini2site 100% 489 926.1KB/s 00:00 ETA
mkcontents 100% 1485 2.2MB/s 00:00 ETA

Transferred: sent 6465548, received 11724 bytes, in 0.4 seconds
Bytes per second: sent 18002613.2, received 32644.2

Thu Apr 02 14:48:54 2020

A> scripted rsync. Simple, over the ssh protocol with a private key. No agent
A> required on the target. I use file-level deduplication with hardlinks.

I avoid block-level deduplication as a general rule -- ZFS memory
use goes through the roof if you turn that on.

rsync can do the hardlinks, but for me it's been much faster to create
a list of SHA1 hashes and use a perl script to link the duplicates.
I can send you the script if you're interested.

This way, you're not relying on the network for anything other than the
copies; everything else takes place on the local or backup system.
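
For the curious, here's a rough shell sketch of the hash-and-link idea (the
real thing is a perl script); the backup path is a placeholder, and file
names are assumed to contain no whitespace:

    # Hash every file under the backup tree, then hard-link each duplicate
    # to the first copy that had the same SHA1.
    cd /srv/backup/current      # placeholder path
    find . -type f -exec sha1sum {} + | sort > /tmp/hashes

    awk '{ if ($1 in first) print first[$1], $2; else first[$1] = $2 }' /tmp/hashes |
    while read keep dup; do
        ln -f "$keep" "$dup"    # replace the duplicate with a hard link
    done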

A> Using a scripted rsync is the simpler way, but I might overlook something
A> (or there could be an undiscovered error). Simple to restore.

I've never had a problem with rsync, and I've used it to back up Linux
workstations with around 600 GB of data. One caveat -- if you give it a
really big directory tree, it can get lost in the weeds. You might want to
do something like this:

1. Make your original backup using scp.

2. Get a complete list of file hashes on your production systems
using SHA1 or whatever you like.

3. Whenever you do a backup, get a (smaller) list of modified files
using something like "find ./something -newer /some/timestamp/file",
or just make a new list of file hashes and compare it to the
original list.

4. Pass the list of modified files to rsync using the "--files-from"
option so it doesn't have to walk the entire tree again (see the
sketch after this list).
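
Roughly, steps 3 and 4 might look like this; /data, the timestamp file,
and the backup path are placeholders, and the ssh key is the one from the
scp example above:

    # (touch /var/tmp/last-backup once, right after the initial scp copy)
    cd /data
    find . -type f -newer /var/tmp/last-backup > /tmp/changed

    rsync -a --files-from=/tmp/changed \
        -e "ssh -i $HOME/.ssh/bkuphost_ecdsa" \
        . bkuphost:/srv/backup/current/ &&
        touch /var/tmp/last-backup      # only advance the stamp on success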

Good luck!

--
Karl Vogel / vogelke@xxxxxxxxx / I don't speak for the USAF or my company

The best setup is having a wife and a mistress. Each of them will assume
you're with the other, leaving you free to get some work done.
--programmer with serious work-life balance issues

Hi Karl,

thank you for your answer. I'm trying scripted rsync over ssh with a faster cipher as you suggested, and the transfer of a 10 GB sample is faster than with the default cipher (129 sec with the default vs 116 sec using aes128-gcm; I tested this multiple times). Now I will run it on the entire dataset and see how much benefit I gain.
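
For reference, this is roughly the invocation I mean by "scripted rsync"
with the cipher forced (paths are placeholders; the cipher string is the
one from your scp example):

    rsync -a \
        -e "ssh -c aes128-gcm@xxxxxxxxxxx -i $HOME/.ssh/bkuphost_ecdsa" \
        /data/ bkuphost:/srv/backup/current/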

While waiting for that, what do you think about Bacula as a backup solution?

Thank you in advance.




