Re: Questions about rdiff-backup

Manu Hernandez <manu@xxxxxxxxxxxxxxxxxxx> · Tue, 8 Sep 2020 01:45:21 +0200

Thanks for the info, Kevin!

On 06/09/2020 23:15, Kevin Fenzi wrote:
1) Why rdiff-backup? There are other alternatives present both in the Fedora
*and* CentOS/RHEL default repos like Amanda or Bacula, and lots more if you
add the EPEL repo (rsnapshot, BackupPC, Borg...)

So, first keep in mind that Fedora infrastructure has been around a long
time. There are likely things we selected years ago, before there were
other alternatives. Over the years we have used bacula, a custom thing
and then rdiff-backup (which I think we moved to in about 2011 or 2010).

It was looking like rdiff-backup would get left behind, as it was
Python 2 only and not under active development. Luckily, development
revived upstream: they ported it to Python 3 and have been much more
active of late.

Also, I have read that Eric Lavarde, the current maintainer, works for 
Red Hat. That's a plus too (easier to work with someone in-house)

That said, I did look into backups more last year. The two frontrunners
on features and activity and such were borg and restic. However, one big
issue with both of those is that they work on a model of the client
pushing backups to the server. With rdiff-backup we reach out from the
server and pull backups from the client. That means the client only
has an SSH public key and no other access to the server.
There are of course ways to restrict access on the server from the
clients (restrict to a specific command, borg has 'append only' repo
settings, etc).

I also prefer pulling backups from the backup server. It's safer. 
Totally understandable.
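
For illustration, the pull model described above could be sketched roughly
as follows. This is a minimal sketch, not the actual Fedora infrastructure
setup: the host name, paths, user and key below are hypothetical, and the
flags shown (`--server`, `--restrict-read-only`) are the classic
rdiff-backup command-line spelling, which newer releases may spell
differently. The backup server runs the pull command, while the client only
carries a restricted `authorized_keys` line for the server's public key.

```python
import shlex


def pull_command(client_host, remote_path, local_dest, user="root"):
    """Build the argv the backup server would run to pull from a client.

    Uses rdiff-backup's classic `user@host::path` remote syntax.
    """
    source = f"{user}@{client_host}::{remote_path}"
    return ["rdiff-backup", source, local_dest]


def authorized_keys_entry(server_pubkey, allowed_path="/"):
    """Build the line for the client's ~/.ssh/authorized_keys.

    The forced command limits the backup server's key to a read-only
    rdiff-backup server session; the other options disable forwarding
    and terminal allocation for that key.
    """
    forced = f"rdiff-backup --server --restrict-read-only {shlex.quote(allowed_path)}"
    options = f'command="{forced}",no-port-forwarding,no-X11-forwarding,no-pty'
    return f"{options} {server_pubkey}"


if __name__ == "__main__":
    # Hypothetical client, paths and key, for illustration only.
    argv = pull_command("client.example.org", "/etc", "/backups/client.example.org/etc")
    print(shlex.join(argv))
    print(authorized_keys_entry("ssh-ed25519 AAAA...example backup@server"))
```

With this arrangement the client never holds credentials for the backup
server, so a compromised client cannot reach or alter existing backups.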

rdiff-backup is also pretty simple: if you need something from the last
backup run, you can just copy it off. The backing store for backups is a
NetApp volume, so it can run de-dupe for space savings and save us from
doing it on the application layer.

That makes sense: the NetApp filer does the checksumming (not like ZFS,
see: https://oshogbo.vexillium.org/blog/73/), snapshots and
de-duplication, so there is no need for that in the backup application.
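
As a rough illustration of the "just copy it off" point: an rdiff-backup
destination holds the most recent backup as ordinary files, with increments
and metadata kept in a separate rdiff-backup-data directory. A minimal
sketch of fetching the latest version of a file with a plain copy could
look like this. The function name and any paths passed to it are
hypothetical, and note that `shutil.copy2` keeps timestamps and permission
bits but not ownership.

```python
import shutil
from pathlib import Path


def copy_latest(repo_dir, relative_path, target_dir):
    """Copy the most recent backed-up version of a file out of the mirror."""
    relative = Path(relative_path)
    # Older versions live under rdiff-backup-data as increments and need
    # rdiff-backup itself to restore; a plain copy only works for the
    # current mirror.
    if "rdiff-backup-data" in relative.parts:
        raise ValueError("increments need rdiff-backup to restore, not a plain copy")
    source = Path(repo_dir) / relative
    target_root = Path(target_dir)
    target_root.mkdir(parents=True, exist_ok=True)
    target = target_root / source.name
    shutil.copy2(source, target)
    return target
```

Anything older than the last run still has to go through rdiff-backup's own
restore, since increments are stored as reverse diffs.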

A few more questions about the filer:

1.1) Is it managed by RHIT?

1.2) What happens if that NetApp volume "fails"? Just playing devil's 
advocate here: I know this shouldn't be a hardware problem. The filer 
surely has a support contract and NetApp continuously monitors its 
systems... But human error could take down that volume, delete its data, 
etc.

1.3) Are there off-site and/or offline backups? If RHIT manages the
filer, they probably take care of this...

Because of that I didn't see a great need to switch away
from rdiff-backup. If there's some good advantage to doing so, we could
definitely revisit it.

Neither do I, but thank you for being open to discussing it!

2) The current setup uses a few tools: cron, git, ansible and rdiff-backup.
Wouldn't it be simpler to use a tool that just takes care of everything by
itself?

Well, all those things are pretty simple and easy to understand.
If you have one tool doing them all, it's much less clear how it works
or what it's doing.

To me, using git and ansible to run backup tasks is a bit strange, but 
that's because I have very simple needs at work and at home and I don't 
require the flexibility those tools provide. I understand why they are 
used here.
_______________________________________________
infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx