On Wed, Jul 14 2021, Martin Fick wrote:

> On Wednesday, July 14, 2021 8:19:15 PM MDT Ævar Arnfjörð Bjarmason wrote:
>> The best way to get backups of git repositories you know are correct
>> is to use git's own transport mechanisms, i.e. fetch/pull the data, or
>> create bundles from it.
>
> I don't think this is a fair recommendation since unfortunately, this cannot
> be used to create a full backup. This can be used to back up the version
> controlled data, but not the repositories meta-data, i.e. configs, reflogs,
> alternate setups...

*nod*

FWIW at an ex-job I helped systems administrators who'd produced such a
broken backup-via-rsync create a hybrid version as an interim solution.

I.e. it would sync the objects via git transport, and do an rsync on a
whitelist (or blacklist), so it would pick up config, but exclude
objects.

"Hybrid" because it was in a state of needing to deal with manual
tweaking of config.

But usually someone who needs to solve this backup problem thoroughly
will inevitably end up wanting to drive everything that's not in the
object or ref store from some external system, i.e. have config be
generated from puppet, a database etc., ditto for alternates etc.

But even if you can't get to that point (or don't want to) I'd say aim
for the hybrid system.

This isn't some purely theoretical concern b.t.w., the system using
rsync like this was regularly producing repos that wouldn't pass fsck,
and it wasn't such a busy site.

I suspect (but haven't tried) that someone who can't easily change
their backup solution would get most of the benefits of git-native
transport by having their "rsync" sync refs, then objects, not the other
way around. Glob order dictates that most backup systems will do
objects, then refs (which will of course, at that point, refer to
nonexistent objects).

It's still not safe, you'll still be subject to races, but probably a
lot better in practice.