Hi folks,

On Wed, Jul 14, 2021 at 10:03 PM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
> *nod*
>
> FWIW at an ex-job I helped systems administrators who'd produced such a
> broken backup-via-rsync create a hybrid version as an interim
> solution. I.e. it would sync the objects via git transport, and do an
> rsync on a whitelist (or blacklist), so pickup config, but exclude
> objects.
>
> "Hybrid" because it was in a state of needing to deal with manual
> tweaking of config.
>
> But usually someone who's needing to thoroughly solve this backup
> problem will inevitably end up with wanting to drive everything that's
> not in the object or refstore from some external system, i.e. have
> config be generated from puppet, a database etc., ditto for alternates
> etc.
>
> But even if you can't get to that point (or don't want to) I'd say aim
> for the hybrid system.

FWIW, we are running our repo on top of a somewhat flickery DRBD setup,
and we decided to use both
`git clone --upload-pack 'git -c transfer.hiderefs="!refs" upload-pack' --mirror`
and `tar` to create two separate snapshots for backup in parallel (full
backups, not incremental). For (manual) recovery, we rely first on the
git snapshot; if any objects or refs are missing, we try to get them
from the tarball.

> This isn't some purely theoretical concern b.t.w., the system using
> rsync like this was producing repos that wouldn't fsck all the time, and
> it wasn't such a busy site.
>
> I suspect (but haven't tried) that for someone who can't easily change
> their backup solution they'd get most of the benefits of git-native
> transport by having their "rsync" sync refs, then objects, not the other
> way around. Glob order dictates that most backup systems will do
> objects, then refs (which will of course, at that point, refer to
> nonexisting objects).
>
> It's still not safe, you'll still be subject to races, but probably a
> lot better in practice.

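To make the two-snapshot approach I described earlier in this mail a bit
more concrete, here is a minimal sketch. It is only an illustration: the
function name `snapshot_repo`, its three arguments, and the output names
`mirror.git` and `raw.tar` are hypothetical; only the `git clone` options
and the use of `tar` come from what we actually run.

```shell
#!/bin/sh
# Illustrative sketch only; snapshot_repo and the output names are hypothetical.
# Usage: snapshot_repo <clone-url> <repo-dir> <backup-dir>
snapshot_repo() {
    url=$1      # URL or path that git clones from
    repo_dir=$2 # on-disk repository directory that tar archives
    dest=$3     # directory receiving both snapshots

    mkdir -p "$dest" || return 1

    # Snapshot 1: git-native mirror. The transfer.hiderefs override exposes
    # refs that the serving side's config would otherwise hide from clients.
    git clone --quiet --mirror \
        --upload-pack 'git -c transfer.hiderefs="!refs" upload-pack' \
        "$url" "$dest/mirror.git" &
    clone_pid=$!

    # Snapshot 2: raw tarball of the repository directory, taken in parallel.
    tar -cf "$dest/raw.tar" -C "$(dirname "$repo_dir")" "$(basename "$repo_dir")" &
    tar_pid=$!

    # Collect both exit statuses; succeed only if both snapshots succeeded.
    clone_rc=0
    tar_rc=0
    wait "$clone_pid" || clone_rc=$?
    wait "$tar_pid" || tar_rc=$?
    [ "$clone_rc" -eq 0 ] && [ "$tar_rc" -eq 0 ]
}
```

It would be called as something like
`snapshot_repo file:///srv/git/project.git /srv/git/project.git /backup/today`
(all three paths made up). The tarball is still subject to the races
discussed above, which is why we only fall back to it for whatever the
git snapshot is missing.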
I would love to see some guidance in the official documentation on best
practices for handling git data on the server side. Is git-clone +
git-bundle the go-to solution? Should tar/rsync be avoided completely,
or is there a trade-off?

Thanks,
Son Luong.