On Wed, Jul 14 2021, Martin Fick wrote:

> On Wednesday, July 14, 2021 9:41:42 PM MDT you wrote:
>> On Wed, Jul 14 2021, Martin Fick wrote:
>> > On Wednesday, July 14, 2021 8:19:15 PM MDT Ævar Arnfjörð Bjarmason wrote:
>> >> The best way to get backups of git repositories you know are correct are
>> >> is to use git's own transport mechanisms, i.e. fetch/pull the data, or
>> >> create bundles from it.
>> >
>> > I don't think this is a fair recommendation since unfortunately, this
>> > cannot be used to create a full backup. This can be used to back up the
>> > version controlled data, but not the repositories meta-data, i.e.
>> > configs, reflogs, alternate setups...
>>
>> *nod*
>>
>> FWIW at an ex-job I helped systems administrators who'd produced such a
>> broken backup-via-rsync create a hybrid version as an interim
>> solution. I.e. it would sync the objects via git transport, and do an
>> rsync on a whitelist (or blacklist), so pickup config, but exclude
>> objects.
>>
>> "Hybrid" because it was in a state of needing to deal with manual
>> tweaking of config.
>>
>> But usually someone who's needing to thoroughly solve this backup
>> problem will inevitably end up with wanting to drive everything that's
>> not in the object or refstore from some external system, i.e. have
>> config be generated from puppet, a database etc., ditto for alternates
>> etc.
>>
>> But even if you can't get to that point (or don't want to) I'd say aim
>> for the hybrid system.
>>
>> This isn't some purely theoretical concern b.t.w., the system using
>> rsync like this was producing repos that wouldn't fsck all the time, and
>> it wasn't such a busy site.
>>
>> I suspect (but haven't tried) that for someone who can't easily change
>> their backup solution they'd get most of the benefits of git-native
>> transport by having their "rsync" sync refs, then objects, not the other
>> way around.
>> Glob order dictates that most backup systems will do
>> objects, then refs (which will of course, at that point, refer to
>> nonexisting objects).
>>
>> It's still not safe, you'll still be subject to races, but probably a
>> lot better in practice.
>
> It would be great if git provided a command to do a reliable incremental
> backup, maybe it could copy things in the order that you mention?

I don't think we can or want to support this sort of thing ever, for the
same reason that you probably won't convince MySQL, PostgreSQL etc. that
they should support "cp -r" as a mode for backing up their live database
services.

I mean, there is the topic of git being lazy about fsync() etc., but even
if all of that were 100% solved you'd still get bad things if you picked
an arbitrary time to snapshot a running git directory, e.g. your "master"
branch might have a "master.lock" because it was in the middle of an
update.

If you used "fetch/clone/bundle" etc. to get the data that's no problem,
but if your snapshot happens at that moment you'd need to clean it up
manually. On the live repository that state wouldn't persist, but a
snapshot makes it permanent.

> However, most people will want to use the backup system they have and not a
> special git tool. Maybe git fsck should gain a switch that would rewind any
> refs to an older point that is no broken (using reflogs)? That way, most
> backups would just work and be rewound to the point at which the backup
> started?

I think the main problem in the wild is not the inability to use a
special tool, but one of education.

Most people wouldn't think of "cp -r" as a first approach to, say,
backing up a live MySQL server; they'd use mysqldump and the like.

But for some reason git is considered "not a database" enough that those
same people would just use rsync/tar/whatever, and are then surprised
when their data is corrupt or in some weird or inconsistent state...
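
To make the "master.lock" point above concrete, here is a minimal sketch of a check you could run against a snapshot copy. It is not something git ships. The helper name find_leftover_locks is made up for illustration, and it assumes the copy is no longer being written to (on a live repository, lock files come and go as part of normal operation).

```python
import os
import tempfile


def find_leftover_locks(git_dir):
    """Return sorted paths (relative to git_dir) of all *.lock files.

    Hypothetical helper: only meaningful on a snapshot copy that no
    git process is touching, where any lock file is a leftover from
    an update that was in flight when the snapshot was taken.
    """
    found = []
    for root, _dirs, files in os.walk(git_dir):
        for name in files:
            if name.endswith(".lock"):
                full = os.path.join(root, name)
                found.append(os.path.relpath(full, git_dir))
    return sorted(found)


if __name__ == "__main__":
    # Simulate a snapshot taken while "master" was being updated.
    with tempfile.TemporaryDirectory() as snapshot:
        heads = os.path.join(snapshot, "refs", "heads")
        os.makedirs(heads)
        for name in ("master", "master.lock"):
            open(os.path.join(heads, name), "w").close()
        for path in find_leftover_locks(snapshot):
            print("leftover lock:", path)
```

Finding no lock files does not make a snapshot consistent; this only catches the one symptom mentioned above.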

Anyway, see also my just-posted:
https://lore.kernel.org/git/878s21wl4z.fsf@xxxxxxxxxxxxxxxxxxx/

I.e. I'm not saying "never use rsync", there are cases where that's
fine, but for a live "real" server I'd say solutions in that class
shouldn't be considered, and should be actively migrated away from.
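
For what it's worth, the "hybrid" approach mentioned in the quoted text could be sketched roughly as below: objects and refs go through git's own machinery (here "git bundle create ... --all"), and only a whitelist of metadata files is copied directly. This is an illustration under assumptions, not the setup from that ex-job. The whitelist in METADATA_PATTERNS, the function names and the "repo.bundle"/"meta" layout are all made up and would need adjusting per site.

```python
import fnmatch
import os
import shutil
import subprocess

# Assumed whitelist of per-repository metadata that git transport does
# not carry. Note that fnmatch's "*" also matches "/" here.
METADATA_PATTERNS = [
    "config",
    "description",
    "hooks/*",
    "info/*",
    "logs/*",  # reflogs
    "objects/info/alternates",
]


def bundle_command(git_dir, bundle_path):
    """Build the argv for bundling every ref of the repository."""
    return ["git", "--git-dir=" + git_dir,
            "bundle", "create", bundle_path, "--all"]


def select_metadata(rel_paths):
    """Keep only the '/'-separated relative paths on the whitelist."""
    return [p for p in rel_paths
            if any(fnmatch.fnmatchcase(p, pat) for pat in METADATA_PATTERNS)]


def list_files(git_dir):
    """List all files under git_dir as '/'-separated relative paths."""
    out = []
    for root, _dirs, files in os.walk(git_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), git_dir)
            out.append(rel.replace(os.sep, "/"))
    return sorted(out)


def backup(git_dir, dest_dir):
    """Hybrid backup: bundle for objects/refs, plain copy for metadata."""
    os.makedirs(dest_dir, exist_ok=True)
    bundle_path = os.path.join(dest_dir, "repo.bundle")
    subprocess.run(bundle_command(git_dir, bundle_path), check=True)
    for rel in select_metadata(list_files(git_dir)):
        parts = rel.split("/")
        target = os.path.join(dest_dir, "meta", *parts)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(git_dir, *parts), target)
```

Calling backup("/srv/git/repo.git", "/backup/repo") (both paths hypothetical) would leave a bundle plus a "meta" directory. One known gap: copied reflogs can point at objects that are reachable only from the reflog, and those are not in the bundle.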