Re: [PATCH v2] packfile: freshen the mtime of packfile by configuration

Son Luong Ngoc <sluongng@xxxxxxxxx> · Thu, 15 Jul 2021 10:23:04 +0200

Hi folks,

On Wed, Jul 14, 2021 at 10:03 PM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
> *nod*
>
> FWIW at an ex-job I helped systems administrators who'd produced such a
> broken backup-via-rsync create a hybrid version as an interim
> solution. I.e. it would sync the objects via git transport, and do an
> rsync on a whitelist (or blacklist), so pickup config, but exclude
> objects.
>
> "Hybrid" because it was in a state of needing to deal with manual
> tweaking of config.
>
> But usually someone who's needing to thoroughly solve this backup
> problem will inevitably end up with wanting to drive everything that's
> not in the object or refstore from some external system, i.e. have
> config be generated from puppet, a database etc., ditto for alternates
> etc.
>
> But even if you can't get to that point (or don't want to) I'd say aim
> for the hybrid system.

FWIW, we are running our repo on top of a somewhat flaky DRBD setup, so
we decided to use both

  `git clone --upload-pack 'git -c transfer.hiderefs="!refs" upload-pack' --mirror`

and

  `tar`

to create 2 separate snapshots for backup in parallel (full backup,
not incremental).
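
Roughly, the two snapshots could be driven like this. This is only a
minimal sketch, not our actual script: the function name `snapshot_repo`,
the output names `mirror.git` and `repo.tar`, and the use of `--no-local`
(so that a clone from a local path still goes through upload-pack) are
assumptions made for illustration.

```shell
#!/bin/sh
# snapshot_repo SRC DEST (hypothetical helper)
#   SRC  - path or URL of the repository to back up
#   DEST - directory that receives both snapshots
snapshot_repo() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1

    # Snapshot 1: git-native mirror. The custom upload-pack command
    # un-hides all refs so the mirror also gets refs that
    # transfer.hiderefs would normally hide.
    # --no-local is an assumption: it forces the regular git transport
    # even when SRC is a local path.
    git clone --quiet --mirror --no-local \
        --upload-pack 'git -c transfer.hiderefs="!refs" upload-pack' \
        "$src" "$dest/mirror.git" &
    clone_pid=$!

    # Snapshot 2: plain tarball of the repository directory
    # (only meaningful when SRC is a local path).
    tar -cf "$dest/repo.tar" -C "$(dirname "$src")" "$(basename "$src")" &
    tar_pid=$!

    # Wait for both jobs and succeed only if both succeeded.
    wait "$clone_pid"; clone_rc=$?
    wait "$tar_pid"; tar_rc=$?
    [ "$clone_rc" -eq 0 ] && [ "$tar_rc" -eq 0 ]
}
```

Usage would be something like `snapshot_repo /srv/git/project.git
/backup/today`, where both paths are made up. Note that the tarball is
still subject to the same races as any file-level copy of a live repo.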

In case of (manual) recovery, we first rely on the git snapshot, and if
any objects/refs are missing, we try to get them from the tarball.
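
Our recovery is manual, but the first decision could be sketched as
below. The helper name `pick_snapshot` is hypothetical, and using
`git fsck` as the check is an assumption: it catches missing or corrupt
objects, but it cannot tell you that a ref which should exist is absent.

```shell
#!/bin/sh
# pick_snapshot MIRROR TARBALL (hypothetical helper)
# Prints the snapshot to restore from: the git mirror if it passes
# fsck, otherwise the tarball. Returns non-zero if neither is usable.
pick_snapshot() {
    mirror=$1
    tarball=$2
    if git -C "$mirror" fsck --no-dangling >/dev/null 2>&1; then
        # The mirror is self-consistent: prefer the git snapshot.
        echo "$mirror"
    elif [ -f "$tarball" ]; then
        # The mirror is missing or broken: fall back to the tarball.
        echo "$tarball"
    else
        return 1
    fi
}
```

In practice we do not pick one or the other wholesale; the tarball is
only used to fill in whatever the git snapshot lacks.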

>
> This isn't some purely theoretical concern b.t.w., the system using
> rsync like this was producing repos that wouldn't fsck all the time, and
> it wasn't such a busy site.
>
> I suspect (but haven't tried) that for someone who can't easily change
> their backup solution they'd get most of the benefits of git-native
> transport by having their "rsync" sync refs, then objects, not the other
> way around. Glob order dictates that most backup systems will do
> objects, then refs (which will of course, at that point, refer to
> nonexisting objects).
>
> It's still not safe, you'll still be subject to races, but probably a
> lot better in practice.

I would love to see some guidance in the official documentation on best
practices for handling git data on the server side.

Is git-clone + git-bundle the go-to solution?
Should tar/rsync be avoided entirely, or is there a trade-off?

Thanks,
Son Luong.