Re: Mirroring for offline use - best practices?

Stefan Beller <sbeller@xxxxxxxxxx> · Wed, 12 Jul 2017 10:40:58 -0700

On Wed, Jul 12, 2017 at 3:47 AM, Joachim Durchholz <jo@xxxxxxxxxxxxx> wrote:
> Hi all,
>
> I'm pretty sure this is a FAQ, but articles I found on the Internet were
> either mere "recipes" (i.e. tell you how, but don't explain why), or bogged
> down in so many details that I was never sure how to proceed from there.
>
>
> Basic situation:
>
> There's a master repository (Github or corporate or whatever), and I want to
> set up a local mirror so that I can create clones without having to access
> the original upstream.

'git clone --mirror' should accomplish the mirroring part.
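
A minimal sketch of that step. A throwaway repository stands in for the
real upstream so the commands run offline; the directory names (upstream,
mirror.git, work1) are placeholders of mine, not anything from your setup.
'git remote update --prune' is one way to refresh the mirror while you
still have connectivity.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the real upstream (hypothetical; use your actual URL instead).
git init -q upstream
git -C upstream -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "initial commit"

# Create the mirror: a bare repository that copies all refs as they are.
git clone -q --mirror "$work/upstream" mirror.git

# Later, while online: refresh the mirror and prune refs deleted upstream.
git -C mirror.git remote update --prune >/dev/null

# Offline: clone from the mirror; 'origin' in work1 is the mirror.
git clone -q mirror.git work1
git -C work1 remote get-url origin
```

The last command prints the URL that work1 fetches from, which should be
the path of the mirror rather than the upstream.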

> I'd like to set the mirror up so that creating a clone from it will
> automatically set up things to "just work": I.e. branches will track the
> mirror, not upstream, possibly other settings that I'm not aware of.

And then 'git clone <local-path-to-mirror>'. This would set up the local
mirror as upstream, such that git-fetch would fetch from the
local mirror. However, git-push would also go to the mirror. I am not
sure whether this is desired or whether you would rather have a triangular
workflow, i.e. the local clone would push directly back to the real upstream.
That can be configured with url.<base>.pushInsteadOf, but there
is no way to have that set up by default when cloning from the local
mirror, as the config is not copied over.
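
A sketch of the pushInsteadOf configuration for a single clone. The
upstream URL is made up, and an empty bare repository stands in for the
mirror; the 'git config' line is the part you would repeat in each clone
(or set once with --global).

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the local mirror (hypothetical, empty on purpose).
git init -q --bare mirror.git
# Cloning an empty repository only triggers a harmless warning.
git clone -q mirror.git clone1

# Hypothetical URL of the real upstream.
upstream_url="https://example.invalid/project.git"

# Use the URL exactly as 'git clone' recorded it, so the prefix matches.
mirror_url=$(git -C clone1 config --get remote.origin.url)

# Pushes to anything starting with $mirror_url are rewritten to upstream.
git -C clone1 config "url.$upstream_url.pushInsteadOf" "$mirror_url"

# Fetch URL stays on the mirror, push URL is rewritten.
git -C clone1 remote -v
```

With this in place, fetches stay local while 'git push origin' in clone1
goes to the upstream URL.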

>
> I gather that local clones are fast because hardlinked - is that correct?

Yes, a local path implies --local in git-clone, which (a) uses hardlinks
and (b) avoids some other protocol overhead.

> Is that correct on Windows? (I can't easily avoid Windows.)

I cannot tell; let's see if a Windows expert shows up.

> Ramification 1:
>
> I'm not sure how best to prepare patches for push-to-upstream.
> Is there value in collecting them locally into a push-to-upstream repo, or
> is it better to just push from each local clone individually?

It depends on a lot of things:
* How critical is the latency in the desired workflow?

  Say you have this setup on a cruise ship and only push once when
  you are in a harbor, then (a) you want to make sure you pushed everything
  and (b) you care less about latency. Hence you would prefer to collect
  everything in one repo so nothing gets lost.

  Say you are in a fast-paced environment, where you want instant feedback
  on your patches as they are mostly exploratory designs. Then you want to
  push directly from the local clone individually to minimize latency, I would
  imagine.

* Is there any value in one local clone having the work from
  another local clone available? In that case you may want to
  have all your changes accumulated into the mirror.

> Ramification 2:
>
> Some of the repos I work with use submodules. Sometimes they use submodules
> that I'm not aware of. Or a submodule was used historically, and git bisect
> breaks/misbehaves because it can't get the submodule in offline mode.

Oh!

> Is there a way to get these, without writing a script that recurses through
> all versions of .gitmodules?

Not that I am aware of. You need to find all submodules.

When a submodule gets deleted (git rm <submodule> && git commit),
then all entries for that submodule in the .gitmodules file are also removed.
That seems ok, but in an ideal world we may have a tombstone in there
(e.g. the submodule.NAME.path still set) that would help for tasks like finding
all submodules in the future.
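
Until something like that exists, the script can at least stay small. A
sketch, under a few assumptions of mine: it walks every commit that touched
.gitmodules and reads that version of the file with 'git config --blob'. The
demo history is faked with a hand-written .gitmodules (no real submodule is
added), the name libfoo and its URL are made up, and submodule names and
URLs are assumed to contain no whitespace.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
# Helper so commits work without a configured identity (demo only).
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# Fake history: a .gitmodules entry that is added and later deleted.
git config -f .gitmodules submodule.libfoo.path libfoo
git config -f .gitmodules submodule.libfoo.url https://example.invalid/libfoo.git
git add .gitmodules
g commit -q -m "add libfoo submodule entry"
git rm -q .gitmodules
g commit -q -m "drop libfoo"

# Print every submodule URL that any version of .gitmodules ever named.
list_historic_submodule_urls() {
    git log --all --format=%H -- .gitmodules |
    while read -r rev; do
        # Commits that deleted the file have no blob to read; skip them.
        git config --blob "$rev:.gitmodules" \
            --get-regexp '^submodule\..*\.url$' 2>/dev/null || true
    done | awk '{print $2}' | sort -u
}

urls=$(list_historic_submodule_urls)
printf '%s\n' "$urls"
```

Each URL found this way would then need its own 'git clone --mirror'.
Relative URLs (such as ../libfoo.git) have to be resolved against the
superproject's URL first, and the default history simplification of
'git log' may skip some merge commits, so treat the result as a starting
point rather than a guarantee.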

> I'm seeing the --recurse-submodules option for git fetch, so this might (or
> might not) be the Right Thing.

That only works for currently initialized (active) submodules. Submodules
of the past, and those which you do not have, are not fetched.

Without the submodule ramifications, I would have advised making
the local mirror a 'bare' repo.

Hope that helps,
Stefan