Re: git gc --auto aquires *.lock files that make a subsequent git-fetch error out

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 12 Jul 2017 22:30:25 +0200

On Wed, Jul 12 2017, Jeff King jotted:

> On Wed, Jul 12, 2017 at 09:38:46PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> In 131b8fcbfb ("fetch: run gc --auto after fetching", 2013-01-26) first
>> released with v1.8.2 Jeff changed git-fetch to run "git gc --auto"
>> afterwards.
>>
>> This means that if you run two git fetches in a row the second one may
>> fail because it can't acquire the *.lock files on the remote branches you
>> have & which the next git-fetch needs to update.
>
> Is it really "in a row" that's a problem? The second fetch should not
> begin until the first one is done, including until its auto-gc exits.
> And even with background gc, we do the ref-locking operations first, due
> to 62aad1849 (gc --auto: do not lock refs in the background,
> 2014-05-25).
>
>> I happen to run into this on a git.git which has a lot of remotes (most
>> people on-list whose remotes I know about) and fetch them in parallel:
>>
>>     $ git config alias.pfetch
>>     !parallel 'git fetch {}' ::: $(git remote)
>
> Ah, so it's not in a row. It's parallel. Then yes, you may run into
> problems with the gc locks conflicting with real operations. This isn't
> really unique to fetch. Any simultaneous operation can run into problems
> (e.g., on a busy server repo you may see conflicts between pack-refs and
> regular pushes).

This is what I thought at first, and I've only encountered the issue in
this parallel mode (mainly because it's tedious to reproduce). But I
think the traces below show that it would happen with "git fetch --all"
& "git remote update" as well, so the parallel invocations didn't
matter.

I.e. I'd just update my first remote, then git-gc would start in the
background and lock refs for my other remotes, which I'd then fail to
update.

>> And so would 'git fetch --all':
>>
>>     $ GIT_TRACE=1 git fetch --all 2>&1|grep --line-buffered built-in|grep -v rev-list
>>     19:31:26.273577 git.c:328               trace: built-in: git 'fetch' '--all'
>>     19:31:26.278869 git.c:328               trace: built-in: git 'fetch' '--append' 'origin'
>>     19:31:27.993312 git.c:328               trace: built-in: git 'gc' '--auto'
>>     19:31:27.995855 git.c:328               trace: built-in: git 'fetch' '--append' 'avar'
>>     19:31:29.656925 git.c:328               trace: built-in: git 'gc' '--auto'
>>
>> I think those two cases are bugs (but ones which I don't have the
>> inclination to chase myself beyond sending this E-Mail). We should be
>> running the 'git gc --auto' at the very end of the entire program, not
>> after fetching every single remote.
>>
>> Passing some env variable (similar to the config we pass via the env) to
>> subprograms to make them avoid "git gc --auto" so the main process can
>> do it would probably be the most simple solution.
>
> Yes, I agree that's poor. Ideally there would be a command-line option
> to tell the sub-fetches not to run auto-gc. It could be done with:
>
>   git -c gc.auto=0 fetch --append ...
>
> Or we could even take the "--append" as a hint not to run auto-gc.
>
>> The more general case (such as with my parallel invocation) is harder to
>> solve.
>
> Yes, I don't think it can solved. The most general case is two totally
> unrelated processes which know nothing about each other.
>
>> Maybe "git gc --auto" should have a heuristic so it checks whether
>> there's been recent activity on the repo, and waits until there's been
>> say 60 seconds of no activity, or alternatively if it's waited 600
>> seconds and hasn't run gc yet.
>
> That sounds complicated.
>
>> Ideally a "real" invocation like git-fetch would have a way to simply
>> steal any *.lock a background "git gc --auto" creates, aborting the gc
>> but allowing the "real" invocation to proceed. But that sounds even
>> trickier to implement, and might without an extra heuristic on top
>> postpone gc indefinitely.
>
> The locks are generally due to ref-packing and reflog expiration.  I
> think in the long run, it would be nice to move to a ref store that
> didn't need packing, and that could do reflog expiration more
> atomically.
>
> I think the way "reflog expire" is done holds the locks for a lot longer
> than is strictly necessary, too (it actually computes reachability for
> --expire-unreachable on the fly while holding some locks).
>
> -Peff