Re: [RFC/PATCH] gc: run more pre-detach operations under lock

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 19 Jun 2019 20:01:55 +0200

On Wed, Jun 19 2019, Duy Nguyen wrote:

> On Wed, Jun 19, 2019 at 5:26 PM Ævar Arnfjörð Bjarmason
> <avarab@xxxxxxxxx> wrote:
>> This patch is part of a WIP branch I have that's a bit of a mess. It's
>> more-gc-detach-under-lock on github.com/avar/git.git. It doesn't apply
>> on master because it relies on some previous test work, but for RFC
>> purposes I figured it was better to send it stand-alone.
>>
>> But I think this sort of approach is better than Duy's proposed patch,
>> because...
>>
>> On Wed, Jun 19 2019, Nguyễn Thái Ngọc Duy wrote:
>>
>> > So let's try to avoid that. We should only need one 'gc' run after all
>> > objects and references are added anyway. Add a new option --no-auto-gc
>> > that will be used by those n-1 processes. 'gc --auto' will always run on
>> > the main fetch process (*).
>> >
>> > (*) even if we fetch remotes in parallel at some point in future, this
>> >     should still be fine because we should "join" all those processes
>> >     before this step.
>>
>> This is what I'm trying to fix in my version of this patch. This is
>> only true for yours if you assume that the user is going to be
>> invoking "fetch" in a single terminal window, IOW that we have an
>> implicit global mutex of one top-level git command at a time.
>>
>> Wheras mine fixes e.g. the same issue for:
>>
>>     parallel 'git fetch {}' ::: $(git remote)
>>
>> Ditto for you running a "git" command and your editor running a
>> "fetch" at the same time.
>
> You could sort of avoid the problem here too with
>
> parallel 'git fetch --no-auto-gc {}' ::: $(git remote)
> git gc --auto
>
> It's definitely simpler, but of course we have to manually add
> --no-auto-gc in everywhere we need, so not quite as elegant.
>
> Actually you could already do that with 'git -c gc.auto=false fetch', I guess.

The point of the 'parallel' example is to show disconnected git
commands, think trying to run 'git' in a terminal while your editor
asynchronously runs a polling 'fetch', or a server with multiple
concurrent clients running 'gc --auto'.

That's the question my RFC patch raises. As far as I can tell the
approach in your patch is only needed because our locking for gc is
buggy, rather than introduce the caveat that an fetch(N) operation won't
do "gc" until it's finished (we may have hundreds, thousands of remotes,
I use that for some more obscure use-cases) shouldn't we just fix the
locking?