Re: Concurrent fetch commands

On Sun, Dec 31, 2023 at 09:27:19AM -0800, Junio C Hamano wrote:
> Stefan Haller <lists@xxxxxxxxxxxxxxxx> writes:
> 
> > I can reliably reproduce this by doing
> >
> >    $ git fetch&; sleep 0.1; git pull
> >    [1] 42160
> >    [1]  + done       git fetch
> >    fatal: Cannot rebase onto multiple branches.
> 
> I see a bug here.
> 
> How this _ought_ to work is
> 
>  - The first "git fetch" wants to report what it fetched by writing
>    into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
>    the fetch finishes can consume its contents).
> 
>  - The second "git pull" runs "git fetch" under the hood.  Because
>    it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
>    is already somebody writing to the file, it should notice and
>    barf, saying "fatal: a 'git fetch' is already working" or
>    something.
> 
> But because there is no "Do not overwrite FETCH_HEAD somebody else
> is using" protection, "git merge" or "git rebase" that is run as the
> second half of the "git pull" ends up working on the contents of
> FETCH_HEAD that is undefined, and GIGO result follows.
> 
> The "bug" that the second "git fetch" does not notice an already
> running one (who is in possession of FETCH_HEAD) and refrain from
> starting is not easy to design a fix for---we cannot just abort by
> opening it with O_CREAT|O_EXCL because it is a normal thing for
> $GIT_DIR/FETCH_HEAD to exist after the "last" fetch.  We truncate
> its contents before starting to avoid getting affected by contents
> leftover by the last fetch, but when there is a "git fetch" that is
> actively running, and it finishes _after_ the second one starts and
> truncates the file, the second one will end up seeing the contents
> the first one left.  We have the "--no-write-fetch-head" option for
> users to explicitly tell which invocation of "git fetch" should not
> write FETCH_HEAD.
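For completeness, here is what that opt-out looks like in practice. This is a toy demonstration using throwaway repositories, not a suggested workflow:

```shell
# Toy demonstration (throwaway repositories): a fetch run with
# --no-write-fetch-head leaves FETCH_HEAD alone, so it cannot race a
# concurrent "git pull" over the file's contents.
tmp=$(mktemp -d)
git init -q "$tmp/remote"
git -C "$tmp/remote" -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m init
git clone -q "$tmp/remote" "$tmp/clone"
rm -f "$tmp/clone/.git/FETCH_HEAD"   # the clone may have written one

git -C "$tmp/clone" fetch --no-write-fetch-head origin
test -e "$tmp/clone/.git/FETCH_HEAD" || echo "FETCH_HEAD untouched"
```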

While I agree that it's the right thing to use "--no-write-fetch-head"
in this context, I still wonder whether we want to fix this "bug". It
would be a rather easy change on our side to start using the lockfile
API to write to FETCH_HEAD, which has a bunch of benefits:

  - We would block concurrent processes from writing to FETCH_HEAD at
    the same time (well, at least for clients aware of the new
    semantics).

  - Consequently, we would no longer write a corrupted FETCH_HEAD when
    multiple processes write to it at the same time.

  - We'd also be more robust against corruption from hard crashes,
    thanks to atomic rename semantics and proper flushing.
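To illustrate the semantics the lockfile API would give us, here is a minimal sketch in shell. This is a hedged illustration, not Git's actual implementation (which is C), and it assumes a POSIX filesystem where rename(2) is atomic:

```shell
# Hedged sketch of the lockfile pattern, not Git's actual code: take
# FETCH_HEAD.lock exclusively, write the new contents there, then
# rename it into place so readers only ever see a complete file.
GIT_DIR=$(mktemp -d)            # stand-in for a real repository
lock="$GIT_DIR/FETCH_HEAD.lock"

# "set -C" (noclobber) makes the redirection fail if the lock already
# exists, i.e. if another fetch is in possession of FETCH_HEAD.
if ! (set -C; : > "$lock") 2>/dev/null; then
    echo "fatal: a 'git fetch' is already running" >&2
    exit 1
fi

echo "1234abcd        branch 'main' of example.com/repo" > "$lock"
mv "$lock" "$GIT_DIR/FETCH_HEAD"   # rename(2) is atomic on POSIX

cat "$GIT_DIR/FETCH_HEAD"
```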

I don't really see much of a downside except for the fact that we'd
change how this special ref is written, so other implementations would
need to adapt accordingly. But even if they didn't: if clients with the
new and the old behaviour write FETCH_HEAD at the same point in time,
the result would still be a consistent FETCH_HEAD once both writes
finish. There would still be a race over which of the two versions of
FETCH_HEAD we end up seeing, but that feels better to me than corrupted
contents.

Patrick
