On Wed, Jan 03, 2024 at 08:33:24AM +0100, Patrick Steinhardt wrote:
> On Sun, Dec 31, 2023 at 09:27:19AM -0800, Junio C Hamano wrote:
> > Stefan Haller <lists@xxxxxxxxxxxxxxxx> writes:
> > 
> > > I can reliably reproduce this by doing
> > > 
> > >   $ git fetch&; sleep 0.1; git pull
> > >   [1] 42160
> > >   [1]  + done       git fetch
> > >   fatal: Cannot rebase onto multiple branches.
> > 
> > I see a bug here.
> > 
> > How this _ought_ to work is
> > 
> >  - The first "git fetch" wants to report what it fetched by writing
> >    into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
> >    the fetch finishes can consume its contents).
> > 
> >  - The second "git pull" runs "git fetch" under the hood.  Because
> >    it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
> >    is already somebody writing to the file, it should notice and
> >    barf, saying "fatal: a 'git fetch' is already working" or
> >    something.
> > 
> > But because there is no "Do not overwrite FETCH_HEAD somebody else
> > is using" protection, "git merge" or "git rebase" that is run as the
> > second half of the "git pull" ends up working on the contents of
> > FETCH_HEAD that is undefined, and GIGO result follows.
> > 
> > The "bug" that the second "git fetch" does not notice an already
> > running one (who is in possession of FETCH_HEAD) and refrain from
> > starting is not easy to design a fix for---we cannot just abort by
> > opening it with O_CREAT|O_EXCL because it is a normal thing for
> > $GIT_DIR/FETCH_HEAD to exist after the "last" fetch.  We truncate
> > its contents before starting to avoid getting affected by contents
> > leftover by the last fetch, but when there is a "git fetch" that is
> > actively running, and it finishes _after_ the second one starts and
> > truncates the file, the second one will end up seeing the contents
> > the first one left.  We have the "--no-write-fetch-head" option for
> > users to explicitly tell which invocation of "git fetch" should not
> > write FETCH_HEAD.
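The interleaving Junio describes (the second fetch truncates FETCH_HEAD,
then the fetch that started earlier finishes afterwards and leaves its
own contents behind) can be simulated deterministically. This is a toy
Python illustration of the race, not Git's code; the path and the
ref lines are made up:

```python
# Toy simulation of the FETCH_HEAD truncation race: two writers open
# the same file with truncation, and the writer that started first
# finishes last, clobbering what the second writer recorded.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "FETCH_HEAD")

# fetch #1 starts: opening with "w" truncates the file at offset 0.
fetch1 = open(path, "w")

# fetch #2 (run under the hood by "git pull") starts later, truncates
# the same file again, writes its result, and finishes first.
fetch2 = open(path, "w")
fetch2.write("deadbeef\t\tbranch 'main' of example.com/repo\n")
fetch2.close()

# fetch #1 finishes _after_ fetch #2, writing at its own offset 0 and
# overwriting what fetch #2 recorded.
fetch1.write("cafebabe\t\tbranch 'topic' of example.com/repo\n")
fetch1.close()

# The merge/rebase half of "git pull" now reads fetch #1's leftovers,
# not what its own fetch wrote.
with open(path) as fh:
    print(fh.read())
```

The point of the sketch is that neither writer misbehaves in isolation;
the corruption comes purely from the unsynchronized truncate-and-write
pattern on a shared path.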
> 
> While I agree that it's the right thing to use "--no-write-fetch-head"
> in this context, I still wonder whether we want to fix this "bug". It
> would be a rather easy change on our side to start using the lockfile
> API to write to FETCH_HEAD, which has a bunch of benefits:
> 
>   - We would block concurrent processes from writing to FETCH_HEAD at
>     the same time (well, at least for clients aware of the new
>     semantics).
> 
>   - Consequently, we do not write a corrupted FETCH_HEAD anymore when
>     multiple processes write to it at the same time.
> 
>   - We're also more robust against corruption in the context of hard
>     crashes due to atomic rename semantics and proper flushing.
> 
> I don't really see much of a downside except for the fact that we change
> how this special ref is being written to, so other implementations would
> need to adapt accordingly. But even if they didn't, if clients with both
> the new and old behaviour write FETCH_HEAD at the same point in time the
> result would still be a consistent FETCH_HEAD if both writes finish. We
> do have a race now regarding which of the two versions of FETCH_HEAD we
> see, but that feels better than corrupted contents to me.

Ah, one thing I didn't think of is parallel fetches. There it is expected
that all of the fetches write into FETCH_HEAD concurrently, at the same
point in time, so that the end result is the collection of all updated
refs even though the order isn't guaranteed. That makes things a bit more
complicated indeed.

Patrick
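The lockfile scheme proposed above (acquire "FETCH_HEAD.lock" with
O_CREAT|O_EXCL, write the complete new contents, then rename into place)
could be sketched as follows. This is a minimal Python illustration of
the semantics, not Git's actual C lockfile API, and the path and
contents are made up:

```python
# Sketch of a lockfile-style write: the ".lock" file doubles as the
# mutual-exclusion token and as the staging area for the new contents.
import os
import tempfile

def write_fetch_head(path, contents):
    lock = path + ".lock"
    # O_CREAT|O_EXCL works here because the *.lock* file is expected
    # not to exist; a second concurrent writer gets FileExistsError
    # instead of corrupting the shared file.
    fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    try:
        os.write(fd, contents.encode())
        os.fsync(fd)  # flush before rename, for crash robustness
    finally:
        os.close(fd)
    # Atomic replace: readers see the old contents or the new ones,
    # never a mix of both.
    os.rename(lock, path)

path = os.path.join(tempfile.mkdtemp(), "FETCH_HEAD")
write_fetch_head(path, "cafebabe\t\tbranch 'topic' of example.com/repo\n")
```

Note that this exclusive-lock design is exactly what breaks down for the
parallel-fetch case mentioned at the end: there, multiple fetch
processes are expected to append to FETCH_HEAD concurrently, so a
whole-file lock-and-rename would serialize or reject writers that are
supposed to cooperate.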