Hi Stefan, On Tue, 20 Feb 2018, Stefan Beller wrote: > On Tue, Feb 20, 2018 at 8:20 AM, Johannes Schindelin > <Johannes.Schindelin@xxxxxx> wrote: > > Hi Stefan (and other submodule wizards), > > > > Igor Melnichenko reported a submodule problem in a Git for Windows ticket: > > > > https://github.com/git-for-windows/git/issues/1469 > > > > Thanks for bringing this up. Is it better to discuss it here on at that > issue? It's up to you. I have no preference. > > Part of it is due to Windows' behavior where you cannot read the same > > file in one process while you write it in another process. > > > > The other part is how our submodule code works in parallel. In > > particular, we seem to write the `.git` file maybe even every time a > > submodule is fetched, unconditionally, not even testing whether the > > .git file is already there and contains the correct contents? > > only when they need to be newly cloned (as that is when the > submodule--helper clone is invoked). When fetching, we'd not touch the > '.git' file IIRC. Okay. I saw multiple calls to the same connect_work_tree_and_git_dir() function in submodule.c, and was not so sure that it was only clone. But the bug report talks about `git submodule add` with a new URL and a new submodule location, so I guess it is going down the clone path. > > For some reason, the bug reporter saw a "Permission denied" on the > > `.git` file when we try to write it (and I am pretty certain that I > > tracked it down correctly to the `connect_work_tree_and_git_dir()` > > function). The intermittent "Permission denied" error seems to suggest > > that *another* process is accessing this file while we are writing it. > > The reporter also reports this coming out of "git submodule add", > which is not parallelized, but rather in the messy state of migrating > as much code into C from shell, trying to not introduce to many bugs. ;) > > So for add there is no parallelism I am aware of, which makes me wonder > how the race is coming in there. Hrm. You're right, there is no indication in that bug report that `git submodule` spawns multiple processes in parallel. > I recall making the decision to unconditionally write out the '.git' file > for other operations like "git reset --hard --recurse-submodules" or > "git checkout --force --recurse-submodules", as there you are really asking > for a clean slate. I missed the Windows FS semantics for that case, > so I guess re-reading the file to make sure it is already correct would > avoid issues on Windows there. (though I do not yet recall how the > parallel stuff would bite us there, as the file is written as the very > first thing) Maybe it is not parallel, but maybe it is simply an unclosed handle to the .git file? I do not see anything in read_gitfile_gently() (which is the only location reading the .git file I know of)... Did I miss anything? > > It also seems that this problem becomes worse if the firewall is turned > > on, in which case a couple of network operations become a little slower > > (which I could imagine to be the factor making the problems more likely). > > > > A plausible workaround would be to write the `.git` file only when needed > > (which also would fix a bug on macOS/Linux, although the window is > > substantially smaller: the reader could potentially read a > > partially-written file). > > > > But maybe we are simply holding onto an open handle to the `.git` file > > too long? > > That could very well be the case. But as I said above, I fail to find any smoking gun in the source code. The only code path I see that reads the .git file releases the resources correctly AFAICT. > > I tried to put together a bit more information here: > > > > https://github.com/git-for-windows/git/issues/1469#issuecomment-366932746 > > Thanks! > > > Do you think there is an easy solution for this? You're much deeper in the > > submodule code than I ever want to be... > > I'll take a look for open handles, though we're already using our > wrapper functions > like 'write_file', which is supposed to close the handle gracefully (and has > undergone testing on Windows a lot as it is the standard write function) Right, even some of my code uses this successfully... Curious problem. Ciao, Dscho