Re: File locking issues with fetching submodules in parallel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Stefan,

On Tue, 20 Feb 2018, Stefan Beller wrote:

> On Tue, Feb 20, 2018 at 8:20 AM, Johannes Schindelin
> <Johannes.Schindelin@xxxxxx> wrote:
> > Hi Stefan (and other submodule wizards),
> >
> > Igor Melnichenko reported a submodule problem in a Git for Windows ticket:
> >
> >         https://github.com/git-for-windows/git/issues/1469
> >
> 
> Thanks for bringing this up. Is it better to discuss it here on at that
> issue?

It's up to you. I have no preference.

> > Part of it is due to Windows' behavior where you cannot read the same
> > file in one process while you write it in another process.
> >
> > The other part is how our submodule code works in parallel. In
> > particular, we seem to write the `.git` file maybe even every time a
> > submodule is fetched, unconditionally, not even testing whether the
> > .git file is already there and contains the correct contents?
> 
> only when they need to be newly cloned (as that is when the
> submodule--helper clone is invoked). When fetching, we'd not touch the
> '.git' file IIRC.

Okay. I saw multiple calls to the same connect_work_tree_and_git_dir()
function in submodule.c, and was not so sure that it was only clone.

But the bug report talks about `git submodule add` with a new URL and a
new submodule location, so I guess it is going down the clone path.

> > For some reason, the bug reporter saw a "Permission denied" on the
> > `.git` file when we try to write it (and I am pretty certain that I
> > tracked it down correctly to the `connect_work_tree_and_git_dir()`
> > function). The intermittent "Permission denied" error seems to suggest
> > that *another* process is accessing this file while we are writing it.
> 
> The reporter also reports this coming out of "git submodule add",
> which is not parallelized, but rather in the messy state of migrating
> as much code into C from shell, trying to not introduce to many bugs. ;)
> 
> So for add there is no parallelism I am aware of, which makes me wonder
> how the race is coming in there.

Hrm. You're right, there is no indication in that bug report that `git
submodule` spawns multiple processes in parallel.

> I recall making the decision to unconditionally write out the '.git' file
> for other operations like "git reset --hard --recurse-submodules" or
> "git checkout --force --recurse-submodules", as there you are really asking
> for a clean slate. I missed the Windows FS semantics for that case,
> so I guess re-reading the file to make sure it is already correct would
> avoid issues on Windows there. (though I do not yet recall how the
> parallel stuff would bite us there, as the file is written as the very
> first thing)

Maybe it is not parallel, but maybe it is simply an unclosed handle to the
.git file? I do not see anything in read_gitfile_gently() (which is the
only location reading the .git file I know of)... Did I miss anything?

> > It also seems that this problem becomes worse if the firewall is turned
> > on, in which case a couple of network operations become a little slower
> > (which I could imagine to be the factor making the problems more likely).
> >
> > A plausible workaround would be to write the `.git` file only when needed
> > (which also would fix a bug on macOS/Linux, although the window is
> > substantially smaller: the reader could potentially read a
> > partially-written file).
> >
> > But maybe we are simply holding onto an open handle to the `.git` file
> > too long?
> 
> That could very well be the case.

But as I said above, I fail to find any smoking gun in the source code.
The only code path I see that reads the .git file releases the resources
correctly AFAICT.

> > I tried to put together a bit more information here:
> >
> > https://github.com/git-for-windows/git/issues/1469#issuecomment-366932746
> 
> Thanks!
> 
> > Do you think there is an easy solution for this? You're much deeper in the
> > submodule code than I ever want to be...
> 
> I'll take a look for open handles, though we're already using our
> wrapper functions
> like 'write_file', which is supposed to close the handle gracefully (and has
> undergone testing on Windows a lot as it is the standard write function)

Right, even some of my code uses this successfully...

Curious problem.

Ciao,
Dscho



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux