On Wed, Nov 30, 2016 at 3:40 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Wed, Nov 30, 2016 at 06:32:04PM -0500, Jeff King wrote:
>
>> On Wed, Nov 30, 2016 at 03:28:23PM -0800, Brandon Williams wrote:
>>
>> > So I couldn't find a race condition in the code. I tracked the problem
>> > to grep_source_load_file which attempts to run lstat on the file so that
>> > it can read it into a buffer. The lstat call fails with ENOENT (which
>> > conveniently is skipped by the if statement which calls error_errno). So
>> > for some reason the file cannot be found and read into memory resulting
>> > in nothing being grep'ed for that particular file (since the buffer is
>> > NULL).
>>
>> That's definitely weird. Is it possible that any of the underlying calls
>> from another thread are using chdir()? I think realpath() may do that
>> behind the scenes, and there may be others.
>>
>> A full strace from a failing case would be interesting reading. In
>> theory we should be able to get that by running the stress script for
>> long enough. :)
>
> Actually, it failed pretty much immediately. I guess the extra stracing
> changes the timing to make the problem _more_ likely.
>
> And indeed, I see:
>
>   20867 lstat("fi:le", <unfinished ...>
>   20813 <... read resumed> "", 232) = 0
>   20871 futex(0x558cdec8b164, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
>   20813 close(7 <unfinished ...>
>   20870 <... futex resumed> ) = 0
>   20869 lstat(".gitmodules", <unfinished ...>
>   20813 <... close resumed> ) = 0
>   20865 set_robust_list(0x7f1df92579e0, 24 <unfinished ...>
>   20813 lstat("su:b/../.git/modules/su:b/commondir", 0x7ffecc8b3ac0) = -1 ENOENT (No such file or directory)
>   20865 <... set_robust_list resumed> ) = 0
>   20868 set_robust_list(0x7f1df7a549e0, 24 <unfinished ...>
>   20813 access("su:b/../.git/modules/su:b/objects", X_OK) = 0
>   20813 access("su:b/../.git/modules/su:b/refs", X_OK) = 0
>   20813 stat("su:b/../.git/modules/su:b", {st_mode=S_IFDIR|0755, st_size=280, ...}) = 0
>   20813 getcwd("/var/ram/git-stress/root-4/trash directory.t7814-grep-recurse-submodules/parent", 129) = 80
>   20869 <... lstat resumed> {st_mode=S_IFREG|0644, st_size=47, ...}) = 0
>   20813 chdir("su:b/../.git/modules/su:b") = 0
>   20869 open(".gitmodules", O_RDONLY <unfinished ...>
>   20813 getcwd( <unfinished ...>
>   20867 <... lstat resumed> 0x7f1df8254cf0) = -1 ENOENT (No such file or directory)
>
> where 20813 and 20867 are two threads of the main process. One is doing
> the lstat and the other calls chdir at the same moment.
>

Lessons learned here: the run-command API is not thread-safe when it is
used with the directory (.dir) set. If you need to run something in a
threaded environment, run "git -C <dir> ..." instead, so that the child
does the chdir.

Are there any other threaded environments that run things with .dir set?