On 11/30, Jeff King wrote:
> On Wed, Nov 30, 2016 at 06:32:04PM -0500, Jeff King wrote:
>
> > On Wed, Nov 30, 2016 at 03:28:23PM -0800, Brandon Williams wrote:
> >
> > > So I couldn't find a race condition in the code. I tracked the problem
> > > to grep_source_load_file which attempts to run lstat on the file so that
> > > it can read it into a buffer. The lstat call fails with ENOENT (which
> > > conveniently is skipped by the if statement which calls error_errno). So
> > > for some reason the file cannot be found and read into memory resulting
> > > in nothing being grep'ed for that particular file (since the buffer is
> > > NULL).
> >
> > That's definitely weird. Is it possible that any of the underlying calls
> > from another thread are using chdir()? I think realpath() may do that
> > behind the scenes, and there may be others.
> >
> > A full strace from a failing case would be interesting reading. In
> > theory we should be able to get that by running the stress script for
> > long enough. :)
>
> Actually, it failed pretty much immediately. I guess the extra stracing
> changes the timing to make the problem _more_ likely.
>
> And indeed, I see:
>
>   20867 lstat("fi:le", <unfinished ...>
>   20813 <... read resumed> "", 232) = 0
>   20871 futex(0x558cdec8b164, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
>   20813 close(7 <unfinished ...>
>   20870 <... futex resumed> ) = 0
>   20869 lstat(".gitmodules", <unfinished ...>
>   20813 <... close resumed> ) = 0
>   20865 set_robust_list(0x7f1df92579e0, 24 <unfinished ...>
>   20813 lstat("su:b/../.git/modules/su:b/commondir", 0x7ffecc8b3ac0) = -1 ENOENT (No such file or directory)
>   20865 <... set_robust_list resumed> ) = 0
>   20868 set_robust_list(0x7f1df7a549e0, 24 <unfinished ...>
>   20813 access("su:b/../.git/modules/su:b/objects", X_OK) = 0
>   20813 access("su:b/../.git/modules/su:b/refs", X_OK) = 0
>   20813 stat("su:b/../.git/modules/su:b", {st_mode=S_IFDIR|0755, st_size=280, ...}) = 0
>   20813 getcwd("/var/ram/git-stress/root-4/trash directory.t7814-grep-recurse-submodules/parent", 129) = 80
>   20869 <... lstat resumed> {st_mode=S_IFREG|0644, st_size=47, ...}) = 0
>   20813 chdir("su:b/../.git/modules/su:b") = 0
>   20869 open(".gitmodules", O_RDONLY <unfinished ...>
>   20813 getcwd( <unfinished ...>
>   20867 <... lstat resumed> 0x7f1df8254cf0) = -1 ENOENT (No such file or directory)
>
> where 20813 and 20867 are two threads of the main process. One is doing
> the lstat and the other calls chdir at the same moment.
>
> -Peff

Yeah, so it looks like the start_command function calls chdir, which
means any use of the run-command interface is not thread-safe.

For now, the workaround could be to just pass "-C <dir>" to the child
process instead of relying on run-command to chdir.

-- 
Brandon Williams
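
For illustration only, here is a minimal sketch of the effect seen in the strace, written in Python rather than git's C. It is not git's code: the helper names (relative_lstat_ok, lstat_around_chdir, child_argv) and the file name are hypothetical. It relies on the working directory being shared by all threads of a process, so a chdir() in one thread changes how a relative path resolves in every other thread. The sketch forces the ordering with join(), whereas in the trace the two calls merely overlapped. The last helper shows the shape of the suggested workaround, where the child receives "-C <dir>" and the parent never changes directory.

```python
import os
import tempfile
import threading


def relative_lstat_ok(path):
    """Return True if lstat() on a relative path succeeds, False on ENOENT."""
    try:
        os.lstat(path)
        return True
    except FileNotFoundError:
        return False


def lstat_around_chdir(filename="file.txt"):
    """lstat a relative path before and after another thread calls chdir().

    Returns (ok_before, ok_after). The join() makes the ordering
    deterministic; in the strace the two calls simply overlapped.
    """
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as worktree, \
            tempfile.TemporaryDirectory() as elsewhere:
        os.chdir(worktree)
        try:
            open(filename, "w").close()
            ok_before = relative_lstat_ok(filename)

            # The working directory is per process, not per thread, so this
            # chdir() changes what relative paths mean for every thread.
            mover = threading.Thread(target=os.chdir, args=(elsewhere,))
            mover.start()
            mover.join()

            ok_after = relative_lstat_ok(filename)
        finally:
            # Restore the cwd before the temporary directories are removed.
            os.chdir(original_cwd)
    return ok_before, ok_after


def child_argv(directory, git_args):
    """Build a git command line that lets the child change directory itself."""
    return ["git", "-C", directory, *git_args]


if __name__ == "__main__":
    print(lstat_around_chdir())
    print(child_argv("su:b", ["grep", "-e", "foo"]))
```

The file exists the whole time, yet the second lstat fails with ENOENT because the relative path is now resolved against a different directory. Building the child's command line with "-C" sidesteps this, since the parent's working directory is never touched.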