On Wed, Nov 30, 2016 at 06:32:04PM -0500, Jeff King wrote:

> On Wed, Nov 30, 2016 at 03:28:23PM -0800, Brandon Williams wrote:
>
> > So I couldn't find a race condition in the code. I tracked the problem
> > to grep_source_load_file which attempts to run lstat on the file so that
> > it can read it into a buffer. The lstat call fails with ENOENT (which
> > conveniently is skipped by the if statement which calls error_errno). So
> > for some reason the file cannot be found and read into memory resulting
> > in nothing being grep'ed for that particular file (since the buffer is
> > NULL).
>
> That's definitely weird. Is it possible that any of the underlying calls
> from another thread are using chdir()? I think realpath() may do that
> behind the scenes, and there may be others.
>
> A full strace from a failing case would be interesting reading. In
> theory we should be able to get that by running the stress script for
> long enough. :)

Actually, it failed pretty much immediately. I guess the extra stracing
changes the timing to make the problem _more_ likely.

And indeed, I see:

  20867 lstat("fi:le",  <unfinished ...>
  20813 <... read resumed> "", 232) = 0
  20871 futex(0x558cdec8b164, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
  20813 close(7 <unfinished ...>
  20870 <... futex resumed> ) = 0
  20869 lstat(".gitmodules",  <unfinished ...>
  20813 <... close resumed> ) = 0
  20865 set_robust_list(0x7f1df92579e0, 24 <unfinished ...>
  20813 lstat("su:b/../.git/modules/su:b/commondir", 0x7ffecc8b3ac0) = -1 ENOENT (No such file or directory)
  20865 <... set_robust_list resumed> ) = 0
  20868 set_robust_list(0x7f1df7a549e0, 24 <unfinished ...>
  20813 access("su:b/../.git/modules/su:b/objects", X_OK) = 0
  20813 access("su:b/../.git/modules/su:b/refs", X_OK) = 0
  20813 stat("su:b/../.git/modules/su:b", {st_mode=S_IFDIR|0755, st_size=280, ...}) = 0
  20813 getcwd("/var/ram/git-stress/root-4/trash directory.t7814-grep-recurse-submodules/parent", 129) = 80
  20869 <... lstat resumed> {st_mode=S_IFREG|0644, st_size=47, ...}) = 0
  20813 chdir("su:b/../.git/modules/su:b") = 0
  20869 open(".gitmodules", O_RDONLY <unfinished ...>
  20813 getcwd( <unfinished ...>
  20867 <... lstat resumed> 0x7f1df8254cf0) = -1 ENOENT (No such file or directory)

where 20813 and 20867 are two threads of the main process. One is doing
the lstat and the other calls chdir at the same moment.

-Peff
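
The mechanism in the trace can be shown in isolation: the working directory belongs to the whole process, so a chdir() in one thread changes how every other thread's relative paths resolve. The sketch below is not git's code. It is a minimal Python illustration in which the function name, the "sub" directory, and the "file" name are all made up for the example, and the second thread is joined before the final lstat so the result is deterministic (the real failure has no such ordering, which is why it only shows up under stress).

```python
import errno
import os
import tempfile
import threading


def lstat_after_foreign_chdir():
    """Return the errno from lstat() on a relative path after another
    thread has changed the process-wide working directory, or 0 if the
    lstat() unexpectedly succeeds. (Hypothetical helper for illustration.)"""
    saved_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as root:
        try:
            os.chdir(root)
            os.mkdir("sub")
            with open("file", "w") as f:
                f.write("hello\n")

            # Works: "file" is resolved relative to root.
            os.lstat("file")

            # A second thread changes directory, much as thread 20813
            # does in the trace.
            t = threading.Thread(target=os.chdir, args=("sub",))
            t.start()
            t.join()  # joined only to make the demonstration deterministic

            try:
                # Same relative path, now resolved against root/sub.
                os.lstat("file")
            except OSError as e:
                return e.errno
            return 0
        finally:
            # Step back out so the temporary directory can be removed.
            os.chdir(saved_cwd)


if __name__ == "__main__":
    err = lstat_after_foreign_chdir()
    print(errno.errorcode.get(err, err))
```

As a general observation rather than a statement about how this was fixed in git: code that looks up relative paths from worker threads is only safe if nothing else in the process calls chdir() while those threads run, or if the paths are made absolute beforehand.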