Hi Patrick, On Fri, 4 Oct 2024, Patrick Steinhardt wrote: > On Fri, Oct 04, 2024 at 05:59:30AM +0200, Patrick Steinhardt wrote: > > On Fri, Oct 04, 2024 at 02:02:44AM +0100, Ramsay Jones wrote: > > > > > > Just a quick heads up: t0610-reftable-basics.sh test 47 (ref > > > transaction: many concurrent writers) fails on cygwin. The tail end > > > of the debug output for this test looks like: > > > > > [snip] > > > > > > t0610-reftable-basics.sh passed on 'rc0', but this test (and the > > > timeout facility) is new in 'rc1'. I tried simply increasing the > > > timeout (10 fold), but that didn't change the result. (I didn't > > > really expect it to - the 'reftable: transaction prepare: I/O error' > > > does not look timing related!). > > > > > > Again, just a heads up. (I can't look at it until tomorrow now; any > > > ideas?) > > > > This failure is kind of known and discussed in [1]. Just to make it > > explicit: this test failure doesn't really surface a regression, the > > reftable code already failed for concurrent writes before. I fixed > > that and added the test that is now flaky, as the fix itself is > > seemingly only sufficient on Linux and macOS. > > > > I didn't yet have the time to look at whether I can fix it, but should > > finally find the time to do so today. > > Hm, interestingly enough I cannot reproduce the issue on Cygwin myself, > but I can reproduce the issue with MinGW. And in fact, the logs you have > sent all indicate that we cannot acquire the lock, there is no sign of > I/O errors here. So I guess you're running into timeout issues. Does the > following patch fix this for you? > > diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh > index 2d951c8ceb..b5cad805d4 100755 > --- a/t/t0610-reftable-basics.sh > +++ b/t/t0610-reftable-basics.sh > @@ -455,10 +455,7 @@ test_expect_success 'ref transaction: many concurrent writers' ' > git init repo && > ( > cd repo && > - # Set a high timeout such that a busy CI machine will not abort > - # early. 10 seconds should hopefully be ample of time to make > - # this non-flaky. > - git config set reftable.lockTimeout 10000 && > + git config set reftable.lockTimeout -1 && > test_commit --no-tag initial && > > head=$(git rev-parse HEAD) && That looks plausible to me, as this test is exercising the POSIX emulation layer of Cygwin much more than Git itself (and therefore I expect this test case to take a really long time on Cygwin). If it turns out that this works around the issue, I would propose to use something like this instead: # Cygwin needs a slightly longer timeout if have_prereq !CYGWIN then git config set reftable.lockTimeout 10000 else git config set reftable.lockTimeout 60000 fi && > The issue on Win32 is different: we cannot commit the "tables.list" lock > via rename(3P) because the target file may be open for reading by a > concurrent process. I guess that Cygwin has proper POSIX semantics for > rename(3P) and thus doesn't hit the same issue. Indeed, this is where the symptom lies. I worked around it in Git for Windows v2.47.0-rc1 via this patch: -- snipsnap -- diff --git a/compat/mingw.c b/compat/mingw.c index c1030e40f227..56db193d1360 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -784,6 +784,7 @@ int mingw_open (const char *filename, int oflags, ...) int fd, create = (oflags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL); wchar_t wfilename[MAX_LONG_PATH]; open_fn_t open_fn; + int tries = 0; va_start(args, oflags); mode = va_arg(args, int); @@ -813,7 +814,11 @@ int mingw_open (const char *filename, int oflags, ...) else if (xutftowcs_long_path(wfilename, filename) < 0) return -1; - fd = open_fn(wfilename, oflags, mode); + do { + fd = open_fn(wfilename, oflags, mode); + } while (fd < 0 && GetLastError() == ERROR_SHARING_VIOLATION && + retry_ask_yes_no(&tries, "Opening '%s' failed because another " + "application accessed it. Try again?", filename)); if ((oflags & O_CREAT) && fd >= 0 && are_wsl_compatible_mode_bits_enabled()) { _mode_t wsl_mode = S_IFREG | (mode&0777);