On Fri, Oct 04, 2024 at 05:59:30AM +0200, Patrick Steinhardt wrote:
> On Fri, Oct 04, 2024 at 02:02:44AM +0100, Ramsay Jones wrote:
> > Hi Patrick,
> >
> > Just a quick heads up: t0610-reftable-basics.sh test 47 (ref transaction: many
> > concurrent writers) fails on cygwin. The tail end of the debug output for this
> > test looks like:
> >
> [snip]
> >
> > t0610-reftable-basics.sh passed on 'rc0', but this test (and the timeout facility)
> > is new in 'rc1'. I tried simply increasing the timeout (10 fold), but that didn't
> > change the result. (I didn't really expect it to - the 'reftable: transaction
> > prepare: I/O error' does not look timing related!).
> >
> > Again, just a heads up. (I can't look at it until tomorrow now; any ideas?)
>
> This failure is kind of known and discussed in [1]. Just to make it
> explicit: this test failure doesn't really surface a regression, the
> reftable code already failed for concurrent writes before. I fixed that
> and added the test that is now flaky, as the fix itself is seemingly
> only sufficient on Linux and macOS.
>
> I didn't yet have the time to look at whether I can fix it, but should
> finally find the time to do so today.

Hm, interestingly enough I cannot reproduce the issue on Cygwin myself,
but I can reproduce the issue with MinGW. And in fact, the logs you have
sent all indicate that we cannot acquire the lock, there is no sign of
I/O errors here. So I guess you're running into timeout issues. Does the
following patch fix this for you?

diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 2d951c8ceb..b5cad805d4 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -455,10 +455,7 @@ test_expect_success 'ref transaction: many concurrent writers' '
 	git init repo &&
 	(
 		cd repo &&
-		# Set a high timeout such that a busy CI machine will not abort
-		# early. 10 seconds should hopefully be ample of time to make
-		# this non-flaky.
-		git config set reftable.lockTimeout 10000 &&
+		git config set reftable.lockTimeout -1 &&
 		test_commit --no-tag initial &&

 		head=$(git rev-parse HEAD) &&

The issue on Win32 is different: we cannot commit the "tables.list" lock
via rename(3P) because the target file may be open for reading by a
concurrent process. I guess that Cygwin has proper POSIX semantics for
rename(3P) and thus doesn't hit the same issue.

We already try to emulate POSIX semantics somewhat in `mingw_rename()`
by using a retry-loop when we hit `ERROR_ACCESS_DENIED`, which is what
we get when the target file is open in another process. But that
seemingly isn't enough when there is a lot of contention around a file.

So I'm currently investigating whether we can adopt something similar to
what Cygwin is doing for Win32, as well. I assume that they use
`FILE_RENAME_INFORMATION_EX` with `FILE_RENAME_POSIX_SEMANTICS`, which
should give us what we're looking for.

Ugh, well. Turns out the implementation of rename(3P) in Cygwin is 500
lines long. I guess this is a non-trivial problem :) But they of course
have to handle a whole lot more cases than we have to.

But my guess was correct: they do use `FILE_RENAME_POSIX_SEMANTICS`. The
catch is that this flag only exists in Windows 10 and newer. But that
should be a fine compromise. I'll try to wrap my head around how all of
this works.

Patrick
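
P.S. To illustrate why a bounded retry loop can fall over under heavy
contention, here is a minimal, portable sketch of the idea. It is not
the actual `mingw_rename()` code: `rename_with_retry()`, `rename_fn`,
`flaky_rename()` and the delay values are hypothetical names made up for
this example, and `EACCES` merely stands in for `ERROR_ACCESS_DENIED`.
The rename operation is passed in as a callback so that contention can
be simulated without Windows.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success, -1 with errno set. */
typedef int (*rename_fn)(const char *src, const char *dst);

/* Sleep for the given number of milliseconds. */
static void sleep_ms(int ms)
{
	struct timespec ts;

	ts.tv_sec = ms / 1000;
	ts.tv_nsec = (long)(ms % 1000) * 1000000L;
	nanosleep(&ts, NULL);
}

/*
 * Try the rename once, then retry after each entry in delays_ms for
 * as long as it keeps failing with EACCES (our stand-in for "target
 * is open in another process"). Any other error, or running out of
 * delays, is reported to the caller as -1 with errno left intact.
 */
static int rename_with_retry(rename_fn do_rename,
			     const char *src, const char *dst,
			     const int *delays_ms, size_t nr_delays)
{
	size_t i;

	assert(do_rename != NULL);

	for (i = 0; ; i++) {
		if (!do_rename(src, dst))
			return 0;
		if (errno != EACCES || i >= nr_delays)
			return -1;
		sleep_ms(delays_ms[i]);
	}
}

/* Simulated contention: fail with EACCES while fail_left is nonzero. */
static int fail_left;

static int flaky_rename(const char *src, const char *dst)
{
	(void)src;
	(void)dst;
	if (fail_left > 0) {
		fail_left--;
		errno = EACCES;
		return -1;
	}
	return 0;
}
```

With three delays, a target that stays busy for three attempts is
renamed on the fourth try, whereas one that stays busy for four attempts
makes the whole operation fail. That is the weakness of the retry
approach when many writers fight over the same file: any fixed budget
can be exhausted, which is why a rename with POSIX semantics looks more
attractive than tuning the delays.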