Hi Patrick, On 04/10/2024 13:16, Patrick Steinhardt wrote: [snip] > Now the two reports are somewhat different from one another: > > - On Cygwin we hit timeouts because we fail to lock the "tables.list" > file within 10 seconds. The renames themselves succeed even when the > target file is open because Cygwin provides extensive compatibility > logic to make them work even when the target file is open already. Hmm, not so much for me! :( > > - On MinGW we hit I/O errors on rename. While we do have some retry > logic in place to make the rename work in some cases, this is > seemingly not sufficient when there is this much contention around > the files. I am seeing I/O errors. > > Neither of these cases is a regression: the logic didn't work before the > mentioned commit, and after the commit it performs well on Linux, macOS > and in Cygwin, and at least a bit better with MinGW. But the tests show > that we need to put more thought into how to make this work properly on > MinGW systems. OK > > The fact that Cygwin can work around this issue with better emulation of > POSIX-style atomic renames shows that we can in theory make MinGW work > better, as well. But doing so likely requires quite some fiddling with > Windows internals, and Git v2.47 is about to be released in a couple > days. This makes any potential fix quite risky as it would have to > happen deep down in our rename(3P) implementation in "compat/mingw.c". > > Let's instead work around both issues by disabling the test on MinGW > and by significantly increasing the locking timeout for Cygwin. This > bumped timeout also helps when running with e.g. the address and memory > sanitizers, which also tend to significantly extend the runtime of this > test. This doesn't work for me. > > This should be revisited after Git v2.47 is out. > > Signed-off-by: Patrick Steinhardt <ps@xxxxxx> > --- [PATCH snipped] As I said last night, the first thing I tried was increasing the timeout ten-fold (ie, to 100000), but that didn't change the outcome. I am also on windows 10 (on an *old* 4th gen core i5): $ uname -a CYGWIN_NT-10.0-19045 satellite 3.5.4-1.x86_64 2024-08-25 16:52 UTC x86_64 Cygwin $ I don't know why we appear to be seeing different behaviour. Since I had already edited this test to up the timeout, I increased it again to match the value in this (redacted) patch (ie I didn't actually apply your patch): $ git diff diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh index 2d951c8ceb..7b6ded1c6a 100755 --- a/t/t0610-reftable-basics.sh +++ b/t/t0610-reftable-basics.sh @@ -458,7 +458,7 @@ test_expect_success 'ref transaction: many concurrent writers' ' # Set a high timeout such that a busy CI machine will not abort # early. 10 seconds should hopefully be ample of time to make # this non-flaky. - git config set reftable.lockTimeout 10000 && + git config set reftable.lockTimeout 300000 && test_commit --no-tag initial && head=$(git rev-parse HEAD) && $ However, apart from less debugging output, the result was essentially the same: ... ++ wait ++ git update-ref refs/heads/branch-100 HEAD fatal: update_ref failed for ref 'refs/heads/branch-97': reftable: transaction failure: I/O error fatal: update_ref failed for ref 'refs/heads/branch-87': reftable: transaction prepare: I/O error ++ git for-each-ref --sort=v:refname ++ test_cmp expect actual ++ test 2 -ne 2 ++ eval 'diff -u' '"$@"' +++ diff -u expect actual --- expect 2024-10-04 13:31:42.450969800 +0000 +++ actual 2024-10-04 13:32:07.097150600 +0000 @@ -84,7 +84,6 @@ 68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-84 68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-85 68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-86 -68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-87 68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-88 68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-89 68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-90 error: last command exited with $?=1 not ok 47 - ref transaction: many concurrent writers ... In an earlier email you suggested a timeout of -1, thus: $ git diff diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh index 2d951c8ceb..9ab37a8d69 100755 --- a/t/t0610-reftable-basics.sh +++ b/t/t0610-reftable-basics.sh @@ -458,7 +458,7 @@ test_expect_success 'ref transaction: many concurrent writers' ' # Set a high timeout such that a busy CI machine will not abort # early. 10 seconds should hopefully be ample of time to make # this non-flaky. - git config set reftable.lockTimeout 10000 && + git config set reftable.lockTimeout -1 && test_commit --no-tag initial && head=$(git rev-parse HEAD) && $ The result looks familiar: ++ wait ++ git update-ref refs/heads/branch-100 HEAD fatal: update_ref failed for ref 'refs/heads/branch-46': reftable: transaction prepare: I/O error ++ git for-each-ref --sort=v:refname ++ test_cmp expect actual ++ test 2 -ne 2 ++ eval 'diff -u' '"$@"' +++ diff -u expect actual --- expect 2024-10-04 14:35:20.159564800 +0000 +++ actual 2024-10-04 14:35:43.304762500 +0000 @@ -43,7 +43,6 @@ 68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-43 68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-44 68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-45 -68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-46 68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-47 68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-48 68d032e9edd3481ac96382786ececc37ec28709e commit refs/heads/branch-49 error: last command exited with $?=1 not ok 47 - ref transaction: many concurrent writers Can you think of anything else to try? I would strongly suggest skipping this test on cygwin as well as MINGW. ATB, Ramsay Jones