Hi Patrick, On 04/10/2024 16:32, Patrick Steinhardt wrote: > In 6241ce2170 (refs/reftable: reload locked stack when preparing > transaction, 2024-09-24) we have introduced a new test that exercises > how the reftable backend behaves with many concurrent writers all racing > with each other. This test was introduced after a couple of fixes in > this context that should make concurrent writes behave gracefully. As it > turns out though, Windows systems do not yet handle concurrent writes > properly, as we've got two reports for Cygwin and MinGW failing in this > newly added test. > > The root cause of this is how we update the "tables.list" file: when > writing a new stack of tables we first write the data into a lockfile > and then rename that file into place. But Windows forbids us from doing > that rename when the target path is open for reading by another process. > And as the test races both readers and writers with each other we are > quite likely to hit this edge case. > > This is not a regression: the logic didn't work before the mentioned > commit, and after the commit it performs well on Linux and macOS, and > the situation on Windows should have at least improved a bit. But the > test shows that we need to put more thought into how to make this work > properly there. > > Work around the issue by disabling the test on Windows for now. While at > it, increase the locking timeout to address reported timeouts when using > either the address or memory sanitizer, which also tend to significantly > extend the runtime of this test. > > This should be revisited after Git v2.47 is out. > > Signed-off-by: Patrick Steinhardt <ps@xxxxxx> > --- > t/t0610-reftable-basics.sh | 17 ++++++++++++----- > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh > index 2d951c8ceb..babec7993e 100755 > --- a/t/t0610-reftable-basics.sh > +++ b/t/t0610-reftable-basics.sh > @@ -450,15 +450,22 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' ' > ) > ' > > -test_expect_success 'ref transaction: many concurrent writers' ' > +# This test fails most of the time on Windows systems. The root cause is > +# that Windows does not allow us to rename the "tables.list.lock" file into > +# place when "tables.list" is open for reading by a concurrent process. > +test_expect_success !WINDOWS 'ref transaction: many concurrent writers' ' > test_when_finished "rm -rf repo" && > git init repo && > ( > cd repo && > - # Set a high timeout such that a busy CI machine will not abort > - # early. 10 seconds should hopefully be ample of time to make > - # this non-flaky. > - git config set reftable.lockTimeout 10000 && > + # Set a high timeout. While a couple of seconds should be > + # plenty, using the address sanitizer will significantly slow > + # us down here. So we are aiming way higher than you would ever > + # think is necessary just to keep us from flaking. We could > + # also lock indefinitely by passing -1, but that could > + # potentially block CI jobs indefinitely if there was a bug > + # here. > + git config set reftable.lockTimeout 300000 && > test_commit --no-tag initial && > > head=$(git rev-parse HEAD) && I did apply this patch and (unsurprising) it now passes just fine! :) ATB, Ramsay Jones