In 6241ce2170 (refs/reftable: reload locked stack when preparing transaction, 2024-09-24) we have introduced a new test that exercises how the reftable backend behaves with many concurrent writers all racing with each other. This test was introduced after a couple of fixes in this context that should make concurrent writes behave gracefully. As it turns out though, Windows systems do not yet handle concurrent writes properly, as we've got two reports for Cygwin and MinGW failing in this newly added test. The root cause of this is how we update the "tables.list" file: when writing a new stack of tables we first write the data into a lockfile and then rename that file into place. But Windows forbids us from doing that rename when the target path is open for reading by another process. And as the test races both readers and writers with each other we are quite likely to hit this edge case. This is not a regression: the logic didn't work before the mentioned commit, and after the commit it performs well on Linux and macOS, and the situation on Windows should have at least improved a bit. But the test shows that we need to put more thought into how to make this work properly there. Work around the issue by disabling the test on Windows for now. While at it, increase the locking timeout to address reported timeouts when using either the address or memory sanitizer, which also tend to significantly extend the runtime of this test. This should be revisited after Git v2.47 is out. Signed-off-by: Patrick Steinhardt <ps@xxxxxx> --- t/t0610-reftable-basics.sh | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh index 2d951c8ceb..babec7993e 100755 --- a/t/t0610-reftable-basics.sh +++ b/t/t0610-reftable-basics.sh @@ -450,15 +450,22 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' ' ) ' -test_expect_success 'ref transaction: many concurrent writers' ' +# This test fails most of the time on Windows systems. The root cause is +# that Windows does not allow us to rename the "tables.list.lock" file into +# place when "tables.list" is open for reading by a concurrent process. +test_expect_success !WINDOWS 'ref transaction: many concurrent writers' ' test_when_finished "rm -rf repo" && git init repo && ( cd repo && - # Set a high timeout such that a busy CI machine will not abort - # early. 10 seconds should hopefully be ample of time to make - # this non-flaky. - git config set reftable.lockTimeout 10000 && + # Set a high timeout. While a couple of seconds should be + # plenty, using the address sanitizer will significantly slow + # us down here. So we are aiming way higher than you would ever + # think is necessary just to keep us from flaking. We could + # also lock indefinitely by passing -1, but that could + # potentially block CI jobs indefinitely if there was a bug + # here. + git config set reftable.lockTimeout 300000 && test_commit --no-tag initial && head=$(git rev-parse HEAD) && Interdiff against v1: diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh index 86a746aff0..babec7993e 100755 --- a/t/t0610-reftable-basics.sh +++ b/t/t0610-reftable-basics.sh @@ -453,23 +453,18 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' ' # This test fails most of the time on Windows systems. The root cause is # that Windows does not allow us to rename the "tables.list.lock" file into # place when "tables.list" is open for reading by a concurrent process. -# -# The same issue does not happen on Cygwin because its implementation of -# rename(3P) is emulating POSIX-style renames, including renames over files -# that are open. -test_expect_success !MINGW 'ref transaction: many concurrent writers' ' +test_expect_success !WINDOWS 'ref transaction: many concurrent writers' ' test_when_finished "rm -rf repo" && git init repo && ( cd repo && # Set a high timeout. While a couple of seconds should be # plenty, using the address sanitizer will significantly slow - # us down here. Furthermore, Cygwin is also way slower due to - # the POSIX-style rename emulation. So we are aiming way higher - # than you would ever think is necessary just to keep us from - # flaking. We could also lock indefinitely by passing -1, but - # that could potentially block CI jobs indefinitely if there - # was a bug here. + # us down here. So we are aiming way higher than you would ever + # think is necessary just to keep us from flaking. We could + # also lock indefinitely by passing -1, but that could + # potentially block CI jobs indefinitely if there was a bug + # here. git config set reftable.lockTimeout 300000 && test_commit --no-tag initial && base-commit: 111e864d69c84284441b083966c2065c2e9a4e78 -- 2.47.0.rc0.dirty