[PATCH v2] t0610: work around flaky test with concurrent writers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



In 6241ce2170 (refs/reftable: reload locked stack when preparing
transaction, 2024-09-24) we have introduced a new test that exercises
how the reftable backend behaves with many concurrent writers all racing
with each other. This test was introduced after a couple of fixes in
this context that should make concurrent writes behave gracefully. As it
turns out though, Windows systems do not yet handle concurrent writes
properly, as we've got two reports for Cygwin and MinGW failing in this
newly added test.

The root cause of this is how we update the "tables.list" file: when
writing a new stack of tables we first write the data into a lockfile
and then rename that file into place. But Windows forbids us from doing
that rename when the target path is open for reading by another process.
And as the test races both readers and writers with each other we are
quite likely to hit this edge case.

This is not a regression: the logic didn't work before the mentioned
commit, and after the commit it performs well on Linux and macOS, and
the situation on Windows should have at least improved a bit. But the
test shows that we need to put more thought into how to make this work
properly there.

Work around the issue by disabling the test on Windows for now. While at
it, increase the locking timeout to address reported timeouts when using
either the address or memory sanitizer, which also tend to significantly
extend the runtime of this test.

This should be revisited after Git v2.47 is out.

Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
---
 t/t0610-reftable-basics.sh | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 2d951c8ceb..babec7993e 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -450,15 +450,22 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
 	)
 '
 
-test_expect_success 'ref transaction: many concurrent writers' '
+# This test fails most of the time on Windows systems. The root cause is
+# that Windows does not allow us to rename the "tables.list.lock" file into
+# place when "tables.list" is open for reading by a concurrent process.
+test_expect_success !WINDOWS 'ref transaction: many concurrent writers' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(
 		cd repo &&
-		# Set a high timeout such that a busy CI machine will not abort
-		# early. 10 seconds should hopefully be ample of time to make
-		# this non-flaky.
-		git config set reftable.lockTimeout 10000 &&
+		# Set a high timeout. While a couple of seconds should be
+		# plenty, using the address sanitizer will significantly slow
+		# us down here. So we are aiming way higher than you would ever
+		# think is necessary just to keep us from flaking. We could
+		# also lock indefinitely by passing -1, but that could
+		# potentially block CI jobs indefinitely if there was a bug
+		# here.
+		git config set reftable.lockTimeout 300000 &&
 		test_commit --no-tag initial &&
 
 		head=$(git rev-parse HEAD) &&

Interdiff against v1:
  diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
  index 86a746aff0..babec7993e 100755
  --- a/t/t0610-reftable-basics.sh
  +++ b/t/t0610-reftable-basics.sh
  @@ -453,23 +453,18 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
   # This test fails most of the time on Windows systems. The root cause is
   # that Windows does not allow us to rename the "tables.list.lock" file into
   # place when "tables.list" is open for reading by a concurrent process.
  -#
  -# The same issue does not happen on Cygwin because its implementation of
  -# rename(3P) is emulating POSIX-style renames, including renames over files
  -# that are open.
  -test_expect_success !MINGW 'ref transaction: many concurrent writers' '
  +test_expect_success !WINDOWS 'ref transaction: many concurrent writers' '
   	test_when_finished "rm -rf repo" &&
   	git init repo &&
   	(
   		cd repo &&
   		# Set a high timeout. While a couple of seconds should be
   		# plenty, using the address sanitizer will significantly slow
  -		# us down here. Furthermore, Cygwin is also way slower due to
  -		# the POSIX-style rename emulation. So we are aiming way higher
  -		# than you would ever think is necessary just to keep us from
  -		# flaking. We could also lock indefinitely by passing -1, but
  -		# that could potentially block CI jobs indefinitely if there
  -		# was a bug here.
  +		# us down here. So we are aiming way higher than you would ever
  +		# think is necessary just to keep us from flaking. We could
  +		# also lock indefinitely by passing -1, but that could
  +		# potentially block CI jobs indefinitely if there was a bug
  +		# here.
   		git config set reftable.lockTimeout 300000 &&
   		test_commit --no-tag initial &&
   

base-commit: 111e864d69c84284441b083966c2065c2e9a4e78
-- 
2.47.0.rc0.dirty





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux