Re: [PATCH v2] t0610: work around flaky test with concurrent writers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Patrick,

On 04/10/2024 16:32, Patrick Steinhardt wrote:
> In 6241ce2170 (refs/reftable: reload locked stack when preparing
> transaction, 2024-09-24) we have introduced a new test that exercises
> how the reftable backend behaves with many concurrent writers all racing
> with each other. This test was introduced after a couple of fixes in
> this context that should make concurrent writes behave gracefully. As it
> turns out though, Windows systems do not yet handle concurrent writes
> properly, as we've got two reports for Cygwin and MinGW failing in this
> newly added test.
> 
> The root cause of this is how we update the "tables.list" file: when
> writing a new stack of tables we first write the data into a lockfile
> and then rename that file into place. But Windows forbids us from doing
> that rename when the target path is open for reading by another process.
> And as the test races both readers and writers with each other we are
> quite likely to hit this edge case.
> 
> This is not a regression: the logic didn't work before the mentioned
> commit, and after the commit it performs well on Linux and macOS, and
> the situation on Windows should have at least improved a bit. But the
> test shows that we need to put more thought into how to make this work
> properly there.
> 
> Work around the issue by disabling the test on Windows for now. While at
> it, increase the locking timeout to address reported timeouts when using
> either the address or memory sanitizer, which also tend to significantly
> extend the runtime of this test.
> 
> This should be revisited after Git v2.47 is out.
> 
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  t/t0610-reftable-basics.sh | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 2d951c8ceb..babec7993e 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -450,15 +450,22 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
>  	)
>  '
>  
> -test_expect_success 'ref transaction: many concurrent writers' '
> +# This test fails most of the time on Windows systems. The root cause is
> +# that Windows does not allow us to rename the "tables.list.lock" file into
> +# place when "tables.list" is open for reading by a concurrent process.
> +test_expect_success !WINDOWS 'ref transaction: many concurrent writers' '
>  	test_when_finished "rm -rf repo" &&
>  	git init repo &&
>  	(
>  		cd repo &&
> -		# Set a high timeout such that a busy CI machine will not abort
> -		# early. 10 seconds should hopefully be ample of time to make
> -		# this non-flaky.
> -		git config set reftable.lockTimeout 10000 &&
> +		# Set a high timeout. While a couple of seconds should be
> +		# plenty, using the address sanitizer will significantly slow
> +		# us down here. So we are aiming way higher than you would ever
> +		# think is necessary just to keep us from flaking. We could
> +		# also lock indefinitely by passing -1, but that could
> +		# potentially block CI jobs indefinitely if there was a bug
> +		# here.
> +		git config set reftable.lockTimeout 300000 &&
>  		test_commit --no-tag initial &&
>  
>  		head=$(git rev-parse HEAD) &&

I did apply this patch and (unsurprising) it now passes just fine! :)

ATB,
Ramsay Jones





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux