Re: [PATCH] t0610: work around flaky test with concurrent writers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Patrick,

On 04/10/2024 13:16, Patrick Steinhardt wrote:
[snip]

> Now the two reports are somewhat different from one another:
> 
>   - On Cygwin we hit timeouts because we fail to lock the "tables.list"
>     file within 10 seconds. The renames themselves succeed even when the
>     target file is open because Cygwin provides extensive compatibility
>     logic to make them work even when the target file is open already.

Hmm, not so much for me! :(

> 
>   - On MinGW we hit I/O errors on rename. While we do have some retry
>     logic in place to make the rename work in some cases, this is
>     seemingly not sufficient when there is this much contention around
>     the files.

I am seeing I/O errors.

> 
> Neither of these cases is a regression: the logic didn't work before the
> mentioned commit, and after the commit it performs well on Linux, macOS
> and in Cygwin, and at least a bit better with MinGW. But the tests show
> that we need to put more thought into how to make this work properly on
> MinGW systems.

OK

> 
> The fact that Cygwin can work around this issue with better emulation of
> POSIX-style atomic renames shows that we can in theory make MinGW work
> better, as well. But doing so likely requires quite some fiddling with
> Windows internals, and Git v2.47 is about to be released in a couple
> days. This makes any potential fix quite risky as it would have to
> happen deep down in our rename(3P) implementation in "compat/mingw.c".
> 
> Let's instead work around both issues by disabling the test on MinGW
> and by significantly increasing the locking timeout for Cygwin. This
> bumped timeout also helps when running with e.g. the address and memory
> sanitizers, which also tend to significantly extend the runtime of this
> test.

This doesn't work for me.

> 
> This should be revisited after Git v2.47 is out.
> 
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---

[PATCH snipped]

As I said last night, the first thing I tried was increasing the timeout
ten-fold (ie, to 100000), but that didn't change the outcome.

I am also on windows 10 (on an *old* 4th gen core i5): 
 
  $ uname -a
  CYGWIN_NT-10.0-19045 satellite 3.5.4-1.x86_64 2024-08-25 16:52 UTC x86_64 Cygwin
  $ 

I don't know why we appear to be seeing different behaviour. Since I had
already edited this test to up the timeout, I increased it again to match
the value in this (redacted) patch (ie I didn't actually apply your patch):

  $ git diff
  diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
  index 2d951c8ceb..7b6ded1c6a 100755
  --- a/t/t0610-reftable-basics.sh
  +++ b/t/t0610-reftable-basics.sh
  @@ -458,7 +458,7 @@ test_expect_success 'ref transaction: many concurrent writers' '
                  # Set a high timeout such that a busy CI machine will not abort
                  # early. 10 seconds should hopefully be ample of time to make
                  # this non-flaky.
  -               git config set reftable.lockTimeout 10000 &&
  +               git config set reftable.lockTimeout 300000 &&
                  test_commit --no-tag initial &&
   
                  head=$(git rev-parse HEAD) &&
  $ 

However, apart from less debugging output, the result was essentially the
same:

  ...
  
  ++ wait
  ++ git update-ref refs/heads/branch-100 HEAD
  fatal: update_ref failed for ref 'refs/heads/branch-97': reftable: transaction failure: I/O error
  fatal: update_ref failed for ref 'refs/heads/branch-87': reftable: transaction prepare: I/O error
  ++ git for-each-ref --sort=v:refname
  ++ test_cmp expect actual
  ++ test 2 -ne 2
  ++ eval 'diff -u' '"$@"'
  +++ diff -u expect actual
  --- expect      2024-10-04 13:31:42.450969800 +0000
  +++ actual      2024-10-04 13:32:07.097150600 +0000
  @@ -84,7 +84,6 @@
   68d032e9edd3481ac96382786ececc37ec28709e commit        refs/heads/branch-84
   68d032e9edd3481ac96382786ececc37ec28709e commit        refs/heads/branch-85
   68d032e9edd3481ac96382786ececc37ec28709e commit        refs/heads/branch-86
  -68d032e9edd3481ac96382786ececc37ec28709e commit        refs/heads/branch-87
   68d032e9edd3481ac96382786ececc37ec28709e commit        refs/heads/branch-88
   68d032e9edd3481ac96382786ececc37ec28709e commit        refs/heads/branch-89
   68d032e9edd3481ac96382786ececc37ec28709e commit        refs/heads/branch-90
  error: last command exited with $?=1
  not ok 47 - ref transaction: many concurrent writers
 
  ...

In an earlier email you suggested a timeout of -1, thus:

  $ git diff
  diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
  index 2d951c8ceb..9ab37a8d69 100755
  --- a/t/t0610-reftable-basics.sh
  +++ b/t/t0610-reftable-basics.sh
  @@ -458,7 +458,7 @@ test_expect_success 'ref transaction: many concurrent writers' '
                  # Set a high timeout such that a busy CI machine will not abort
                  # early. 10 seconds should hopefully be ample of time to make
                  # this non-flaky.
  -               git config set reftable.lockTimeout 10000 &&
  +               git config set reftable.lockTimeout -1 &&
                  test_commit --no-tag initial &&
   
                  head=$(git rev-parse HEAD) &&
  $ 

The result looks familiar:
  
  ++ wait
  ++ git update-ref refs/heads/branch-100 HEAD
  fatal: update_ref failed for ref 'refs/heads/branch-46': reftable: transaction prepare: I/O error
  ++ git for-each-ref --sort=v:refname
  ++ test_cmp expect actual
  ++ test 2 -ne 2
  ++ eval 'diff -u' '"$@"'
  +++ diff -u expect actual
  --- expect	2024-10-04 14:35:20.159564800 +0000
  +++ actual	2024-10-04 14:35:43.304762500 +0000
  @@ -43,7 +43,6 @@
   68d032e9edd3481ac96382786ececc37ec28709e commit	refs/heads/branch-43
   68d032e9edd3481ac96382786ececc37ec28709e commit	refs/heads/branch-44
   68d032e9edd3481ac96382786ececc37ec28709e commit	refs/heads/branch-45
  -68d032e9edd3481ac96382786ececc37ec28709e commit	refs/heads/branch-46
   68d032e9edd3481ac96382786ececc37ec28709e commit	refs/heads/branch-47
   68d032e9edd3481ac96382786ececc37ec28709e commit	refs/heads/branch-48
   68d032e9edd3481ac96382786ececc37ec28709e commit	refs/heads/branch-49
  error: last command exited with $?=1
  not ok 47 - ref transaction: many concurrent writers

Can you think of anything else to try?
  
I would strongly suggest skipping this test on cygwin as well as MINGW.

ATB,
Ramsay Jones

 




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux