On Mon, Sep 30, 2024 at 03:19:04PM -0700, Josh Steadmon wrote:
> On 2024.09.27 00:07, Jeff King wrote:
> > On Tue, Sep 24, 2024 at 07:33:08AM +0200, Patrick Steinhardt wrote:
> >
> > > +test_expect_success 'ref transaction: many concurrent writers' '
> > > +	test_when_finished "rm -rf repo" &&
> > > +	git init repo &&
> > > +	(
> > > +		cd repo &&
> > > +		# Set a high timeout such that a busy CI machine will not abort
> > > +		# early. 10 seconds should hopefully be ample of time to make
> > > +		# this non-flaky.
> > > +		git config set reftable.lockTimeout 10000 &&
> >
> > I saw this test racily fail in the Windows CI build. The failure is as
> > you might imagine, a few of the background update-ref invocations
> > failed:
> >
> >   fatal: update_ref failed for ref 'refs/heads/branch-21': reftable: transaction failure: I/O error
> >
> > but of course we don't notice because they're backgrounded. And then the
> > expected output is missing the branch-21 entry (and in my case,
> > branch-64 suffered a similar fate).
> >
> > At first I thought we probably needed to bump the timeout (and EIO was
> > just our way of passing that up the stack). But looking at the
> > timestamps in the Actions log, the whole loop took less than 10ms to
> > run.
> >
> > So could this be indicative of a real contention issue specific to
> > Windows? I'm wondering if something like the old "you can't delete a
> > file somebody else has open" restriction is biting us somehow.
> >
> > -Peff
>
> We're seeing repeated failures from this test case with ASan enabled.
> Unfortunately, we've only been able to reproduce this on our
> $DAYJOB-specific build system. I haven't been able to get it to fail
> using just the upstream Makefile so far. I'll keep trying to find a way
> to reproduce this.
>
> FWIW, we're not getting I/O errors, we see the following:
> fatal: update_ref failed for ref 'refs/heads/branch-20': cannot lock references
>
> We tried increasing the timeout in the test to 2 minutes (up from 10s),
> but it didn't fix the failures.

If this is causing problems for folks I'd say we can do the below change
for now. It's of course only a stop-gap solution until I find the time
to debug this, which should be later this week or early next week.

Patrick

diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 2d951c8ceb..ad7bb39b79 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -450,7 +450,7 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
 	)
 '
 
-test_expect_success 'ref transaction: many concurrent writers' '
+test_expect_success !WINDOWS 'ref transaction: many concurrent writers' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(