Re: [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction

On Mon, Sep 30, 2024 at 03:19:04PM -0700, Josh Steadmon wrote:
> On 2024.09.27 00:07, Jeff King wrote:
> > On Tue, Sep 24, 2024 at 07:33:08AM +0200, Patrick Steinhardt wrote:
> > 
> > > +test_expect_success 'ref transaction: many concurrent writers' '
> > > +	test_when_finished "rm -rf repo" &&
> > > +	git init repo &&
> > > +	(
> > > +		cd repo &&
> > > +		# Set a high timeout such that a busy CI machine will not abort
> > > +		# early. 10 seconds should hopefully be ample of time to make
> > > +		# this non-flaky.
> > > +		git config set reftable.lockTimeout 10000 &&
> > 
> > I saw this test racily fail in the Windows CI build. The failure is as
> > you might imagine, a few of the background update-ref invocations
> > failed:
> > 
> >   fatal: update_ref failed for ref 'refs/heads/branch-21': reftable: transaction failure: I/O error
> > 
> > but of course we don't notice because they're backgrounded. And then the
> > expected output is missing the branch-21 entry (and in my case,
> > branch-64 suffered a similar fate).
> > 
> > At first I thought we probably needed to bump the timeout (and EIO was
> > just our way of passing that up the stack). But looking at the
> > timestamps in the Actions log, the whole loop took less than 10ms to
> > run.
> > 
> > So could this be indicative of a real contention issue specific to
> > Windows? I'm wondering if something like the old "you can't delete a
> > file somebody else has open" restriction is biting us somehow.
> > 
> > -Peff
> 
> We're seeing repeated failures from this test case with ASan enabled.
> Unfortunately, we've only been able to reproduce this on our
> $DAYJOB-specific build system. I haven't been able to get it to fail
> using just the upstream Makefile so far. I'll keep trying to find a way
> to reproduce this.
> 
> FWIW, we're not getting I/O errors, we see the following:
> fatal: update_ref failed for ref 'refs/heads/branch-20': cannot lock references
> 
> We tried increasing the timeout in the test to 2 minutes (up from 10s),
> but it didn't fix the failures.

If this is causing problems for folks, I'd say we can apply the change
below for now. It is of course only a stop-gap until I find the time to
debug this, which should be later this week or early next week.
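
As an aside on Peff's observation that the failed background writers go
unnoticed: here is a minimal sketch, not part of this patch and not what
the test currently does, of how a shell loop could record each
background PID and wait on them one by one, so that a single failing
writer fails the caller. The helper name run_all_checked and the command
strings passed to it are hypothetical.

```shell
# Hypothetical helper: start every command string in the background,
# then wait on each PID separately so that no failure is silently lost.
run_all_checked () {
	pids=""
	for cmd in "$@"
	do
		sh -c "$cmd" &
		# $! is the PID of the most recently started background job.
		pids="$pids $!"
	done

	failed=0
	for pid in $pids
	do
		# "wait <pid>" returns the exit status of that specific job,
		# whereas a bare "wait" with no operands returns zero.
		wait "$pid" || failed=1
	done
	return "$failed"
}
```

A caller would pass one command string per writer and chain the helper
with && like any other test step; it returns non-zero if any of the
background commands failed.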

Patrick

diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 2d951c8ceb..ad7bb39b79 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -450,7 +450,7 @@ test_expect_success 'ref transaction: retry acquiring tables.list lock' '
 	)
 '
 
-test_expect_success 'ref transaction: many concurrent writers' '
+test_expect_success !WINDOWS 'ref transaction: many concurrent writers' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	(



