Re: [PATCH v4 3/3] refs/reftable: reload locked stack when preparing transaction

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 27, 2024 at 12:07:52AM -0400, Jeff King wrote:
> On Tue, Sep 24, 2024 at 07:33:08AM +0200, Patrick Steinhardt wrote:
> 
> > +test_expect_success 'ref transaction: many concurrent writers' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		# Set a high timeout such that a busy CI machine will not abort
> > +		# early. 10 seconds should hopefully be ample of time to make
> > +		# this non-flaky.
> > +		git config set reftable.lockTimeout 10000 &&
> 
> I saw this test racily fail in the Windows CI build. The failure is as
> you might imagine, a few of the background update-ref invocations
> failed:
> 
>   fatal: update_ref failed for ref 'refs/heads/branch-21': reftable: transaction failure: I/O error
> 
> but of course we don't notice because they're backgrounded. And then the
> expected output is missing the branch-21 entry (and in my case,
> branch-64 suffered a similar fate).
> 
> At first I thought we probably needed to bump the timeout (and EIO was
> just our way of passing that up the stack). But looking at the
> timestamps in the Actions log, the whole loop took less than 10ms to
> run.
> 
> So could this be indicative of a real contention issue specific to
> Windows? I'm wondering if something like the old "you can't delete a
> file somebody else has open" restriction is biting us somehow.

Thanks for letting me know!

I very much expect that this is the scenario that you mention, where we
try to delete a file that is still held open by another process. We're
trying to be mindful about this restriction is the reftable library by
not failing when a call to unlink(3P) fails for any of the tables, and
we do have logic in place to unlink them at a later point in time when
not referenced by the "tables.list" file. So none of the calls to unlink
are error-checked at all.

But there's one file that we _have_ to rewrite, and that is of course
the "tables.list" file itself. We never unlink the file though but only
rename the new instance into place. I think I recently discovered that
we have some problems here, because Windows seemed to allow this in some
scenarios but not in others.

In any case, I've already set up a Windows VM last week, so I'll
investigate the issue later this week.

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux