On Fri, Sep 27, 2024 at 12:07:52AM -0400, Jeff King wrote:
> On Tue, Sep 24, 2024 at 07:33:08AM +0200, Patrick Steinhardt wrote:
>
> > +test_expect_success 'ref transaction: many concurrent writers' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		# Set a high timeout such that a busy CI machine will not abort
> > +		# early. 10 seconds should hopefully be ample of time to make
> > +		# this non-flaky.
> > +		git config set reftable.lockTimeout 10000 &&
>
> I saw this test racily fail in the Windows CI build. The failure is as
> you might imagine, a few of the background update-ref invocations
> failed:
>
>   fatal: update_ref failed for ref 'refs/heads/branch-21': reftable: transaction failure: I/O error
>
> but of course we don't notice because they're backgrounded. And then the
> expected output is missing the branch-21 entry (and in my case,
> branch-64 suffered a similar fate).
>
> At first I thought we probably needed to bump the timeout (and EIO was
> just our way of passing that up the stack). But looking at the
> timestamps in the Actions log, the whole loop took less than 10ms to
> run.
>
> So could this be indicative of a real contention issue specific to
> Windows? I'm wondering if something like the old "you can't delete a
> file somebody else has open" restriction is biting us somehow.

Thanks for letting me know! I very much expect that this is the scenario
that you mention, where we try to delete a file that is still held open
by another process. We're trying to be mindful about this restriction in
the reftable library by not failing when a call to unlink(3P) fails for
any of the tables, and we do have logic in place to unlink them at a
later point in time when not referenced by the "tables.list" file. So
none of the calls to unlink are error-checked at all.

But there's one file that we _have_ to rewrite, and that is of course
the "tables.list" file itself. We never unlink that file, though; we
only rename the new instance into place. I think I recently discovered
that we have some problems here, because Windows seemed to allow this in
some scenarios but not in others.

In any case, I've already set up a Windows VM last week, so I'll
investigate the issue later this week.

Patrick
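
As an aside on the point in the quoted report that failures of the
backgrounded writers go unnoticed: one way to surface them is to record
each job's PID and wait on it individually. The following is only a
sketch in plain POSIX shell, not what the test in the patch does; the
helper name run_concurrently and the way it reports failures are made up
for illustration.

```shell
# Hypothetical helper (not part of Git's test suite): run every argument
# as a background command, then wait on each PID separately so that a
# failing job is not silently lost.
#
# Prints the 1-based indices of the failed jobs, space-separated, and
# returns non-zero if any job failed.
run_concurrently () {
	pids=
	for cmd in "$@"
	do
		sh -c "$cmd" &
		# "$!" is the PID of the most recent background job.
		pids="$pids $!"
	done

	failed=
	i=0
	for pid in $pids
	do
		i=$((i + 1))
		# "wait <pid>" returns the exit status of that particular job.
		wait "$pid" || failed="$failed $i"
	done

	echo "${failed# }"
	test -z "$failed"
}
```

With something like this, a loop of background update-ref calls would
make the test fail at the point where a writer dies, rather than later
when the expected output does not match.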