Re: [RFC] fetch: update refs in a single transaction

Patrick Steinhardt <ps@xxxxxx> · Thu, 9 Dec 2021 08:11:34 +0100

On Wed, Dec 08, 2021 at 04:13:48PM -0500, Jeff King wrote:
> On Wed, Dec 08, 2021 at 09:15:01AM +0100, Christian Couder wrote:
> 
> > > Note that this slightly changes semantics of git-fetch(1): if we hit any
> > > unexpected errors like the reference update racing with another process,
> > > then we'll now fail to update any references, too.
> > 
> > So that's one difference between the "partial-update" and the
> > "non-atomic" modes, but what you say doesn't really make me confident
> > that it's the only one.
> > 
> > Also are there people who are in cases where a lot of reference
> > updates are racing, and where it's important that in such cases at
> > least some ref updates succeed?
> > 
> > If yes, then maybe the 3 modes could be kept and options for
> > "partial-update" and "non-atomic" could be added. "partial-update"
> > could be the default in case the reftable backend is used, while
> > "non-atomic" would still be the default when the files backend is
> > used.
> 
> I think these three modes are hard to explain to users, because the
> failures which trigger for partial-update versus atomic depend on how
> things happen to be implemented. Just coming from a user's perspective,
> we might expect a breakdown like:
> 
>   - problems like non-fast-forward are logical policy issues, and we'd
>     reject those without failing the whole transaction (in
>     partial-update mode)
> 
>   - a system error like write() failing should be rare, and abandons the
>     whole transaction (in either mode)
> 
> But there are some confusing cases:
> 
>   - if somebody else takes the lock and updates the ref at the same
>     time, then that is handled in the ref transaction code. And so even
>     though it's closer to a logical policy issue, the patch here would
>     fail the whole transaction
> 
>   - likewise name conflicts (writing "refs/foo" when "refs/foo/bar"
>     exists or vice versa) are more of a logical issue, but are also
>     handled by the transaction code.
> 
> So I think we really have to teach the ref transaction code the notion
> of non-atomic transactions, or we'll end up leaking out all of those
> implementation details in user-facing behavior. I.e., the ref code needs
> to learn not to immediately abort if it fails one lockfile, but to
> optionally keep going (if the caller specified a non-atomic flag, of
> course).

That would probably be the most flexible approach for the future, too.
There's going to be several locations which could benefit from such a
change, where we do want to get the performance benefits of using a
single transaction while continuing to exhibit the same behaviour in
edge cases where only a subset of refs could be updated.

For what it's worth, I think that such a new mode should likely only
kick in when "preparing" a transaction: this is the last step before
"committing" it and would thus be the only place where the caller has a
chance to introspect what really has been queued up and what had to be
dropped from the transaction due to races or whatnot. As long as callers
get a list of all dropped refs, including why they have been dropped,
they can also still communicate this information to the user.

> For the files-backend code, I think system errors would naturally fall
> out in the same code. Failing to write() or rename() is not much
> different than failing to get the lock in the first place. So
> "partial-update" and "non-atomic" behavior would end up the same anyway,
> and we do not need to expose the third mode to the user.

I think I disagree here. Failing to write() to me is quite different
from failing to take a lock: the first one is an unexpected system-level
error and brings us into a situation where we ain't got no clue why it
happened. The second one is a logical error that is entirely expected
given that lockfiles are explicitly designed for this failure mode, so
we know why they happen. With this in mind, I'd argue that we should
only continue with the transaction in the latter case, and abort on
unexpected system-level errors.

> For the reftable backend, syscalls are inherently covering all the refs
> anyway (we either commit the whole reftable update or not). So likewise
> there would be no different between partial-update and non-atomic modes
> (but they would both be different from the files backend).
> 
> I suspect the surgery needed for the ref-transaction code to allow
> non-atomic updates would be pretty big, though. It involves checking
> every error case to make sure it is safe to continue rather than
> aborting (and munging data structures to mark particular refs as
> "failed, don't do anything further for this one").

I hope that it's not going to be that bad if we restrict it to the
"prepare" phase, but that may just be wishful thinking.

> So I dunno. All of my analysis assumes the breakdown of user
> expectations I gave above is a reasonable one. There may be others. But
> it seems like the behavior created by just this patch would be very hard
> to explain, and subject to change based on implementation details.
> 
> -Peff

I think your analysis makes sense.

While I think that the three modes aren't as bad for a command like
git-fetch(1), the "single-tx-partial-update" mode would really only be a
special case of our current default, except that we use a single
transaction, only. Users shouldn't really need to care about this, and
we should do the right thing by default. The discussion mostly came to
live because our current implementation of reference transactions is
lacking the ability to handle this partial-update mode well (and it
didn't really have to until now), but pushing this technical limitation
onto the user is probably nothing we should do.

In the end I agree that we ought to extend the reftx mechanism to
support this usecase. While it's more work up-front, I expect that other
commands could benefit in a similar way without having to add
`--partial-atomic-reference-updates` switches to all of them.

Patrick
Attachment:
signature.asc

Description: PGP signature