Re: [RFC] fetch: update refs in a single transaction

Jeff King <peff@xxxxxxxx> · Wed, 8 Dec 2021 16:13:48 -0500

On Wed, Dec 08, 2021 at 09:15:01AM +0100, Christian Couder wrote:

> > Note that this slightly changes semantics of git-fetch(1): if we hit any
> > unexpected errors like the reference update racing with another process,
> > then we'll now fail to update any references, too.
> 
> So that's one difference between the "partial-update" and the
> "non-atomic" modes, but what you say doesn't really make me confident
> that it's the only one.
> 
> Also are there people who are in cases where a lot of reference
> updates are racing, and where it's important that in such cases at
> least some ref updates succeed?
> 
> If yes, then maybe the 3 modes could be kept and options for
> "partial-update" and "non-atomic" could be added. "partial-update"
> could be the default in case the reftable backend is used, while
> "non-atomic" would still be the default when the files backend is
> used.

I think these three modes are hard to explain to users, because the
failures which trigger for partial-update versus atomic depend on how
things happen to be implemented. Just coming from a user's perspective,
we might expect a breakdown like:

  - problems like non-fast-forward are logical policy issues, and we'd
    reject those without failing the whole transaction (in
    partial-update mode)

  - a system error like write() failing should be rare, and abandons the
    whole transaction (in either mode)

But there are some confusing cases:

  - if somebody else takes the lock and updates the ref at the same
    time, then that is handled in the ref transaction code. And so even
    though it's closer to a logical policy issue, the patch here would
    fail the whole transaction

  - likewise name conflicts (writing "refs/foo" when "refs/foo/bar"
    exists or vice versa) are more of a logical issue, but are also
    handled by the transaction code.

So I think we really have to teach the ref transaction code the notion
of non-atomic transactions, or we'll end up leaking out all of those
implementation details in user-facing behavior. I.e., the ref code needs
to learn not to immediately abort if it fails one lockfile, but to
optionally keep going (if the caller specified a non-atomic flag, of
course).

For the files-backend code, I think system errors would naturally fall
out in the same code. Failing to write() or rename() is not much
different than failing to get the lock in the first place. So
"partial-update" and "non-atomic" behavior would end up the same anyway,
and we do not need to expose the third mode to the user.

For the reftable backend, syscalls are inherently covering all the refs
anyway (we either commit the whole reftable update or not). So likewise
there would be no different between partial-update and non-atomic modes
(but they would both be different from the files backend).

I suspect the surgery needed for the ref-transaction code to allow
non-atomic updates would be pretty big, though. It involves checking
every error case to make sure it is safe to continue rather than
aborting (and munging data structures to mark particular refs as
"failed, don't do anything further for this one").

So I dunno. All of my analysis assumes the breakdown of user
expectations I gave above is a reasonable one. There may be others. But
it seems like the behavior created by just this patch would be very hard
to explain, and subject to change based on implementation details.

-Peff