Re: [PATCH 0/2] Optionally support push options on up-to-date branches

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> · Fri, 15 Mar 2024 00:39:34 +0000

On 2024-03-14 at 23:29:40, Junio C Hamano wrote:
> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
> 
> >> A radical counter-proposal for the design is to update the client
> >> side to do this unconditionally, without needing any new option.
> >
> > I'm not sure that would be a great idea.
> 
> Thanks.  I was looking for a push-back ;-)
> 
> > Since it's a push, that will
> > trigger authentication, which may prompt the user (e.g., for a password
> > or token or for a YubiKey touch with FIDO2 SSH) and which they might be
> > able to easily avoid.
> 
> Do you mean we first go unauthenticated to find out what commits are
> at the tip of refs at the destination repository, and then switch to
> authenticated push after we find out there is something that is
> worth pushing?  I somehow thought we need an authenticated access
> only for the initial ls-refs exchange.

It depends on the server and the protocol.  Protocol v0 allows
unauthenticated ref exchange, and then the push itself can be
authenticated.  It is customary to require authentication on both
pieces, but not mandatory.

> > As a server operator, I also expect that there
> > are people doing lots of needless attempts at pushing in automated
> > systems (because with enough users, there will be at least some who do
> > bizarre or inefficient things), and I would prefer to avoid serving
> > those requests if I don't need to.  (For example, for us, reference
> > updates need to go through a distributed commit protocol to update
> > multiple replicas of the repository, and if there's no ref updates, then
> > we cut out multiple services which we don't need to contact.)
> 
> Yes, but if you have an extra no-op ref update in the bunch, that
> one is excluded from the set of changes to be synchronised across
> replicas, no?

As I said, we can't do that, because we have to verify that the old
value is correct.  Even if we did perform that optimization, once we
know we have a write, we have to contact every replica (in case the user
does indeed send us a pack, which is allowed by the protocol, even if
bizarre), whereas with a ref request or a read, we can satisfy that with
a single read replica.  I'm sure I can satisfy at least one to two
orders of magnitude more no-op push attempts with the current
optimization than I could if we didn't have it.  It might even be to the
point of three or more orders of magnitude.  (My guess is that reftable
will improve this even further, but I haven't yet measured.)

I'm sure this is also true for most other implementations; serving
reads, like ls-refs, are almost always substantially cheaper than writes
and most server implementations are highly optimized for serving large
numbers of reads due to CI farms and such.

> > Note also that no-op ref updates cannot be simply omitted on the server
> > side because we need to verify that the old value for the ref is
> > correct, or we need to reject the operation as out of date.
> 
> Yes, but isn't that something the user would rather want to find out
> earlier rather than later?  Your push without no-op update may say
> "ah, we are up to date" when we are behind or worse diverged.  If we
> do the no-op update more often, we'd have more chance to catch such
> a situation where the worldview the end-user's repository has is out
> of sync with reality.

Well, we were up to date at the time the ls-refs operation completed, or
the push wouldn't have the same old and new OIDs.  The client has to
declare the old ref value that it got from the ls-refs output in order
for the operation to succeed, since the server-side Git won't update the
references if the old value doesn't match.  (This is how the client
knows whether a push is a fast-forward or not: it computes it, and if
the remote side has an unknown value, then it's not.)  Other than
updating the push options, there's no functional benefit to making a
second request, since the ls-refs output already told us where the
branch is.  That's why the current optimization is functionally correct.

In any event, there are alternate implementations of the Git protocol
which did not implement our current optimization and did, for purposes
of mirroring, make excessive numbers of no-op requests, and we did have
to ask for that to be fixed.  I assure you, the no-op behaviour is
better for users because it's faster and substantially more likely to
succeed (because it is effectively a read-only ls-refs request), whereas
a new ref update is less efficient, slower, and substantially more
likely to fail (due to functional limitations on write throughput).

I also expect there will be poorly written `pre-push` and `pre-receive`
hooks which will not like getting a no-op update and not handle it
gracefully.  I'm pretty sure Git LFS will handle this just fine, but
lots of people have various shell scripts that probably will not.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA
Attachment:
signature.asc

Description: PGP signature