Re: [PATCH 4/4] Doc: push with --base

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 03 Nov 2020 09:35:39 -0800

Jeff King <peff@xxxxxxxx> writes:

> On Mon, Nov 02, 2020 at 09:35:54PM -0800, Jonathan Nieder wrote:
>
>> I think you're saying that we don't need a "push" v2 because v0
>> already has what a user would want.
>> 
>> Git protocol v2 for fetch brought two major changes:
>> 
>> - it changed the response for the initial request, allowing
>>   abbreviating the ref advertisement at last
>> 
>> - it defined a structure for requests and responses, simplifying the
>>   addition of later protocol improvements.  In particular, because the
>>   initial response is a capability advertisement, it allows changing
>>   the ref advertisement format more in the future.
>> 
>> Both of those changes would be valuable for push.  The ref
>> advertisements are large, and matching the structure of commands used
>> by fetchv2 would make debugging easier.
>> 
>> There are some specific applications I'm interested in after that
>> (e.g., pushing symrefs), but the fundamental extensibility improvement
>> is larger than any particular application I could think of.
>
> You pretty much summed up what I was going to respond. :)
>
> But I'd go further here...
>
>> That said, I'm not against experimenting with extra parameters before
>> we go there, as a way of getting more information about what a
>> workable negotiation for push looks like.
>
> I'd prefer to avoid doing this as an extra parameter for a few reasons:
>
>   - once it's in a released version, it's much harder for us to take it
>     away
>
>   - the extra parameters area is a hack that helped us bootstrap v2. We
>     could probably use the same hack to bootstrap v3, etc. But it has
>     limitations for stuffing in arbitrary data. An obvious one is size.
>     We can transmit a single base, but would be limited if we wanted to
>     be able to send multiple. We already ran into this once with the
>     "symref=foo:bar" capability overflowing pkt-line limits. Here I'm
>     not even sure what the limits might be (it's subject to things like
>     how big an HTTP header a proxy will pass, or how large an
>     environment variable an ssh implementation supports)
>
>   - it potentially pushes more data/work outside of the git protocol
>     itself. E.g., web servers have to translate Git-Protocol headers
>     into the GIT_PROTOCOL environment for v2. I guess this new field
>     works in our tests because we copy the header's value entirely in
>     our apache.conf. But I wonder how many systems in the wild may only
>     work if it contains "version=2".

I do not have much to add to what has been said so far, other than
offering historical perspective.

The single biggest reason why "fetch" has common ancestor discovery
negotiation and "push" does not is that the design comes from the
use case the inventor of Git and those who worked on the early
protocol wanted to support: you are pushing into your own repository
that you alone push into, your work is disseminated to others who
fetch from your repository, and you get others' work by fetching
from theirs.

In such a world, without a central server that everybody pushes
into, by definition, a pusher knows all the objects that have ever
been pushed into the receiving repository when running "git push".
They are all objects that passed through the repository you are
pushing from to the receiving repository in your previous pushes.
The advertised ref(s) are expected to be known to the repository you
are pushing from anyway, and if that is not the case, you would first
fetch from there before force-pushing.

Hence, when you push, there isn't much need to walk back from the
tip of refs at the remote to discover a common ancestor like we do
for the fetch side in the pre-central-server world.

On the other hand, you expect that remote refs point at objects
unknown to you when you fetch from your colleagues, so it is
expected that you have to perform the common ancestor discovery
negotiation.

After 15 years, we live in a different world.

People expect that a single repository at their hosting sites can be
used as the central meeting point for the project, just like CVS/SVN
servers were in the older world.  "git push" would need to accept that
reality and start common ancestor discovery eventually.
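
To make "common ancestor discovery" concrete, here is a toy sketch of
the idea, not Git's actual wire protocol or code.  The commit graph is
a plain dict mapping a commit name to its parents, and all names
(reachable, find_common, commits_to_send, receiver_has) are
hypothetical.  The membership test on receiver_has stands in for a
"have"/"ACK" round trip with the other side.

```python
from collections import deque


def reachable(graph, tips):
    """Return every commit reachable from tips by following parent links."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph.get(commit, ()))
    return seen


def find_common(sender_graph, sender_tip, receiver_has):
    """Walk back from sender_tip, stopping at commits the receiver has.

    Assumption: "commit in receiver_has" models asking the receiver
    whether it has the commit; a real negotiation would batch these.
    """
    common = set()
    seen = set()
    queue = deque([sender_tip])
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        if commit in receiver_has:
            common.add(commit)  # no need to walk past a known commit
            continue
        queue.extend(sender_graph.get(commit, ()))
    return common


def commits_to_send(sender_graph, sender_tip, receiver_has):
    """Commits reachable from the tip but not from any common commit."""
    common = find_common(sender_graph, sender_tip, receiver_has)
    return reachable(sender_graph, [sender_tip]) - reachable(sender_graph, common)


# The pusher has A <- B <- C <- D.  The central repository has A, B and
# a commit X that somebody else pushed, so its advertised tip (X) is
# unknown to the pusher.
sender_graph = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
receiver_has = {"A", "B", "X"}

print(sorted(commits_to_send(sender_graph, "D", receiver_has)))  # ['C', 'D']
```

Without the walk, a pusher that does not recognize the advertised tip
X has no base to exclude from, which is the situation a shared central
repository creates.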