Re: [RFC] protocol version 2

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Fri, 10 Nov 2017 12:13:47 -0800

On Fri, 20 Oct 2017 10:18:39 -0700
Brandon Williams <bmwill@xxxxxxxxxx> wrote:

> Some of the pain points with the current protocol spec are:

After some in-office discussion, I think that the most important pain
point is that we have to implement each protocol twice: once for
HTTP(S), and once for the transports that support bidirectional byte
streams (SSH and friends).

If it weren't for this, I think that what is discussed in this document
(e.g. ls-refs, fetch-object) could be accomplished less invasively within
v1 by specifying "extra parameters" (explained in this e-mail [1]). These
would merely tweak the output of upload-pack instead of replacing it
nearly completely, and so would act as optimizations rather than as a
change to the mode of operation.

[1] https://public-inbox.org/git/20171010193956.168385-1-jonathantanmy@xxxxxxxxxx/

>   * The server's initial response is the ref advertisement.  This
>     advertisement cannot be omitted and can become an issue due to the
>     sheer number of refs that can be sent with large repositories.  For
>     example, when contacting the internal equivalent of
>     `https://android.googlesource.com/`, the server will send
>     approximately 1 million refs totaling 71MB.  This is data that is
>     sent during each and every fetch and is not scalable.

For me, this one is not compelling, because we can provide a ref
whitelist as an "extra parameter" in v1.
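
As a rough illustration of the whitelist idea (not an actual wire format):
the sketch below assumes the client sends a list of glob patterns and the
server advertises only matching refs. The name "ref_whitelist", the glob
syntax, and the sample refs are all made up for the example.

```python
from fnmatch import fnmatchcase


def advertise_refs(all_refs, ref_whitelist=None):
    """Return the sorted (name, oid) pairs a server would advertise.

    `ref_whitelist` is a hypothetical "extra parameter": a list of glob
    patterns. When it is absent, every ref is advertised, as in v1 today.
    """
    if not ref_whitelist:
        return sorted(all_refs.items())
    return sorted(
        (name, oid)
        for name, oid in all_refs.items()
        if any(fnmatchcase(name, pattern) for pattern in ref_whitelist)
    )


# Made-up repository contents, for illustration only.
refs = {
    "refs/heads/master": "a" * 40,
    "refs/heads/topic": "b" * 40,
    "refs/tags/v1.0": "c" * 40,
    "refs/changes/01/1/1": "d" * 40,
}

wanted = advertise_refs(refs, ["refs/heads/*"])
```

With the pattern above, only the two refs under refs/heads/ are
advertised. A client that sends no whitelist still gets all four.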

>   * Capabilities were implemented as a hack and are hidden behind a NUL
>     byte after the first ref sent from the server during the ref
>     advertisement:
> 
> 	<SHA1> <Ref Name>\0<capabilities space separated> <symref> <agent>
> 
>     Since they are sent in the context of a pkt-line they are also subject
>     to the same length limitations (1k bytes with old clients).  While we
>     may not be close to hitting this limitation with capabilities alone, it
>     has become a problem when trying to abuse capabilities for other
>     purposes (e.g. [symrefs](https://public-inbox.org/git/20160816161838.klvjhhoxsftvkfmd@x/)).
> 
>   * Various other technical debt (e.g. abusing capabilities to
>     communicate agent and symref data, service name set using a query
>     parameter).

I think these two are the same point. I would emphasize that we cannot
add more data here, rather than the fact that it is hidden behind a
NUL.

>  Special Packets
> -----------------
> 
> In protocol v2 these special packets will have the following semantics:
> 
>   * '0000' Flush Packet (flush-pkt) - indicates the end of a message
>   * '0001' End-of-List delimiter (delim-pkt) - indicates the end of a list
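
For reference, a minimal sketch of pkt-line framing with the two special
packets quoted above. A pkt-line is a 4-hex-digit length (which counts
the 4 header bytes) followed by the payload. The helper names and the
sample payloads are assumptions made for the example, not part of the
proposal.

```python
FLUSH_PKT = b"0000"  # flush-pkt: end of a message
DELIM_PKT = b"0001"  # delim-pkt: end of a list
MAX_PAYLOAD = 65516  # pkt-line data limit in Git's protocol documentation


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as one pkt-line."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too long for one pkt-line")
    return b"%04x" % (len(payload) + 4) + payload


def read_pkts(data: bytes):
    """Yield ("data", payload), ("delim", None) or ("flush", None)."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            yield ("flush", None)
            pos += 4
        elif length == 1:
            yield ("delim", None)
            pos += 4
        elif length < 4:
            raise ValueError("invalid pkt-line length: %d" % length)
        else:
            yield ("data", data[pos + 4:pos + length])
            pos += length


# Hypothetical message: a two-item list, a delimiter, one more line, a flush.
message = (
    pkt_line(b"item-1\n")
    + pkt_line(b"item-2\n")
    + DELIM_PKT
    + pkt_line(b"trailer\n")
    + FLUSH_PKT
)
decoded = list(read_pkts(message))
```

Lengths 0 and 1 can never describe a real pkt-line because the header
alone is 4 bytes, which is why they are free to serve as markers.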

To address the pain point of HTTP(S) being different from the others
(mentioned above), I think the packet semantics should be further
qualified:

 - Communications must be divided up into packets terminated by a
   flush-pkt. Also, each side must be implemented without knowing
   whether packets-in-progress can or cannot be seen by the other side.
 - Each request packet must have a corresponding, possibly empty,
   response packet.
 - A request packet may be sent even if a response packet corresponding
   to a previously sent request packet is awaited. (This allows us to
   retain the existing optimization in fetch-pack wherein, during
   negotiation, the "have" request-response packet pairs are
   interleaved.)

This will allow us to more easily share code between HTTP(S) and the
others.
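
The three rules above can be sketched roughly as follows. The sketch
assumes each flush-terminated unit (a "packet" in the sense used above)
is a list of pkt-lines, and that the server answers each request with
exactly one, possibly empty, response in the same order. The "have" and
"ACK" lines, the object names, and every function name here are
illustrative only.

```python
FLUSH_PKT = b"0000"
KNOWN_OBJECTS = {b"2222"}  # made-up set of objects the server has


def encode_message(lines):
    """Frame payload lines as pkt-lines, terminated by a flush-pkt."""
    return b"".join(b"%04x" % (len(l) + 4) + l for l in lines) + FLUSH_PKT


def split_messages(stream: bytes):
    """Split a byte stream into complete flush-terminated messages."""
    messages, current, pos = [], [], 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            messages.append(current)
            current = []
            pos += 4
        elif length < 4:
            raise ValueError("unexpected special packet")
        else:
            current.append(stream[pos + 4:pos + length])
            pos += length
    return messages  # a trailing incomplete message is ignored


def handle(request):
    """Hypothetical server logic: ACK every "have" line we recognize."""
    response = []
    for line in request:
        if line.startswith(b"have "):
            oid = line[5:].strip()
            if oid in KNOWN_OBJECTS:
                response.append(b"ACK " + oid + b"\n")
    return response  # may be empty; it is still sent as a lone flush-pkt


def serve(stream: bytes) -> bytes:
    """One response per request, without caring how the bytes arrived."""
    return b"".join(encode_message(handle(r)) for r in split_messages(stream))


requests = [[b"have 1111\n"], [b"have 2222\n"], [b"done\n"]]

# Stream-style: all requests are written before any response is read.
pipelined = serve(b"".join(encode_message(r) for r in requests))

# HTTP-style: one request per round trip.
one_by_one = b"".join(serve(encode_message(r)) for r in requests)

responses = split_messages(pipelined)
pairs = list(zip(requests, responses))
```

Because every request gets exactly one response, the client can pair
them up by position even when requests are interleaved, and the same
serve() yields identical bytes in both styles.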

In summary, I think that we need a big motivation to make the jump from
v1 to v2, instead of merely making small changes to v1 (and I do think
that the proposed new commands, such as "ls-refs" and "fetch-object",
can be implemented merely by small changes). And I think that the
ability to better share code between HTTP(S) and others provides that
motivation.