Re: [RFC/PATCH 0/3] protocol v2

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 01 Mar 2015 00:41:54 -0800

I earlier said:

> So if we are going to discuss a new protocol, I'd prefer to see the
> discussion without worrying too much about how to inter-operate
> with the current vintage of Git. It is no longer an interesting problem,
> as we know how to solve it with minimum risk. Instead, I'd like to
> see us design the new protocol in such a way that it is in-line
> upgradable without repeating our past mistakes.

And I am happy to see that people are interested in discussing the
design of new protocols.

But after seeing the patches Stefan sent out, I think we are risking
of losing sight of what we are trying to accomplish.  We do not want
something that is merely new.

That is why I wanted people to think about, discuss and agree on
what limitation of the current protocol has that are problematic
(limitations that are not problematic are not something we do not
need to address [*1*]), so that we can design the new thing without
reintroducing the same limitation.

To remind people, here is a reprint of the draft I sent out earlier
in $gmane/264000.

> The current protocol has the following problems that limit us:
> 
>  - It is not easy to make it resumable, because we recompute every
>    time.  This is especially problematic for the initial fetch aka
>    "clone" as we will be talking about a large transfer [*1*].
> 
>  - The protocol extension has a fairly low length limit [*2*].
> 
>  - Because the protocol exchange starts by the server side
>    advertising all its refs, even when the fetcher is interested in
>    a single ref, the initial overhead is nontrivial, especially when
>    you are doing a small incremental update.  The worst case is an
>    auto-builder that polls every five minutes, even when there is no
>    new commits to be fetched [*3*].
> 
>  - Because we recompute every time, taking into account of what the
>    fetcher has, in addition to what the fetcher obtained earlier
>    from us in order to reduce the transferred bytes, the payload for
>    incremental updates become tailor-made for each fetch and cannot
>    be easily reused [*4*].
> 
> I'd like to see a new protocol that lets us overcome the above
> limitations (did I miss others? I am sure people can help here)
> sometime this year.

Unfortunately, nobody seems to want to help us by responding to "did
I miss others?" RFH, here are a few more from me.

 - The semantics of the side-bands are unclear.

   - Is band #2 meant only for progress output (I think the current
     protocol handlers assume that and unconditionally squelch it
     under --quiet)?  Do we rather want a dedicated "progress" and
     "error message" sidebands instead?

   - Is band #2 meant for human consumption, or do we expect the
     other end to interpret and act on it?  If the former, would it
     make sense to send locale information from the client side and
     ask the server side to produce its output with _("message")?

 - The semantics of packet_flush() is suboptimal, and this
   shortcoming seeps through to the protocol mapped to the
   smart-HTTP transport.

   Originally, packet_flush() was meant as "Here is an end of one
   logical section of what I am going to speak.", hinting that it
   might be a good idea for the underlying implementation to hold
   the packets up to that point in-core and then write(2) them all
   out (i.e. "flush") to the file descriptor only when we handle
   packet_flush().  It never meant "Now I am finished speaking for
   now and it is your turn to speak."

   But because HTTP is inherently a ping-pong protocol where the
   requestor at one point stops talking and lets the responder
   speak, the code to map our protocol to the smart HTTP transport
   made the packet_flush() boundary as "Now I am done talking, it is
   my turn to listen."

   We probably need two kinds of packet_flush().  When a requestor
   needs to say two or more logical groups of things before telling
   the other side "Now I am done talking; it is your turn.", we need
   some marker (i.e. the original meaning of packet_flush()) at the
   end of these logical groups.  And in order to be able to say "Now
   I am done saying everything I need to say at this point for you
   to respond to me.  It is your turn.", we need another kind of
   marker.

[Footnote]

*1* For example, if we were working off of "what mistakes do we want
to correct?" list, I do not think we would have seen "capabilities
have to be only on the first packet" or "lets allow new daemon to
read extra cruft at the end of the first request".  I do not think I
heard why it is a problem that the daemon cannot pass extra info to
invoked program in the first place.  There might be a valid reason,
but then that needs to be explained, understood and agreed upon and
should be part of an updated "what are we fixing?" list.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html