From: "Junio C Hamano" <gitster@xxxxxxxxx>
I earlier said:
So if we are going to discuss a new protocol, I'd prefer to see the
discussion without worrying too much about how to inter-operate
with the current vintage of Git. It is no longer an interesting
problem,
as we know how to solve it with minimum risk. Instead, I'd like to
see us design the new protocol in such a way that it is in-line
upgradable without repeating our past mistakes.
And I am happy to see that people are interested in discussing the
design of new protocols.
But after seeing the patches Stefan sent out, I think we risk
losing sight of what we are trying to accomplish. We do not want
something that is merely new.
That is why I wanted people to think about, discuss, and agree on
what limitations the current protocol has that are problematic
(limitations that are not problematic are not something we need
to address [*1*]), so that we can design the new thing without
reintroducing the same limitations.
To remind people, here is a reprint of the draft I sent out earlier
in $gmane/264000.
The current protocol has the following problems that limit us:
- It is not easy to make it resumable, because we recompute every
time. This is especially problematic for the initial fetch aka
"clone" as we will be talking about a large transfer [*1*].
- The protocol extension has a fairly low length limit [*2*] (see
the framing sketch after this list).
- Because the protocol exchange starts with the server side
advertising all its refs, even when the fetcher is interested in
a single ref, the initial overhead is nontrivial, especially when
you are doing a small incremental update. The worst case is an
auto-builder that polls every five minutes, even when there are
no new commits to be fetched [*3*].
- Because we recompute every time, taking into account what the
fetcher has, in addition to what the fetcher obtained earlier
from us, in order to reduce the transferred bytes, the payload
for an incremental update becomes tailor-made for each fetch and
cannot easily be reused [*4*].
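(To make [*2*] concrete for those who have not looked at the wire
format recently: each pkt-line starts with a four-hex-digit length
that also counts the prefix itself, and in the current protocol the
whole capability list has to ride on the first ref advertisement
line. A rough sketch of the framing, with the constants quoted from
memory rather than from the actual code:

    #include <stdio.h>

    #define LARGE_PACKET_MAX 65520  /* whole packet, prefix included */
    #define MAX_PAYLOAD (LARGE_PACKET_MAX - 4)

    static int write_pkt_line(FILE *out, const char *payload, size_t len)
    {
            if (len > MAX_PAYLOAD)
                    return -1;      /* cannot be framed as one packet */
            fprintf(out, "%04x", (unsigned)(len + 4));
            fwrite(payload, 1, len, out);
            return 0;
    }

    int main(void)
    {
            /* the capability list follows the NUL on this first
             * line, so ref name and all capabilities together must
             * fit in one packet */
            static const char line[] =
                    "3f78... refs/heads/master\0multi_ack side-band-64k\n";
            return write_pkt_line(stdout, line, sizeof(line) - 1) ? 1 : 0;
    }

Anything that does not fit under that cap simply cannot be said in
one packet, which is why growing the capability list is scary.)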
I'd like to see a new protocol that lets us overcome the above
limitations (did I miss others? I am sure people can help here)
sometime this year.
Unfortunately, nobody seems to want to help us by responding to
the "did I miss others?" RFH, so here are a few more from me.
OK, maybe not exactly about the protocol, but a possible option
would be the ability to send the data as a bundle or as multiple
bundles, or perhaps as an archive such as zip or tar.
Data could then be exchanged across an air gap or by pigeon mail.
The air-gap scenario is likely a real use case that's simply not
prominent at the moment, just because it's not that direct.
There has been discussion about servers having bundles available
for clones, but with a multi-bundle approach one could package up
a large base bundle (months of history) and increments (weeks,
then days), before a final, easy-to-pack last few hours. That
would be a server-side work trade-off, and would support a CDN
view if needed.
If such an approach were reasonable, would the protocol support it? etc.
Just a thought while reading...
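To illustrate what I mean, a client could walk such a chain of
bundles oldest-first, skipping the layers it already has, and only
then talk the live protocol for the last few hours. A purely
hypothetical sketch (none of these names, types, or URLs exist in
git today):

    #include <stdio.h>

    /* Hypothetical sketch only; nothing like this exists in git. */
    struct bundle_layer {
            const char *url;
            long covers_until;  /* time of newest commit in the layer */
    };

    static void fetch_chain(const struct bundle_layer *layer, int n,
                            long client_has_until)
    {
            for (int i = 0; i < n; i++) {
                    if (layer[i].covers_until <= client_has_until)
                            continue;   /* this span is already local */
                    printf("fetch and unbundle %s\n", layer[i].url);
                    client_has_until = layer[i].covers_until;
            }
    }

    int main(void)
    {
            const struct bundle_layer chain[] = {
                    { "ex.com/base-months.bundle", 1000 },
                    { "ex.com/incr-weeks.bundle",  2000 },
                    { "ex.com/incr-days.bundle",   3000 },
            };
            fetch_chain(chain, 3, 0);  /* fresh clone: take all layers */
            return 0;
    }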
- The semantics of the side-bands are unclear.
- Is band #2 meant only for progress output (I think the current
protocol handlers assume that and unconditionally squelch it
under --quiet)? Do we instead want a dedicated "progress" band
and a separate "error message" band?
- Is band #2 meant for human consumption, or do we expect the
other end to interpret and act on it? If the former, would it
make sense to send locale information from the client side and
ask the server side to produce its output with _("message")?
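For reference, my recollection of how the client side treats the
bands today (a simplified sketch from memory, not the real code;
note that band #3 already exists as a separate fatal-error channel,
which makes the overloading of band #2 all the more curious):

    #include <stdio.h>

    /* Simplified from memory; not the actual implementation. */
    static void demux(int band, const char *buf, size_t len,
                      int quiet, FILE *pack_sink)
    {
            switch (band) {
            case 1:         /* the pack data itself */
                    fwrite(buf, 1, len, pack_sink);
                    break;
            case 2:         /* progress *and* error text, mixed */
                    if (!quiet)
                            fwrite(buf, 1, len, stderr);
                    break;
            case 3:         /* fatal error from the remote */
                    fprintf(stderr, "remote: %.*s", (int)len, buf);
                    break;
            }
    }

    int main(void)
    {
            demux(2, "Counting objects: 42\n", 21, 0, stdout);
            return 0;
    }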
- The semantics of packet_flush() are suboptimal, and this
shortcoming seeps through to the protocol mapped onto the
smart-HTTP transport.
Originally, packet_flush() was meant as "Here is an end of one
logical section of what I am going to speak.", hinting that it
might be a good idea for the underlying implementation to hold
the packets up to that point in-core and then write(2) them all
out (i.e. "flush") to the file descriptor only when we handle
packet_flush(). It never meant "Now I am finished speaking for
now and it is your turn to speak."
But because HTTP is inherently a ping-pong protocol where the
requestor at one point stops talking and lets the responder
speak, the code that maps our protocol onto the smart-HTTP
transport treats the packet_flush() boundary as "Now I am done
talking; it is my turn to listen."
We probably need two kinds of packet_flush(). When a requestor
needs to say two or more logical groups of things before telling
the other side "Now I am done talking; it is your turn.", we need
some marker (i.e. the original meaning of packet_flush()) at the
end of these logical groups. And in order to be able to say "Now
I am done saying everything I need to say at this point for you
to respond to me. It is your turn.", we need another kind of
marker.
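On the wire this could be as cheap as reserving another length
value that can never occur as a real packet length. A sketch, where
"0001" as a section delimiter is purely hypothetical:

    #include <stdio.h>

    /* "0000" is today's flush; "0001" as a delimiter is made up. */
    static void packet_delim(FILE *out)  /* end of a logical section */
    {
            fputs("0001", out);
    }

    static void packet_flush(FILE *out)  /* done talking; your turn */
    {
            fputs("0000", out);
            fflush(out);
    }

    int main(void)
    {
            /* ... first logical group of the request goes here ... */
            packet_delim(stdout);  /* still talking; new section next */
            /* ... second logical group goes here ... */
            packet_flush(stdout);  /* now the other side may respond */
            return 0;
    }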
[Footnote]
*1* For example, if we were working off of a "what mistakes do we
want to correct?" list, I do not think we would have seen
"capabilities have to be only on the first packet" or "let's
allow the new daemon to read extra cruft at the end of the first
request". I do not think I heard why it is a problem that the
daemon cannot pass extra info to the invoked program in the first
place. There might be a valid reason, but then that needs to be
explained, understood, and agreed upon, and should be part of an
updated "what are we fixing?" list.
--
Philip