Re: [PATCH v3 12/35] serve: introduce git-serve

Brandon Williams <bmwill@xxxxxxxxxx> · Fri, 23 Feb 2018 13:45:57 -0800

On 02/22, Jeff King wrote:
> On Tue, Feb 06, 2018 at 05:12:49PM -0800, Brandon Williams wrote:
> 
> > +In protocol v2 communication is command oriented.  When first contacting a
> > +server a list of capabilities will be advertised.  Some of these capabilities
> > +will be commands which a client can request be executed.  Once a command
> > +has completed, a client can reuse the connection and request that other
> > +commands be executed.
> 
> If I understand this correctly, we'll potentially have a lot more
> round-trips between the client and server (one per "command"). And for
> git-over-http, each one will be its own HTTP request?
> 
> We've traditionally tried to minimize HTTP requests, but I guess it's
> not too bad if we can keep the connection open in most cases. Then we
> just suffer some extra framing bytes, but we don't have to re-establish
> the TCP connection each time.
> 
> I do wonder if the extra round trips will be noticeable in high-latency
> conditions. E.g., if I'm 200ms away, converting the current
> ref-advertisement spew to "capabilities, then the client asks for refs,
> then we spew the refs" is going to cost an extra 200ms, even if the
> fetch just ends up being a noop. I'm not sure how bad that is in the
> grand scheme of things (after all, the TCP handshake involves some
> round-trips, too).

I think this is the price of extending the protocol in a backward
compatible way.  If we didn't need to stay backward compatible (allowing
for graceful fallback to v1), we could design this differently.
Even so, we're not completely out of luck.

Back when I introduced the GIT_PROTOCOL side-channel, I was able to
demonstrate that arbitrary data could be sent to the server and that it
would only act on the keys it knows about.  This means we can do a
follow-up to v2 at some point that introduces an optimization: stuff a
request into GIT_PROTOCOL and short-circuit the first round-trip if the
server supports it.
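To make the "only act on the keys it knows about" point concrete, here is a minimal Python sketch (not git's actual C code) of a server splitting a GIT_PROTOCOL value into colon-separated key[=value] entries and keeping only the keys it recognizes. The KNOWN_KEYS set and the "some-future-key" name are hypothetical, chosen only for illustration.

```python
from typing import Dict, Optional

# Hypothetical: the set of GIT_PROTOCOL keys this server implements.
KNOWN_KEYS = {"version"}


def parse_git_protocol(raw: str) -> Dict[str, Optional[str]]:
    """Keep only recognised key[=value] entries from a GIT_PROTOCOL value.

    Assumes the value is a colon-separated list such as "version=2".
    """
    params: Dict[str, Optional[str]] = {}
    for entry in raw.split(":"):
        if not entry:
            continue  # tolerate empty segments, e.g. "version=2::"
        key, sep, value = entry.partition("=")
        if key not in KNOWN_KEYS:
            continue  # unknown keys are ignored rather than rejected
        params[key] = value if sep else None  # bare key maps to None
    return params


# An older server sees only the version it knows about; a newer one could
# add a (hypothetical) request-carrying key to KNOWN_KEYS.
print(parse_git_protocol("version=2:some-future-key=ls-refs"))  # {'version': '2'}
```

Because unknown keys are skipped instead of treated as errors, a client can speculatively send extra data without breaking servers that predate it.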

> 
> > + Capability Advertisement
> > +--------------------------
> > +
> > +A server which decides to communicate (based on a request from a client)
> > +using protocol version 2 notifies the client by sending a version string
> > +in its initial response followed by an advertisement of its capabilities.
> > +Each capability is a key with an optional value.  Clients must ignore all
> > +unknown keys.  Semantics of unknown values are left to the definition of
> > +each key.  Some capabilities will describe commands which can be requested
> > +to be executed by the client.
> > +
> > +    capability-advertisement = protocol-version
> > +			       capability-list
> > +			       flush-pkt
> > +
> > +    protocol-version = PKT-LINE("version 2" LF)
> > +    capability-list = *capability
> > +    capability = PKT-LINE(key[=value] LF)
> > +
> > +    key = 1*CHAR
> > +    value = 1*CHAR
> > +    CHAR = 1*(ALPHA / DIGIT / "-" / "_")
> > +
> > +A client then responds to select the command it wants with any particular
> > +capabilities or arguments.  There is then an optional section where the
> > +client can provide any command specific parameters or queries.
> > +
> > +    command-request = command
> > +		      capability-list
> > +		      [command-args]
> > +		      flush-pkt
> > +    command = PKT-LINE("command=" key LF)
> > +    command-args = delim-pkt
> > +		   *arg
> > +    arg = 1*CHAR
> 
> For a single stateful TCP connection like git:// or git-over-ssh, the
> client would get the capabilities once and then issue a series of
> commands. For git-over-http, how does it work?
> 
> The client speaks first in HTTP, so we'd first make a request to get
> just the capabilities from the server? And then proceed from there with
> a series of requests, assuming that the capabilities for each server we
> subsequently contact are the same? That's probably reasonable (and
> certainly the existing http protocol makes that capabilities
> assumption).
> 
> I don't see any documentation on how this all works with http. But

I can add a section on the initial request when using http, but the rest
of it should function the same.

> reading patch 34, it looks like we just do the usual
> service=git-upload-pack request (with the magic request for v2), and
> then the server would send us capabilities. Which follows my line of
> thinking in the paragraph above.

Yes, this is exactly how it should work.  First we make an info/refs
request, and if the server speaks v2 then instead of a ref advertisement
we get back a capability listing.  Subsequent requests are then made
assuming the capabilities are the same, as we've done with the existing
protocol.
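A minimal Python sketch of that first exchange, assuming the info/refs URL layout and the Git-Protocol request header used by Git's smart-HTTP transport; the helper names are hypothetical and this is not the code in patch 34. It decodes pkt-lines (4 hex digits of total length, with 0000 as flush-pkt and 0001 as delim-pkt), tolerates the current "# service" preamble, and reports whether the reply is a v2 capability advertisement.

```python
from typing import Dict, List, Optional, Tuple, Union

FLUSH = "flush-pkt"  # sentinel for "0000"
DELIM = "delim-pkt"  # sentinel for "0001"
Packet = Union[bytes, str]


def info_refs_request(git_url: str) -> Tuple[str, Dict[str, str]]:
    """URL and headers for the initial smart-HTTP request asking for v2."""
    url = git_url.rstrip("/") + "/info/refs?service=git-upload-pack"
    return url, {"Git-Protocol": "version=2"}


def parse_pkt_lines(data: bytes) -> List[Packet]:
    """Split a byte stream into pkt-line payloads and special packets."""
    packets: List[Packet] = []
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)  # 4 hex digits, includes itself
        if length == 0:
            packets.append(FLUSH)
            pos += 4
        elif length == 1:
            packets.append(DELIM)
            pos += 4
        elif length < 4 or pos + length > len(data):
            raise ValueError(f"bad pkt-line length {length} at offset {pos}")
        else:
            packets.append(data[pos + 4:pos + length])
            pos += length
    return packets


def parse_v2_advertisement(data: bytes) -> Optional[Dict[str, Optional[str]]]:
    """Return the capabilities if the server answered with v2, else None."""
    packets = parse_pkt_lines(data)
    # Skip the "# service=..." line and its flush-pkt, if the server sent one.
    if (packets and isinstance(packets[0], bytes)
            and packets[0].startswith(b"# service=")):
        packets = packets[2:]
    first = packets[0] if packets else None
    if not isinstance(first, bytes) or first.rstrip(b"\n") != b"version 2":
        return None  # not v2, e.g. a v0/v1 ref advertisement
    caps: Dict[str, Optional[str]] = {}
    for pkt in packets[1:]:
        if not isinstance(pkt, bytes):
            if pkt == FLUSH:
                return caps  # flush-pkt terminates the capability list
            raise ValueError("unexpected delim-pkt in capability list")
        key, sep, value = pkt.decode().rstrip("\n").partition("=")
        caps[key] = value if sep else None
    raise ValueError("capability list not terminated by a flush-pkt")
```

A client would send the info_refs_request() URL with that header, feed the response body to parse_v2_advertisement(), and fall back to the existing ref-advertisement handling when it returns None.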

The great thing about this is that the git client doesn't care whether
it's speaking over the git://, ssh://, file://, or http:// transport;
it's all the same protocol.  In my next re-roll I'll even drop the
"# service" bit from the http server response, and then the responses
will truly be identical in all cases.
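Since the payload is the same on every transport, one encoder is enough. Below is a minimal Python sketch of a command-request following the grammar quoted earlier in this thread. It assumes each arg is framed as its own PKT-LINE terminated by LF (the quoted grammar leaves arg unframed) and uses git's 65520-byte pkt-line limit; the command and argument names in the example are illustrative only.

```python
from typing import Iterable, Optional, Sequence, Tuple

FLUSH_PKT = b"0000"
DELIM_PKT = b"0001"
LARGE_PACKET_MAX = 65520  # largest pkt-line git accepts, length prefix included


def pkt_line(payload: str) -> bytes:
    """Frame one payload as a pkt-line: 4 hex digits of total length + data."""
    data = payload.encode()
    if len(data) + 4 > LARGE_PACKET_MAX:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % (len(data) + 4) + data


def command_request(
    command: str,
    capabilities: Iterable[Tuple[str, Optional[str]]] = (),
    args: Optional[Sequence[str]] = None,
) -> bytes:
    """Serialise command, capability-list, optional [delim-pkt *arg], flush-pkt."""
    out = [pkt_line(f"command={command}\n")]
    for key, value in capabilities:
        out.append(pkt_line(f"{key}\n" if value is None else f"{key}={value}\n"))
    if args is not None:  # the command-args section is optional
        out.append(DELIM_PKT)
        out.extend(pkt_line(f"{arg}\n") for arg in args)
    out.append(FLUSH_PKT)
    return b"".join(out)


# The same bytes could go down a git:// or ssh:// pipe or into an HTTP POST body.
print(command_request("ls-refs", args=["peel"]))  # b'0014command=ls-refs\n00010009peel\n0000'
```

Keeping the encoder free of any transport details is what lets each transport reuse it unchanged; only the outer framing (a pipe versus an HTTP request body) differs.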

-- 
Brandon Williams