Re: [PATCH v2 12/27] serve: introduce git-serve

Duy Nguyen <pclouds@xxxxxxxxx> · Fri, 26 Jan 2018 17:39:51 +0700

On Fri, Jan 26, 2018 at 6:58 AM, Brandon Williams <bmwill@xxxxxxxxxx> wrote:
> + Detailed Design
> +=================
> +
> +A client can request to speak protocol v2 by sending `version=2` in the
> +side-channel `GIT_PROTOCOL` in the initial request to the server.
> +
> +In protocol v2 communication is command oriented.  When first contacting a
> +server a list of capabilities will advertised.  Some of these capabilities

s/will advertised/will be advertised/

> + Capability Advertisement
> +--------------------------
> +
> +A server which decides to communicate (based on a request from a client)
> +using protocol version 2, notifies the client by sending a version string
> +in its initial response followed by an advertisement of its capabilities.
> +Each capability is a key with an optional value.  Clients must ignore all
> +unknown keys.

With have a convention in $GIT_DIR/index file format that's probably a
good thing to follow here: lowercase keys are optional, such unknown
keys can (and must) be ignored. Uppercase keys are mandatory. If a
client can't understand one of those keys, abort. This gives the
server a way to "select" clients and introduce incompatible changes if
we ever have to.

> Semantics of unknown values are left to the definition of
> +each key.  Some capabilities will describe commands which can be requested
> +to be executed by the client.
> +
> +    capability-advertisement = protocol-version
> +                              capability-list
> +                              flush-pkt
> +
> +    protocol-version = PKT-LINE("version 2" LF)
> +    capability-list = *capability
> +    capability = PKT-LINE(key[=value] LF)
> +
> +    key = 1*CHAR
> +    value = 1*CHAR
> +    CHAR = 1*(ALPHA / DIGIT / "-" / "_")

Is this a bit too restricted for "value"? Something like "." (e.g.
version) or "@" (I wonder if anybody will add an capability that
contains an email address). Unless there's a good reason to limit it,
should we just go full ascii (without control codes)?

> +A client then responds to select the command it wants with any particular
> +capabilities or arguments.  There is then an optional section where the
> +client can provide any command specific parameters or queries.
> +
> +    command-request = command
> +                     capability-list
> +                     (command-args)
> +                     flush-pkt
> +    command = PKT-LINE("command=" key LF)
> +    command-args = delim-pkt
> +                  *arg
> +    arg = 1*CHAR
> +
> +The server will then check to ensure that the client's request is
> +comprised of a valid command as well as valid capabilities which were
> +advertised.  If the request is valid the server will then execute the
> +command.

What happens when the request is not valid? Or..

> +When a command has finished

How does the client know a command has finished? Is it up to each
command design?

More or less related it bugs me that I have a translated git client,
but I still receive remote error messages in English. It's a hard
problem, but I'm hoping that we won't need to change the core protocol
to support that someday. Although we could make rule now that side
channel message could be sent in "printf"-like form, where the client
can translate the format string and substitutes placeholders with real
values afterward...

> a client can either request that another
> +command be executed or can terminate the connection by sending an empty
> +request consisting of just a flush-pkt.
> +
> + Capabilities
> +~~~~~~~~~~~~~~
> +
> +There are two different types of capabilities: normal capabilities,
> +which can be used to to convey information or alter the behavior of a
> +request, and command capabilities, which are the core actions that a
> +client wants to perform (fetch, push, etc).
> +
> + agent
> +-------
> +
> +The server can advertise the `agent` capability with a value `X` (in the
> +form `agent=X`) to notify the client that the server is running version
> +`X`.  The client may optionally send its own agent string by including
> +the `agent` capability with a value `Y` (in the form `agent=Y`) in its
> +request to the server (but it MUST NOT do so if the server did not
> +advertise the agent capability). The `X` and `Y` strings may contain any
> +printable ASCII characters except space (i.e., the byte range 32 < x <
> +127), and are typically of the form "package/version" (e.g.,
> +"git/1.8.3.1"). The agent strings are purely informative for statistics
> +and debugging purposes, and MUST NOT be used to programmatically assume
> +the presence or absence of particular features.
> +
> + stateless-rpc
> +---------------
> +
> +If advertised, the `stateless-rpc` capability indicates that the server
> +supports running commands in a stateless-rpc mode, which means that a
> +command lasts for only a single request-response round.
> +
> +Normally a command can last for as many rounds as are required to
> +complete it (multiple for negotiation during fetch or no additional
> +trips in the case of ls-refs).  If the client sends the `stateless-rpc`
> +capability with a value of `true` (in the form `stateless-rpc=true`)
> +then the invoked command must only last a single round.

Speaking of stateless-rpc, I remember last time this topic was brought
up, there was some discussion to kind of optimize it for http as well,
to fit the "client sends request, server responds data" model and
avoid too many round trips (ideally everything happens in one round
trip). Does it evolve to anything real? All the cool stuff happened
while I was away, sorry if this was discussed and settled.
-- 
Duy