Re: [PATCH v2 5/8] Documentation: add Packfile URIs design doc

Jonathan Nieder <jrnieder@xxxxxxxxx> · Tue, 23 Apr 2019 15:48:39 -0700

Ævar Arnfjörð Bjarmason wrote:
> On Wed, Apr 24 2019, Jonathan Nieder wrote:
>> Jeff King wrote:
>>> On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:

>>>> +If the 'packfile-uris' feature is advertised, the following argument
>>>> +can be included in the client's request as well as the potential
>>>> +addition of the 'packfile-uris' section in the server's response as
>>>> +explained below.
>>>> +
>>>> +    packfile-uris <comma-separated list of protocols>
>>>> +	Indicates to the server that the client is willing to receive
>>>> +	URIs of any of the given protocols in place of objects in the
>>>> +	sent packfile. Before performing the connectivity check, the
>>>> +	client should download from all given URIs. Currently, the
>>>> +	protocols supported are "http" and "https".
>>>
>>> This negotiation seems backwards to me, because it puts too much power
>>> in the hands of the server.
>>
>> Thanks.  Forgive me if this was covered earlier in the conversation, but
>> why do we need more than one protocol at all here?  Can we restrict this
>> to only-https, all the time?
>
> There was this in an earlier discussion about this:
> https://public-inbox.org/git/877eds5fpl.fsf@xxxxxxxxxxxxxxxxxxx/
>
> It seems arbitrary to break it for new features if we support http in
> general, especially with a design as it is now where the checksum of the
> pack is transmitted out-of-band.

Thanks for the pointer.  TLS provides privacy, too, but I can see why
in today's world it might not always be easy to set it up, and given
that we have integrity protection via that checksum, I can see why
some people might have a legitimate need for using plain "http" here.

We may also want to support packfile-uris using SSH protocol in the
future.  Might as well figure out how the protocol negotiation works
now.  So let's delve more into it:

Peff mentioned that it feels backwards for the client to specify what
protocols they support in the request, instead of the server
specifying them upfront in the capability advertisement.  I'm inclined
to agree: it's probably reasonable to put this in server capabilities
instead.  That would even allow the client to do something like

	This server only supports HTTP without TLS, which you have
	indicated is a condition in which you want to be prompted.
	Proceed?

	[Use HTTP packfiles]  [Use slower but safer inline packs]

Peff also asked whether protocol scheme is the right granularity:
should the server list what domains they can serve packfiles from
instead?  In other words, once you're doing it for protocol schemes,
why not do it for whole URIs too?  I'm grateful for the question since
it's a way to probe at design assumptions.

- protocol schemes are likely to be low in number because each has its
  own code path to handle it.  By comparison, domains or URIs may be
  too numerous to be something we want to jam into the capability
  advertisement.  (Or the server operator could always use the same
  domain as the Git repo, and then use a 302 to redirect to the CDN.
  I suspect this is likely to be a common setup anyway: it allows the
  Git server to generate a short-lived signed URL that it uses as the
  target of a 302.  But in this case, what is the point of a domain
  whitelist?)

- relatedly, because the list of protocol schemes is small, it is
  feasible to test client behavior with each subset of protocol
  schemes enabled.  Finer-grained filtering would mean more esoteric
  client configurations for server operators to support and debug.

- supported protocol schemes do not vary per request.  The actual
  packfile URI is dynamic and varies per request

- separately from questions of preference or security policy,
  clients may have support for a limited subset of protocol schemes.
  For example, imagine a stripped-down client without SSH support.
  So we need a way to agree about this capability anyway.

So I suspect that, at least to start, protocol scheme negotiation
should be enough and we don't need full URI negotiation.

There are a few escape valves:

- affected clients can complain to the server operator, who will then
  reconfigure the server to use more appropriate packfile URIs

- if there is a need for different clients to use different packfile
  URIs, clients can pass a flag, using --server-option, to the server
  to help it choose.

- a client can disable support for packfile URIs on a particular
  request and fall back to inline packs.

- if and when an affected client materializes, they can help us
  improve the protocol to handle their needs.

Sensible?

Thanks,
Jonathan