Re: [PATCH v2 5/8] Documentation: add Packfile URIs design doc

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 24 Apr 2019 09:48:13 +0200

On Wed, Apr 24 2019, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Apr 24 2019, Jonathan Nieder wrote:
>>> Jeff King wrote:
>>>> On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:
>
>>>>> +If the 'packfile-uris' feature is advertised, the following argument
>>>>> +can be included in the client's request as well as the potential
>>>>> +addition of the 'packfile-uris' section in the server's response as
>>>>> +explained below.
>>>>> +
>>>>> +    packfile-uris <comma-separated list of protocols>
>>>>> +	Indicates to the server that the client is willing to receive
>>>>> +	URIs of any of the given protocols in place of objects in the
>>>>> +	sent packfile. Before performing the connectivity check, the
>>>>> +	client should download from all given URIs. Currently, the
>>>>> +	protocols supported are "http" and "https".
>>>>
>>>> This negotiation seems backwards to me, because it puts too much power
>>>> in the hands of the server.
>>>
>>> Thanks.  Forgive me if this was covered earlier in the conversation, but
>>> why do we need more than one protocol at all here?  Can we restrict this
>>> to only-https, all the time?
>>
>> There was this in an earlier discussion about this:
>> https://public-inbox.org/git/877eds5fpl.fsf@xxxxxxxxxxxxxxxxxxx/
>>
>> It seems arbitrary to break it for new features if we support http in
>> general, especially with a design as it is now where the checksum of the
>> pack is transmitted out-of-band.
>
> Thanks for the pointer.  TLS provides privacy, too, but I can see why
> in today's world it might not always be easy to set it up, and given
> that we have integrity protection via that checksum, I can see why
> some people might have a legitimate need for using plain "http" here.
>
> We may also want to support packfile-uris using SSH protocol in the
> future.  Might as well figure out how the protocol negotiation works
> now.  So let's delve more into it:
>
> Peff mentioned that it feels backwards for the client to specify what
> protocols they support in the request, instead of the server
> specifying them upfront in the capability advertisement.  I'm inclined
> to agree: it's probably reasonable to put this in server capabilities
> instead.  That would even allow the client to do something like
>
> 	This server only supports HTTP without TLS, which you have
> 	indicated is a condition in which you want to be prompted.
> 	Proceed?
>
> 	[Use HTTP packfiles]  [Use slower but safer inline packs]
>
> Peff also asked whether protocol scheme is the right granularity:
> should the server list what domains they can serve packfiles from
> instead?  In other words, once you're doing it for protocol schemes,
> why not do it for whole URIs too?  I'm grateful for the question since
> it's a way to probe at design assumptions.
>
> - protocol schemes are likely to be low in number because each has its
>   own code path to handle it.  By comparison, domains or URIs may be
>   too numerous to be something we want to jam into the capability
>   advertisement.  (Or the server operator could always use the same
>   domain as the Git repo, and then use a 302 to redirect to the CDN.
>   I suspect this is likely to be a common setup anyway: it allows the
>   Git server to generate a short-lived signed URL that it uses as the
>   target of a 302.  But in this case, what is the point of a domain
>   whitelist?)
>
> - relatedly, because the list of protocol schemes is small, it is
>   feasible to test client behavior with each subset of protocol
>   schemes enabled.  Finer-grained filtering would mean more esoteric
>   client configurations for server operators to support and debug.
>
> - supported protocol schemes do not vary per request.  The actual
>   packfile URI is dynamic and varies per request
>
> - separately from questions of preference or security policy,
>   clients may have support for a limited subset of protocol schemes.
>   For example, imagine a stripped-down client without SSH support.
>   So we need a way to agree about this capability anyway.
>
> So I suspect that, at least to start, protocol scheme negotiation
> should be enough and we don't need full URI negotiation.
>
> There are a few escape valves:
>
> - affected clients can complain to the server operator, who will then
>   reconfigure the server to use more appropriate packfile URIs
>
> - if there is a need for different clients to use different packfile
>   URIs, clients can pass a flag, using --server-option, to the server
>   to help it choose.
>
> - a client can disable support for packfile URIs on a particular
>   request and fall back to inline packs.
>
> - if and when an affected client materializes, they can help us
>   improve the protocol to handle their needs.
>
> Sensible?

Food for thought: would we consider ssh->https a "downgrade"? I think
"maybe". We're going from whatever custom setting the user has
(e.g. manually approve new hosts) to the CA system.

But I think it would be fine to just only whitelist ssh->https and ban
everything else behind a very scary config option or something, we could
always fleshen out the semantics of upgrade/downgrade/switching later,
and it would IMO suck less than outright banning a protcol we otherwise
support in the design, and which (unlike git://) is something people are
still finding uses for in the wild for non-legacy reasons.