Re: [PATCH v2 5/8] Documentation: add Packfile URIs design doc

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 24 Apr 2019 00:18:55 +0200

On Tue, Apr 23 2019, Jonathan Tan wrote:

>> The problem I see is that the client doesn't get to vet the list of
>> URIs; it only gets to specify a protocol match. But there are many other
>> reasons it might want to reject a URI: we don't like the protocol, the
>> domain name is on a blacklist (or not on a whitelist), the domain name
>> can't resolve, we can't make a TCP connection to the server, we can't
>> successfully fetch the pack.
>>
>> You'll note that those rise in complexity and time as you go down the
>> list. I'm not sure where on that spectrum we'd want our clients to stop
>> vetting (and it may even depend on config). But I think we ought to
>> design the protocol to put the decision in the hands of the client so
>> that it _can_ make those choices itself.
>>
>> I.e., I think the conversation ought to be more like:
>>
>>   Server: I support packfile-uris X, Y, Z.
>>
>>   Client: Great. I'll use URIs X and Z.
>>
>>   Server: OK, here's your pack, minus any objects I know are in X and Z.
>>           I'll send you the objects from Y as normal.
>>
>> And then the client is free to pick and choose.
>
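
To make the "client is free to pick and choose" step concrete, here is a minimal sketch of only the cheapest checks from the list above (protocol and host allowlist). The function name, parameters and example URLs are all hypothetical and nothing here is taken from the patches; DNS, TCP and actual download checks are deliberately left out.

```python
from urllib.parse import urlsplit


def select_uris(advertised, allowed_schemes=("https",), allowed_hosts=None):
    """Return the advertised URIs this client is willing to use.

    Hypothetical helper: only scheme and host are vetted; a real client
    might go further down the list (resolve, connect, fetch).
    """
    accepted = []
    for uri in advertised:
        parts = urlsplit(uri)
        # Reject protocols we don't like.
        if parts.scheme not in allowed_schemes:
            continue
        # Reject hosts that are not on the allowlist, if one is configured.
        if allowed_hosts is not None and parts.hostname not in allowed_hosts:
            continue
        accepted.append(uri)
    return accepted
```

The client would then send the accepted subset back, and the server would omit only the objects covered by those URIs from the pack it streams.
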
> One drawback I see is that the server needs to compute objects to be
> sent twice - once to generate the URIs and once after the client has
> informed the server which URIs it wants.
>
> If we expect some packfile-uris to not be usable sometimes (for any of
> the reasons you listed), this would be nice. The protocol in my patches
> supports a rudimentary version of this (if any of the URIs don't work for
> any reason, just fetch again without advertising that we support
> packfile URIs) but the packfile received during the first fetch is
> wasted.

This is really orthogonal to this series, but wouldn't a better
resumption strategy here be to walk the pack we just downloaded, run the
equivalent of 'commit-graph write' on it to figure out likely "tip"
commits, and use those in "have" lines to negotiate with the server the
next time around?
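
Roughly what I have in mind, as a sketch only: assume we have already walked the pack and built a mapping from each commit in it to its parents (that part is not shown, and `parents_of`, `likely_tips` and `have_lines` are made-up names). A "tip" is then any commit in the pack that no other commit in the pack names as a parent. This is much less than what 'commit-graph write' actually does, and pkt-line framing is omitted.

```python
def likely_tips(parents_of):
    """Return commits that no other commit in the mapping lists as a parent.

    parents_of: dict mapping commit id -> list of parent ids, covering
    only the commits found in the downloaded pack (assumed input).
    """
    referenced = {p for parents in parents_of.values() for p in parents}
    return sorted(c for c in parents_of if c not in referenced)


def have_lines(parents_of):
    """Format the likely tips as "have" lines for the next negotiation."""
    return ["have " + c for c in likely_tips(parents_of)]
```

Parents that live outside the pack are simply ignored here, since they never show up as keys in the mapping.
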

I've sometimes wished we optionally had that sort of fetch algorithm. In
particular, setting "alternates" on a freshly init-ed repo currently
still does the full initial fetch, even though most/all of the objects
are already available via the alternates.

> So the tradeoff is: in the good case, your suggestion means that we make
> another fetch request, increasing the load on the server and taking more
> time. In the bad case, your suggestion means that we avoid sending a
> useless packfile upon the first fetch, and if the server is smart
> enough, even the second packfile will be smaller. It depends on how
> often we think the bad case occurs, if servers will typically send more
> than one packfile-uri, and how smart we think servers will typically be.
>
>> The initial server uri
>> list can come in the capabilities list, or it can be a separate request
>> once the client sees the server supports packfile-uris and wants to ask
>> about them.
>
> I don't think this is possible - the URI list is dependent on the wants
> and haves.
>
>> We may need some way for the server to group the uris so
>> that the client knows which ones are alternates of each other (and which
>> ones are needed to make a complete set).
>
> My initial design didn't have this feature (in fact, it seems to me that
> each thing should have one canonical URL, which means that there is no
> need for an alternate) - do you think we should be thinking about this
> at this stage?