Re: [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/23/2021 9:28 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Aug 10 2021, Derrick Stolee wrote:
> 
>> On 8/5/2021 11:07 AM, Ævar Arnfjörð Bjarmason wrote:> We're in the 2.33.0 rc cycle, and I'd hoped to have some more prep

(I'll skip the parts where you agree with me.)

>> 3. The bundle-uri feature is focused on "clone" scenarios, and I don't see
>>    how it could be used to help "fetch" scenarios. To be fair, I also have
>>    been vocal about how the packfile-uri feature is very hard to make
>>    helpful for the "fetch" scenario. The benefits to "clone" seem to be
>>    worth the effort alone. I think the bundle-api doubles-down on that
>>    being the core functionality that matters, and that is fine by me. It
>>    sacrifices the flexibility of the packfile-uri with a lower maintenance
>>    burden for servers that want to implement it.
> 
> Not correct, I'm going to have it handle incremental fetches.
> 
> What *is* correct is that the initial RFC patches here entirely punt on
> it (I only hook them into clone.c, no fetch.c).
> 
> I focused on the "clone" case to get to a MVP earlier, and as you note I
> think in practice most users of this will want to only use it for clone
> bootstrapping, and won't care so much about the incremental case.
> 
> But for the "fetch" case we'd simply start fetching the N bundles in
> parallel, and read their headers, then abort fetching any whose header
> declares tips that we already have.

To save network time, we would need to read data as it is streaming
from the network and stop the download when we realize we don't need
the bundle? This seems wasteful, but I'm willing to see it work in
reality before judging it too harshly.

It would be better if the origin Git server could see the tips that
the user has and then only recommend bundles that serve new data.
That requires a closer awareness of which bundles contain which
content.

> So e.g. for a repo with a "big" bundle that's 1 week old, and 7
> incremental bundles for the last 7 days of updates we'd quickly find
> that we only need the last 3 if we updated 4 days ago, based on only
> fetching their headers.

I can attest that using time-based packs can be a good way to catch
up based on pre-computed pack-files instead of generating one on
demand. The GVFS protocol has a "prefetch" endpoint that gives all
pre-computed pack-files since a given timestamp. (One big difference
is that those pack-files contain only commits and trees, not blobs.
We should ensure that your proposal can extend to bundles that serve
filtered repos such as --filter=blob:none.)

The thing that works especially well with the GVFS protocol is that
if we are missing a reachable object from one of those pack-files,
then it is downloaded as a one-off request _when needed_. This allows
the pack-files to be computed by independent cache servers that have
their own clocks and compute their own timestamps and generate these
pack-files as "new objects I fetched since the last timestamp".

This is all to say that your timestamp-based approach will work as
long as you are very careful in how the incremental bundles are
constructed. There can really only be one list of bundles at a time,
and somehow we need those servers to atomically swap their list of
bundles when they are reorganized.

> The client spec is very liberal on "MAY" instead of "MUST" for
> supporting clients. E.g. a client might sensibly conclude that their
> most recent commits are recent enough, and that it would probably be a
> waste of time to fetch the headers of the pointed-to bundles, and just
> do a normal fetch instead.

Sensible. I wouldn't expect anything less.

> The response format is also currently:
> 
>     <URI>[ <optional key-value>*]
> 
> And nothing uses the "optional key-value". One of the main things I had
> in mind was to support something like (none of these key-values are
> specified currently):
> 
>     https://example.com/big.bundle tip=deadbeef1
>     https://example.com/incremental-1.bundle prereq=deadbeef1 tip=deadbeef2
>     https://example.com/incremental-2.bundle prereq=deadbeef2 tip=deadbeef3
> 
> Or:
> 
>     https://example.com/big.bundle type=complete
>     https://example.com/incremental-1.bundle type=incremental
>     https://example.com/incremental-2.bundle type=incremental
> 
> I.e. allow the server to up-front to optionally specify some details
> about the pointed-to-bundles, purely so the client can avoid the work
> and roundtrip of trying to fetch the headers of N pointed-to bundles
> just to see which if any it needs for a clone/fetch.

Ok, this seems like a reasonable way forward. If the bundles are
truly focused on data within a core "trunk" (say, the reflog of
the default branch) then this works. It gets more complicated for
repos with many long-lived branches that merge together over the
span of weeks. I'm talking about extreme outliers like the Windows
OS repository, so take this concern with a grain of salt. The
point is that in that world, a single tip isn't enough sometimes.

...
>> One question I saw Jonathan ask was "can we modify the packfile-uri
>> capability to be better?" I think this feature has enough different goals
>> that they could co-exist. That's the point of protocol v2, right? Servers
>> can implement and advertise the subset of functionality that they think is
>> best for their needs.
> 
> *Nod*, I also think as shown with these patches the overall complexity
> after setting up the scaffolding for a new feature isn't very high, and
> much of it is going away in some generally useful prep patches I'm
> trying to get in...
> 
>> I hope my ramblings provide some amount of clarity to the discussion, but
>> also I intend to show support of the idea. If I was given the choice of
>> which feature to support (I mostly work on the client experience, so take
>> my choice with a grain of salt), then I would focus on implementing the
>> bundle-uri capability _before_ the packfile-uri capability. And that's the
>> best part: more options present more flexibility for different hosts to
>> make different decisions.
> 
> Thanks again. I'll keep you CC'd on future submissions if that's OK.

Sounds good. I'll try to prioritize review.

> One thing I could really use to move this forward is code review of some
> of the outstanding serieses I've got, in particular these two are
> immediately applicable to this one. It's based on the former, and I
> needed to steal one patch from the latter:
> 
> https://lore.kernel.org/git/cover-v4-00.10-00000000000-20210805T011823Z-avarab@xxxxxxxxx/
> https://lore.kernel.org/git/cover-v2-0.4-00000000000-20210823T110136Z-avarab@xxxxxxxxx/

I'll see what I can contribute here.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux