Re: [PATCH v4 0/2] bundle URIs: design doc

On 8/9/2022 9:49 AM, Phillip Wood wrote:
> Hi Stolee
> 
> On 09/08/2022 14:12, Derrick Stolee via GitGitGadget wrote:
>> This is the first of series towards building the bundle URI feature as
>> discussed in previous RFCs, specifically pulled directly out of [5]:
>>
>> [1]
>> https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@xxxxxxxxx/
>>
>> [2]
>> https://lore.kernel.org/git/cover-0.3-00000000000-20211025T211159Z-avarab@xxxxxxxxx/
>>
>> [3]
>> https://lore.kernel.org/git/pull.1160.git.1645641063.gitgitgadget@xxxxxxxxx
>>
>> [4]
>> https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@xxxxxxxxx/
>>
>> [5]
>> https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@xxxxxxxxx
>>
>> THIS ONLY INCLUDES THE DESIGN DOCUMENT. See "Updates in v3". There are two
>> patches:
>>
>>   1. The main design document that details the bundle URI standard and how
>>      the client interacts with the bundle data.
>>   2. An addendum to the design document that details one strategy for
>>      organizing bundles from the perspective of a bundle provider.
> 
> I thought the document was well written and left me with a good understanding
> of both the problem being addressed and the rationale for the solution.

Thanks for the kind words!

> One small query - the document mentions CI farms as benefiting from this work
> but my impression is that those commonly use shallow clones which are (quite
> reasonably) not supported in this proposal.

There are two different kinds of CI farms.

The most common one is a SaaS CI system that provides machines on-demand,
but each run starts from some "clean" state. For example, GitHub Actions
runs CI builds of the Git project across a number of platforms. These
machines need the source at HEAD, but do not need the full history. Further,
they will erase the repository entirely at the end of the build, never
fetching from those repositories. Thus, a shallow clone makes sense to
minimize the data transfer. Bundles don't make sense here for two
reasons. First, bundles must be closed under reachability and cannot
represent a shallow clone (see [1]). Second, CI builds are typically
triggered immediately after the commit appears on the origin Git server,
so there is no time for a bundle provider to create a bundle representing
that shallow clone.

[1] https://github.com/git/git/blob/c50926e1f48891e2671e1830dbcd2912a4563450/Documentation/technical/bundle-format.txt#L65-L69
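For illustration, here is a rough sketch of what such a throwaway CI
machine does. The repository names, the "main" branch, and the demo
identity are all made up for the example; it only shows that a depth-1
clone carries the tip commit and nothing else.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical origin with three commits (empty commits keep the demo small).
git init -q origin
git -C origin symbolic-ref HEAD refs/heads/main
for msg in first second third; do
    git -C origin -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$msg"
done

# What a throwaway CI machine would do: fetch only the tip commit.
# The file:// URL matters here: --depth is ignored for plain local paths.
git clone -q --depth=1 "file://$work/origin" ci

git -C ci rev-parse --is-shallow-repository   # prints "true"
git -C ci rev-list --count HEAD               # prints "1"
```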

The less common one is a private build farm. These machines are long-lived
and controlled by the repository owner. They come pre-loaded with all of
the software needed to build the repository. The best practice in this
type of build farm is to keep a full clone of the repository in a
well-known location and use incremental fetches to download the commits
necessary for each build. This type of build farm is typically
self-hosted, but could also be hosted by a cloud provider. The bundle URI
design makes it possible to quickly bootstrap new build machines using a
bundle provider (probably co-located with the build machines) and to
improve fetch times by creating frequent incremental bundles. The new
commit being built is unlikely to exist immediately in the bundles, but it
is unlikely to be too far ahead of any of the bundles.
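To make that flow concrete, here is a minimal sketch using only
long-standing commands (git bundle, git clone, git fetch), not the
bundle URI feature itself. The paths, the "main" branch, the demo
identity, and the refs/bundles/ namespace are assumptions chosen for
the example. A real bundle provider would serve the files over HTTP(S)
rather than from a local directory.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Helper: add an empty commit to the hypothetical origin repository.
commit() {
    git -C origin -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

git init -q origin
git -C origin symbolic-ref HEAD refs/heads/main
commit first
commit second

# Bundle provider: publish a full-history bundle and remember its tip.
git -C origin bundle create -q "$work/base.bundle" HEAD main
base_tip=$(git -C origin rev-parse main)

# New build machine: bootstrap from the bundle, then point at the origin.
git clone -q "$work/base.bundle" build
git -C build remote set-url origin "$work/origin"

# Later, the provider publishes an incremental bundle on top of the base.
commit third
git -C origin bundle create -q "$work/incr.bundle" "$base_tip..main"

# The commit to build lands on the origin after the last bundle was made.
commit fourth

# Build machine: take the incremental bundle first, then fetch the small
# remainder from the origin.
git -C build fetch -q "$work/incr.bundle" refs/heads/main:refs/bundles/main
git -C build fetch -q origin
```

After the last step, origin/main in the build clone matches the origin's
main, and only the final commit had to come from the origin itself.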

While private build farms are less common, they do become necessary for
large projects. Engineering teams that have the resources to self-host a
build farm are likely to also have the resources to self-host a bundle
server. They may not have the connections or desire to advertise those
bundle server URIs from the origin Git server.

I hope this helps clarify my perspective as to why build farms using long-
lived copies of the repository could take advantage of bundle URIs.

Thanks,
-Stolee
