On 8/9/2022 9:49 AM, Phillip Wood wrote:
> Hi Stolee
>
> On 09/08/2022 14:12, Derrick Stolee via GitGitGadget wrote:
>> This is the first of series towards building the bundle URI feature as
>> discussed in previous RFCs, specifically pulled directly out of [5]:
>>
>> [1]
>> https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@xxxxxxxxx/
>>
>> [2]
>> https://lore.kernel.org/git/cover-0.3-00000000000-20211025T211159Z-avarab@xxxxxxxxx/
>>
>> [3]
>> https://lore.kernel.org/git/pull.1160.git.1645641063.gitgitgadget@xxxxxxxxx
>>
>> [4]
>> https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@xxxxxxxxx/
>>
>> [5]
>> https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@xxxxxxxxx
>>
>> THIS ONLY INCLUDES THE DESIGN DOCUMENT. See "Updates in v3". There are two
>> patches:
>>
>>  1. The main design document that details the bundle URI standard and how
>>     the client interacts with the bundle data.
>>  2. An addendum to the design document that details one strategy for
>>     organizing bundles from the perspective of a bundle provider.
>
> I thought the document was well written and left me with a good understanding
> of both the problem being addressed and the rationale for the solution.

Thanks for the kind words!

> One small query - the document mentions CI farms as benefiting from this work
> but my impression is that those commonly use shallow clones which are (quite
> reasonably) not supported in this proposal.

There are two different kinds of CI farms. The most common one is a SaaS
CI system that provides machines on-demand, but each run starts from some
"clean" state. For example, GitHub Actions runs CI builds of the Git
project across a number of platforms. These machines need the source at
HEAD, but do not need the full history. Further, they will erase the
repository entirely at the end of the build, never fetching from those
repositories. Thus, a shallow clone makes sense to minimize the data
transfer.
Bundles don't make sense here for multiple reasons, including that bundles
must be closed under reachability and do not work for representing a
shallow clone (see [1]). The other reason is that CI builds typically are
triggered immediately after the commit appears on the origin Git server,
so there is no time for a bundle provider to create a bundle representing
that shallow clone.

[1] https://github.com/git/git/blob/c50926e1f48891e2671e1830dbcd2912a4563450/Documentation/technical/bundle-format.txt#L65-L69

The less common one is a private build farm. These machines are long-lived
and controlled by the repository owner. They come pre-loaded with all of
the software needed to build the repository. The best practice in this
type of build farm is to keep a full clone of the repository in a
well-known location and use incremental fetches to update the client
repositories to download the commit necessary for the build. This type of
build farm is typically self-hosted, but could also be hosted by a cloud
provider.

The bundle URI design allows ways to quickly bootstrap new build machines
using a bundle provider (probably co-located with the build machines) as
well as improving fetch times by creating frequent incremental bundles.
The new commit being built is unlikely to exist immediately in the
bundles, but it is unlikely to be too far ahead of any of the bundles.

While private build farms are less common, they do become necessary for
large projects. Engineering teams that have the resources to self-host a
build farm are likely to also have the resources to self-host a bundle
server. They may not have the connections or desire to advertise those
bundle server URIs from the origin Git server.

I hope this helps clarify my perspective as to why build farms using
long-lived copies of the repository could take advantage of bundle URIs.

Thanks,
-Stolee