Re: [PATCH v4 0/2] bundle URIs: design doc

Hi Stolee

On 09/08/2022 16:50, Derrick Stolee wrote:
One small query - the document mentions CI farms as benefiting from this work
but my impression is that those commonly use shallow clones which are (quite
reasonably) not supported in this proposal.

There are two different kinds of CI farms.

The most common one is a SaaS CI system that provides machines on-demand,
but each run starts from some "clean" state. For example, GitHub Actions
runs CI builds of the Git project across a number of platforms. These
machines need the source at HEAD, but do not need the full history. Further,
they will erase the repository entirely at the end of the build, never
fetching from those repositories. Thus, a shallow clone makes sense to
minimize the data transfer. Bundles don't make sense here for two reasons.
First, bundles must be closed under reachability and cannot represent a
shallow clone (see [1]). Second, CI builds are typically triggered
immediately after the commit appears on the origin Git server, so there is
no time for a bundle provider to create a bundle representing that shallow
clone.

[1] https://github.com/git/git/blob/c50926e1f48891e2671e1830dbcd2912a4563450/Documentation/technical/bundle-format.txt#L65-L69
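
The closure requirement can be illustrated with a toy model. The Python
sketch below is a simplification that tracks commits only (real bundles
also carry trees and blobs). The four-commit history and the helper names
`reachable` and `is_closed` are hypothetical and not part of Git.

```python
# Toy linear history A <- B <- C <- D; each commit maps to its parents.
# Commit names are made up for illustration.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
}


def reachable(tips, parents):
    """Return every commit reachable from the given tips."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def is_closed(commits, prerequisites, parents):
    """True if each parent of an included commit is itself included
    or is a prerequisite that the receiver is expected to have."""
    have = set(commits) | set(prerequisites)
    return all(p in have for c in commits for p in parents[c])


full_history = reachable(["D"], PARENTS)

print(is_closed(full_history, set(), PARENTS))  # full history
print(is_closed({"D"}, set(), PARENTS))         # depth-1 "shallow" set
print(is_closed({"D"}, {"C"}, PARENTS))         # incremental on top of C
```

In this model the full history is closed and the depth-1 set is not. The
last case is closed only because the receiver is assumed to already have
commit C, which is how an incremental bundle differs from a shallow clone.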

The less common one is a private build farm. These machines are long-lived
and controlled by the repository owner. They come pre-loaded with all of
the software needed to build the repository. The best practice in this
type of build farm is to keep a full clone of the repository in a
well-known location and use incremental fetches to update it with the
commit necessary for the build. This type of build farm is
typically self-hosted, but could also be hosted by a cloud provider. The
bundle URI design makes it possible to quickly bootstrap new build machines
from a bundle provider (probably co-located with the build machines) and to
improve fetch times by creating frequent incremental bundles. The new
commit being built is unlikely to be in the bundles right away, but it is
also unlikely to be far ahead of the bundles.
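
As a rough sketch of that workflow, the Python helper below assembles the
two Git invocations a freshly provisioned build machine might run. It
assumes a Git version whose `git clone` accepts `--bundle-uri` (2.38 or
later). The URLs, the path, and the `bootstrap_commands` helper are
placeholders for illustration only.

```python
def bootstrap_commands(origin_url, bundle_uri, repo_dir):
    """Build argv lists for bootstrapping and then updating a full clone."""
    return [
        # Seed the new clone from the bundle provider, then let Git
        # complete the clone against the origin server.
        ["git", "clone", f"--bundle-uri={bundle_uri}", origin_url, repo_dir],
        # Later builds only need an incremental fetch for the new commit.
        ["git", "-C", repo_dir, "fetch", "origin"],
    ]


# Hypothetical locations; substitute real ones for an actual build farm.
commands = bootstrap_commands(
    "https://git.example.com/project.git",
    "https://bundles.example.com/project.bundle",
    "/srv/build/project",
)

for argv in commands:
    print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` to execute
it. The first command is needed once per machine, the second before each
build.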

While private build farms are less common, they do become necessary for
large projects. Engineering teams that have the resources to self-host a
build farm are likely to also have the resources to self-host a bundle
server. They may not have the connections or desire to advertise those
bundle server URIs from the origin Git server.

I hope this helps clarify my perspective as to why build farms using long-
lived copies of the repository could take advantage of bundle URIs.

Yes it does, thanks for taking the time to explain that.

Phillip

Thanks,
-Stolee


