As noted in a preceding commit we really should be using libcurl's C
API by default in get_bundle_uri(), but testing with a command-line
program can be very handy, and is useful e.g. to implement custom or
ad-hoc caching.

E.g. using part of the recipe noted in a preceding commit to create
the "git-master-only.bdl" and "git-master-to-next.bdl" files, we can
implement a trivial caching shellscript as:

	cat >get-bundle.sh <<-\EOF &&
	#!/bin/sh
	set -xe
	uri="$1"

	bundle_cache_key () {
		echo "Computing cache key for URI '$1' (only getting the header)" >&2
		curl --silent --output - -- "$1" |
		sed -n -e '/^$/q' -e 'p' |
		git hash-object --stdin
	}

	get_cached_bundle_uri() {
		cache_key=$(bundle_cache_key "$1")
		path="/tmp/bundle-cache-$cache_key.bdl"
		if test -e "$path"
		then
			echo "Using cache '$path' for URI '$1'" >&2
			cat "$path"
		else
			echo "Downloading bundle URI $1" >&2
			curl --silent --output - -- "$uri" |
			tee "$path"
		fi
	}
	get_cached_bundle_uri "$1"
	EOF
	chmod +x get-bundle.sh &&
	rm -rf /tmp/git.git &&
	./git \
		-c protocol.version=2 \
		-c fetch.uriProtocols=file \
		-c transfer.bundleURI.downloader=./get-bundle.sh \
		-c transfer.injectBundleURI="file:///tmp/git-master-only.bdl" \
		-c transfer.injectBundleURI="file:///tmp/git-master-to-next.bdl" \
		clone --bare --no-tags --single-branch --branch next --template= \
		--verbose --verbose \
		https://github.com/git/git.git /tmp/git.git

Now, clearly that specific example is rather pointless. We're getting
a local file anyway, so "cat"-ing another local file doesn't make any
difference. If anything it's slightly slower and more redundant, as
we have to fetch the bundle twice with "curl".

But the point is that this can be trivially improved to implement an
arbitrary custom caching strategy. E.g.:

 * A less dumb implementation could stream the remote URL, check the
   header as we go, and disconnect if we've got that content locally.
 * Ditto, but using an ETag or other strategy.
 * N boxes could share a cache over a shared NFS mount, or N
   disconnected git processes could use a common cache without the
   need for a front-line HTTP proxy server.
 * It would be trivial to extend this to guard against a "thundering
   herd" (e.g. concurrent CI) downloading the same bundle N times. As
   soon as we'd get the header we'd create a $cache_key.lock as we
   download the rest, and other concurrent clients spotting that
   would wait, then eventually use the cached "$cache_key". Still
   racy as N clients could download the header in parallel, but way
   less so (the header will be a tiny part of the payload).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
---
 Documentation/config/transfer.txt | 7 +++++++
 fetch-pack.c                      | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/Documentation/config/transfer.txt b/Documentation/config/transfer.txt
index ae85ca5760b..5310cd96cb9 100644
--- a/Documentation/config/transfer.txt
+++ b/Documentation/config/transfer.txt
@@ -84,6 +84,13 @@ transfer.bundleURI::
 	using bundles to bootstap is possible. Defaults to `true`,
 	i.e. bundle-uri is tried whenever a server offers it.
 
+transfer.bundleURI.downloader::
+	When set to `<program>` will be invoked when
+	`transfer.bundleURI` is in effect to download URIs containing
+	bundles. Expected to take one `URI` as an argument, and to
+	emit the bundle on STDOUT. Defaults to "curl --silent --output
+	- --". I.e. we'll invoke "curl --silent --output - -- <URI>".
+
 transfer.injectBundleURI::
 	Allows for the injection of `bundle-uri` lines into the
 	protocol v2 transport dialog (see `protocol.version` in
diff --git a/fetch-pack.c b/fetch-pack.c
index 316fb2fd65d..7e696142c4d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1121,12 +1121,18 @@ static int get_bundle_uri(struct string_list_item *item, unsigned int nth,
 	const char *uri = item->string;
 	FILE *out;
 	int out_fd;
+	const char *tmp;
 
 	strvec_push(&cmd.args, "curl");
 	strvec_push(&cmd.args, "--silent");
 	strvec_push(&cmd.args, "--output");
 	strvec_push(&cmd.args, "-");
 	strvec_push(&cmd.args, "--");
+	if (!git_config_get_string_tmp("transfer.bundleURI.downloader", &tmp)) {
+		strvec_clear(&cmd.args);
+		strvec_push(&cmd.args, tmp);
+		cmd.use_shell = 1;
+	}
 	strvec_push(&cmd.args, item->string);
 	cmd.git_cmd = 0;
 	cmd.no_stdin = 1;
-- 
2.36.0.rc2.902.g60576bbc845
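
For illustration only, the "thundering herd" guard mentioned in the
commit message could be sketched roughly as below. This is not part
of the patch. The names CACHE_DIR, FETCH and get_locked_bundle are
hypothetical, and as a simplification the cache key is derived from
the URI with "cksum" rather than from the bundle header, so unlike
the idea in the commit message it will not notice a bundle whose
content changed behind the same URI.

```shell
#!/bin/sh
# Hypothetical sketch, not part of this patch. CACHE_DIR, FETCH and
# get_locked_bundle are illustrative names.
CACHE_DIR="${CACHE_DIR:-/tmp}"
FETCH="${FETCH:-curl --silent --output - --}"

get_locked_bundle () {
	# Simplification: key on the URI, not on the bundle header.
	key=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
	cache_path="$CACHE_DIR/bundle-cache-$key.bdl"
	lock="$cache_path.lock"

	while ! test -e "$cache_path"
	do
		# mkdir is atomic, so only one concurrent caller gets the lock.
		if mkdir "$lock" 2>/dev/null
		then
			status=0
			# Re-check: another process may have finished meanwhile.
			if ! test -e "$cache_path"
			then
				# Download under a temporary name and rename, so
				# readers never see a partially written bundle.
				$FETCH "$1" >"$cache_path.tmp.$$" &&
				mv "$cache_path.tmp.$$" "$cache_path" ||
				status=1
			fi
			rm -f "$cache_path.tmp.$$"
			rmdir "$lock"
			test "$status" -eq 0 || return 1
		else
			# Someone else is downloading; wait for them to finish.
			sleep 1
		fi
	done
	cat "$cache_path"
}

# Act as a downloader when given a URI; do nothing when merely sourced.
if test $# -gt 0
then
	get_locked_bundle "$1"
fi
```

Saved as an executable script, this could be pointed to with
"-c transfer.bundleURI.downloader=<path>" in the same way as
get-bundle.sh above. A lock left behind by a killed process is never
cleaned up here, so a real implementation would want a timeout or
stale-lock detection.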