From: Scott Chacon <schacon@xxxxxxxxx>

When downloading bundles via the bundle-uri functionality, we only copy
the references from refs/heads/ into the refs/bundles/ space. I'm not
sure why this refspec is hardcoded to be so limited, but it makes the
ref negotiation on the subsequent fetch suboptimal, since it won't use
objects that are referenced outside of the current heads of the bundled
repository.

This change to copy everything in refs/ in the bundle to refs/bundles/
significantly helps the subsequent fetch, since nearly all the
references are now included in the negotiation.

Signed-off-by: Scott Chacon <schacon@xxxxxxxxx>
---
    bundle-uri: copy all bundle references into the refs/bundles space

    This patch probably isn't meant for inclusion, but I wanted to see if
    I'm crazy here or missing something.

    It appears that the bundle-uri functionality has an issue with ref
    negotiation. I hit this because I assumed all the objects I bundled
    would be seen in the negotiation, but since only references under
    refs/heads/ are copied to refs/bundles/, they are the only ones that
    are seen for negotiation, so it's quite inefficient.

    I ran several experiments trying to create a bundle where the
    subsequent fetch was almost a no-op. It was frustratingly impossible,
    and it took me a while to figure out why it kept trying to get tons
    of other objects.

    Furthermore, when I bundled just a tag (thinking it would have most
    reachable objects) it completely failed to work because there were no
    refs/heads/ available for negotiation - so it downloaded a huge file
    and then still started from scratch on the fetch.

    However, if I copy all the refs in the bundle, it makes a big
    difference.

    Here are some benchmarks from the GitLab FOSS repository.

    A normal clone pulls down 3,005,985 objects:

    ❯ time git clone https://gitlab.com/gitlab-org/gitlab-foss.git gl5
    Cloning into 'gl5'...
    remote: Enumerating objects: 3005985, done.
    remote: Counting objects: 100% (314617/314617), done.
    remote: Compressing objects: 100% (64278/64278), done.
    remote: Total 3005985 (delta 244429), reused 311002 (delta 241404), pack-reused 2691368 (from 1)
    Receiving objects: 100% (3005985/3005985), 1.35 GiB | 23.91 MiB/s, done.
    Resolving deltas: 100% (2361484/2361484), done.
    Updating files: 100% (59972/59972), done.
    (*) 162.93s user 37.94s system 128% cpu 2:36.49 total

    Then, I tried to bundle everything from a fresh clone, including all
    the refs.

    ❯ git bundle create gitlab-base.bundle --all

    This creates a 1.4G bundle, which I uploaded to a CDN and cloned again
    with the bundle-uri:

    ❯ time git clone --bundle-uri=https://[cdn]/bundle/gitlab-base.bundle https://gitlab.com/gitlab-org/gitlab-foss.git gl4
    Cloning into 'gl4'...
    remote: Enumerating objects: 1092703, done.
    remote: Counting objects: 100% (973405/973405), done.
    remote: Compressing objects: 100% (385827/385827), done.
    remote: Total 959773 (delta 710976), reused 766809 (delta 554276), pack-reused 0 (from 0)
    Receiving objects: 100% (959773/959773), 366.94 MiB | 20.87 MiB/s, done.
    Resolving deltas: 100% (710976/710976), completed with 9081 local objects.
    Checking objects: 100% (4194304/4194304), done.
    Checking connectivity: 959668, done.
    Updating files: 100% (59972/59972), done.
    (*) 181.98s user 40.23s system 110% cpu 3:20.89 total

    Which is better from an "objects from the server" perspective, but
    still has to download 959,773 objects, so 32% of the total. But it
    also takes quite a lot longer, because it's redownloading most of
    those objects for a second time.

    If I apply this patch where I change the refspec for the bundle ref
    copy from refs/heads/ to just refs/ and clone with this patched
    version, it's much better:

    ❯ time ./git clone --bundle-uri=https://[cdn]/bundle/gitlab-base.bundle https://gitlab.com/gitlab-org/gitlab-foss.git gl3
    Cloning into 'gl3'...
    remote: Enumerating objects: 65538, done.
    remote: Counting objects: 100% (56054/56054), done.
    remote: Compressing objects: 100% (28950/28950), done.
    remote: Total 43877 (delta 27401), reused 25170 (delta 13546), pack-reused 0 (from 0)
    Receiving objects: 100% (43877/43877), 40.42 MiB | 22.27 MiB/s, done.
    Resolving deltas: 100% (27401/27401), completed with 8564 local objects.
    Updating files: 100% (59972/59972), done.
    (*) 143.45s user 29.33s system 124% cpu 2:19.27 total

    Now I'm only getting an extra 43k objects, so about 1.5% of the
    original total, and the entire operation is a bit faster as well.

    I'm not sure if there is a downside here; this seems clearly how you
    would want the negotiation to go. It ends up with way more refs under
    refs/bundles/ (now there is refs/bundles/remotes/origin/master, etc.)
    but that's being polluted by the head refs anyhow, right?

    Is this a reasonable change?

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1897%2Fschacon%2Fsc-more-bundle-refs-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1897/schacon/sc-more-bundle-refs-v1
Pull-Request: https://github.com/git/git/pull/1897

 bundle-uri.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 744257c49c1..3371d56f4ce 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -403,7 +403,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
 		const char *branch_name;
 		int has_old;
 
-		if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
+		if (!skip_prefix(refname->string, "refs/", &branch_name))
 			continue;
 
 		strbuf_setlen(&bundle_ref, bundle_prefix_len);

base-commit: 2d2a71ce85026edcc40f469678a1035df0dfcf57
-- 
gitgitgadget