[PATCH] bundle-uri: copy all bundle references into the refs/bundles space

From: Scott Chacon <schacon@xxxxxxxxx>

When downloading bundles via the bundle-uri functionality, we only copy the
references from refs/heads into the refs/bundles space. I'm not sure why this
refspec is hardcoded to be so limited, but it makes the ref negotiation on
the subsequent fetch suboptimal, since it won't use objects that are
referenced outside of the current heads of the bundled repository.

Copying everything under refs/ in the bundle to refs/bundles/ significantly
helps the subsequent fetch, since nearly all the references are now included
in the negotiation.

Signed-off-by: Scott Chacon <schacon@xxxxxxxxx>
---
    bundle-uri: copy all bundle references into the refs/bundles space
    
    This patch probably isn't meant for inclusion, but I wanted to see if
    I'm crazy here or missing something.
    
    It appears that the bundle-uri functionality has an issue with ref
    negotiation. I hit this because I assumed all the objects I bundled
    would be seen in the negotiation, but since only references under
    refs/heads are copied to refs/bundles, they are the only ones that are
    seen for negotiation, so it's quite inefficient.
    
    I did several experiments trying to create a bundle where the subsequent
    fetch was almost a no-op. It was frustratingly impossible, and it took
    me a while to figure out why it kept trying to get tons of other
    objects.
    
    Furthermore, when I bundled just a tag (thinking it would have most
    reachable objects) it completely failed to work because there were no
    refs/heads/ available for negotiation - so it downloaded a huge file and
    then still started from scratch on the fetch.
    
    However, if I copy all the refs in the bundle, it makes a big
    difference.
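
    To illustrate why the prefix matters, here is a minimal, self-contained
    sketch of the ref-name mapping. It is not the code in bundle-uri.c:
    skip_prefix_sketch() is a local stand-in for Git's skip_prefix(), and
    map_bundle_ref() is a hypothetical helper introduced only for this
    example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Local stand-in for Git's skip_prefix() helper: if str starts with
 * prefix, point *out just past the prefix and return 1; otherwise
 * return 0 and leave *out untouched.
 */
static int skip_prefix_sketch(const char *str, const char *prefix,
			      const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/*
 * Hypothetical helper (not a function in bundle-uri.c): map a ref listed
 * in a bundle header to the name it would be stored under in
 * refs/bundles/. Returns 1 and fills buf on success, 0 if the ref does
 * not match the prefix (and is therefore skipped) or does not fit in buf.
 */
static int map_bundle_ref(const char *refname, const char *prefix,
			  char *buf, size_t buflen)
{
	const char *rest;
	int n;

	if (!skip_prefix_sketch(refname, prefix, &rest))
		return 0;
	n = snprintf(buf, buflen, "refs/bundles/%s", rest);
	return n >= 0 && (size_t)n < buflen;
}
```

    With the "refs/heads/" prefix, refs/heads/main maps to refs/bundles/main
    and refs/tags/v1.0 is skipped, which is why a tag-only bundle leaves
    nothing to negotiate with. With "refs/", the same two refs map to
    refs/bundles/heads/main and refs/bundles/tags/v1.0.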
    
    Here are some benchmarks from the GitLab FOSS repo (gitlab-foss).
    
    A normal clone pulls down 3,005,985 objects:
    
    ❯  time git clone https://gitlab.com/gitlab-org/gitlab-foss.git gl5
    Cloning into 'gl5'...
    remote: Enumerating objects: 3005985, done.
    remote: Counting objects: 100% (314617/314617), done.
    remote: Compressing objects: 100% (64278/64278), done.
    remote: Total 3005985 (delta 244429), reused 311002 (delta 241404), pack-reused 2691368 (from 1)
    Receiving objects: 100% (3005985/3005985), 1.35 GiB | 23.91 MiB/s, done.
    Resolving deltas: 100% (2361484/2361484), done.
    Updating files: 100% (59972/59972), done.
    (*) 162.93s user 37.94s system 128% cpu 2:36.49 total
    
    
    Then, I tried to bundle everything from a fresh clone, including all the
    refs.
    
    ❯  git bundle create gitlab-base.bundle --all
    
    
    This creates a 1.4G bundle, which I uploaded to a CDN and cloned again
    with the bundle-uri:
    
    ❯  time git clone --bundle-uri=https://[cdn]/bundle/gitlab-base.bundle https://gitlab.com/gitlab-org/gitlab-foss.git gl4
    Cloning into 'gl4'...
    remote: Enumerating objects: 1092703, done.
    remote: Counting objects: 100% (973405/973405), done.
    remote: Compressing objects: 100% (385827/385827), done.
    remote: Total 959773 (delta 710976), reused 766809 (delta 554276), pack-reused 0 (from 0)
    Receiving objects: 100% (959773/959773), 366.94 MiB | 20.87 MiB/s, done.
    Resolving deltas: 100% (710976/710976), completed with 9081 local objects.
    Checking objects: 100% (4194304/4194304), done.
    Checking connectivity: 959668, done.
    Updating files: 100% (59972/59972), done.
    (*) 181.98s user 40.23s system 110% cpu 3:20.89 total
    
    
    This is better from an "objects from the server" perspective, but it
    still has to download 959,773 objects, about 32% of the total. It also
    takes quite a lot longer (3:20 versus 2:36), because it's downloading
    most of those objects for a second time.
    
    If I apply this patch where I change the refspec for the bundle ref copy
    from refs/heads/ to just refs/ and clone with this patched version, it's
    much better:
    
    ❯  time ./git clone --bundle-uri=https://[cdn]/bundle/gitlab-base.bundle https://gitlab.com/gitlab-org/gitlab-foss.git gl3
    Cloning into 'gl3'...
    remote: Enumerating objects: 65538, done.
    remote: Counting objects: 100% (56054/56054), done.
    remote: Compressing objects: 100% (28950/28950), done.
    remote: Total 43877 (delta 27401), reused 25170 (delta 13546), pack-reused 0 (from 0)
    Receiving objects: 100% (43877/43877), 40.42 MiB | 22.27 MiB/s, done.
    Resolving deltas: 100% (27401/27401), completed with 8564 local objects.
    Updating files: 100% (59972/59972), done.
    (*) 143.45s user 29.33s system 124% cpu 2:19.27 total
    
    
    Now I'm only getting an extra 43,877 objects, about 1.5% of the
    original total, and the entire operation is a bit faster as well.
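
    For reference, the percentages quoted in these notes follow from the
    "Total" object counts in the clone output above. A small sketch of the
    arithmetic (percent_of_full() is a hypothetical helper, not part of
    Git):

```c
/*
 * Share of a full clone's object count that a fetch still had to
 * transfer, expressed as a percentage.
 */
static double percent_of_full(long fetched, long full)
{
	return 100.0 * (double)fetched / (double)full;
}
```

    percent_of_full(959773, 3005985) comes out at roughly 31.9, and
    percent_of_full(43877, 3005985) at roughly 1.46.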
    
    I'm not sure if there is a downside here; this seems clearly how you
    would want the negotiation to go. It ends up with way more refs under
    refs/bundles/ (now there is refs/bundles/remotes/origin/master, etc.),
    but that namespace is already being polluted by the head refs anyhow,
    right?
    
    Is this a reasonable change?

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1897%2Fschacon%2Fsc-more-bundle-refs-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1897/schacon/sc-more-bundle-refs-v1
Pull-Request: https://github.com/git/git/pull/1897

 bundle-uri.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 744257c49c1..3371d56f4ce 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -403,7 +403,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
 		const char *branch_name;
 		int has_old;
 
-		if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
+		if (!skip_prefix(refname->string, "refs/", &branch_name))
 			continue;
 
 		strbuf_setlen(&bundle_ref, bundle_prefix_len);

base-commit: 2d2a71ce85026edcc40f469678a1035df0dfcf57
-- 
gitgitgadget



