Re: git-fetch pulls already-pulled objects?

Matt Glazar <strager@xxxxxx> · Thu, 29 Oct 2015 19:52:27 +0000

> I forgot to mention the recent "pack bitmap" addition.  It makes the
> set of "can be cheaply proven to exist" a lot larger.

Cool! I tried this feature, and it worked! (At least, it worked for my
small test case.)

I ran on the server (after pushing the objects):

    git config repack.writeBitmaps true
    git repack -Ad

After this, the 'git fetch origin master2' was super quick.
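
For anyone who wants to confirm that the repack really wrote a bitmap,
here is a minimal sketch. It uses a throwaway repository created with
mktemp rather than a real server; the file name and commit identity are
made up for illustration. The only thing it relies on is that a bitmap
index is stored next to its pack as pack-<hash>.bitmap.

```shell
set -e

# Throwaway repository standing in for the server (an assumption).
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
echo hello > file.txt
git add file.txt
git -c user.name=Test -c user.email=test@example.com commit -q -m "initial"

# The same two commands as above.
git config repack.writeBitmaps true
git repack -q -Ad

# Look for a .bitmap file alongside the pack.
pack_dir="$(git rev-parse --git-dir)/objects/pack"
if ls "$pack_dir"/pack-*.bitmap >/dev/null 2>&1; then
    bitmap_written=yes
else
    bitmap_written=no
fi
echo "bitmap written: $bitmap_written"
```

On a real server, only the last few lines (from pack_dir onward) would
be needed, run from inside the repository.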

Thanks for your help!

Aside: This test case is using (normal, C/sh) Git. My production
environment uses JGit on the server. I haven't tested this with JGit.

-----Original Message-----
From: Junio C Hamano <gitster@xxxxxxxxx>
Date: Thursday, October 29, 2015 at 11:42 AM
To: Matt Glazar <strager@xxxxxx>
Cc: "git@xxxxxxxxxxxxxxx" <git@xxxxxxxxxxxxxxx>
Subject: Re: git-fetch pulls already-pulled objects?

>Matt Glazar <strager@xxxxxx> writes:
>
>> Would negotiating the tree object hashes be possible on the client
>> without server changes? Is the protocol that flexible?
>
>The protocol is strictly "find common ancestor in the commit
>history".  Everything else is done on the sender.
>
>>>The object transfer is done by first finding the common ancestor of
>>>histories of the sending and the receiving sides, which allows the
>>>sender to enumerate commits that the sender has but the receiver
>>>doesn't.  From there, all objects [*1*] that are referenced by these
>>>commits need to be sent.
>
>>>[Footnote]
>>>
>>>*1* There is an optimization to exclude the trees and blobs that can
>>>be cheaply proven to exist on the receiving end.  If the receiving
>>>end has a commit that the sending end does *not* have, and that
>>>commit happens to record a tree the sending end needs to send,
>>>however, the sending end cannot prove that the tree does not have to
>>>be sent without first fetching that commit from the receiving end,
>>>which fails "can be cheaply proven to exist" test.
>
>I forgot to mention the recent "pack bitmap" addition.  It makes the
>set of "can be cheaply proven to exist" a lot larger.
>
>If for example the sender needs to send one commit C because it
>determined that the receiver has history up to commit C~1, without
>the bitmap, even when C^{tree} (i.e. the tree of C) is identical to
>C~2^{tree} (i.e. the tree of C~2), it would have sent that tree
>object because "proving that the receiver already has it" would
>require the sender to dig its history back, starting from C~1
>(i.e. the commit that is known to exist at the receiver), to
>enumerate the objects contained in the common part of the history,
>which fails the "can be cheaply proven to exist" test.
>
>The "pack bitmap" pre-computes what commits, trees and blobs should
>already exist in the repository given a commit for which bitmap
>exists.  Using the bitmap, from C~1 (i.e. the commit known to exist
>at the receiving end), it can be proven cheaply that C^{tree} that
>happens to be identical to C~2^{tree} already exists over there, and
>the sender can use this knowledge to reduce the transfer.
>
>The "pack bitmap" however does not change the fundamental structure.
>If your receiver has a commit that is not known to the sender, and
>that commit happens to record the same tree recorded in the commit
>that needs to be sent, there is no way for the sender to know that
>the receiver has it, exactly because the exchange between them is
>purely "find common ancestor in history".
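
To make the C / C~1 / C~2 situation described above concrete, here is a
small sketch in a throwaway repository (the file name, contents, and
commit identity are made up). It only shows that C^{tree} and
C~2^{tree} can be the same object while C~1^{tree} differs; it does not
exercise the fetch negotiation itself.

```shell
set -e

repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Helper (hypothetical) so the sketch does not depend on global git config.
commit() {
    git -c user.name=Test -c user.email=test@example.com commit -q -m "$1"
}

echo one > file.txt; git add file.txt; commit "C~2"
echo two > file.txt; git add file.txt; commit "C~1"
echo one > file.txt; git add file.txt; commit "C"    # same content as C~2

tree_c="$(git rev-parse 'HEAD^{tree}')"
tree_c1="$(git rev-parse 'HEAD~1^{tree}')"
tree_c2="$(git rev-parse 'HEAD~2^{tree}')"

echo "C^{tree}:   $tree_c"
echo "C~1^{tree}: $tree_c1"
echo "C~2^{tree}: $tree_c2"
```

Here a receiver that has history up to C~1 already holds the tree of C,
but only through C~2; that is the case the bitmap lets the sender prove
cheaply.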
