Re: git-fetch fetches blobs that are already in the local repository if no history is shared?

On Tue, Apr 24, 2012 at 7:53 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> On Tue, Apr 24, 2012 at 07:19, Adam Roben <adam@xxxxxxxxx> wrote:
>> There are two main git mirrors of the WebKit Subversion repository: <git://git.webkit.org/WebKit.git> and <https://github.com/WebKit/webkit>. These repositories have the exact same trees/blobs, but have entirely different commits due to the GitHub mirror using a custom --authors-prog with git-svn.
>>
>> Tor Arne (CCed) noticed something interesting today:
>>
>> If you clone one of these repositories, then add the other as a remote and fetch it, all the trees/blobs seem to get pulled down again, even though they're already in the local repository. It seems like only the commit objects should be fetched, since they're the only difference between the two remotes.
>>
>> Is this a bug in git?
>
> No. It's the way the Git protocol was designed to function. Git only
> negotiates over the commit history, as trying to include the blob and
> tree information in the negotiation protocol would make the payloads
> unreasonable in size. Granted, in this case sending the 100M or
> whatever it takes to enumerate all SHA-1s is smaller than the 4G or
> whatever that WebKit actually is, but the protocol assumes nobody
> would be crazy enough to establish a huge project with two different
> competing commit histories and then think they could fetch them
> together into one repository with a small network delta.
>
> Basically... Don't do this, and don't expect Git to save you.

That said, if you have shell access to the remote server, you *can*
transfer just the missing objects by hand.  We needed to do something
like this once, as a one-time thing, and I think I just made a note of
the SHA for the new branch, then ran
'git rev-list new-branch | git pack-objects pack' on the server.  Copy
the two files it creates (the .pack and its .idx) to the other
machine's ".git/objects/pack" directory, then give the SHA a name
there (a branch or a tag).

It's a hack but if you need it, you need it... :)
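
For anyone who wants to try it, here is a rough sketch of those steps
on two throwaway repositories.  Everything in it is made up for the
demonstration (the directory names "here" and "server", the file, the
ref name "other-history"); it is not what we actually ran.  Note that
'git rev-list' without '--objects' lists only commits, so the pack
holds just the commit objects.  That is enough only when the receiving
repository already has every tree and blob, as in the WebKit case; if
it does not, I believe you would want 'git rev-list --objects'
instead.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Two repositories with identical file content but different commits
# (the author differs, so the commit SHAs differ while the tree and
# blob SHAs are the same).
for name in here server; do
    git init -q "$name"
    echo "same content" > "$name/file.txt"
    git -C "$name" add file.txt
    git -C "$name" -c user.name="$name" -c user.email="$name@example.invalid" \
        commit -q -m "import by $name"
done

# On the "server": note the SHA of the branch tip, then pack what
# rev-list emits (commits only, since --objects is not given).
tip=$(git -C server rev-parse HEAD)
(cd server && git rev-list "$tip" | git pack-objects -q pack >/dev/null)

# Copy the two files created (pack-<hash>.pack and pack-<hash>.idx)
# into the other repository's pack directory.
cp server/pack-*.pack server/pack-*.idx here/.git/objects/pack/

# Give the SHA a name so it is reachable and will not be pruned.
git -C here update-ref refs/heads/other-history "$tip"

# The commit is now readable locally and points at a tree that the
# local repository already had.
git -C here cat-file -t "$tip"
git -C here log --oneline other-history
```

'git update-ref' is just one way to name the SHA; 'git branch
other-history <sha>' or a tag would do as well.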

> There should be only one version of the WebKit history imported into
> Git that everyone agrees on as being the canonical version of that
> import. And everyone else who mirrors or works with WebKit in Git
> should base off that version.
>
> WebKit is a big enough project with enough users that you would think
> you could trust the git.webkit.org conversion. Which suggests the
> github.com one should be done over.


-- 
Sitaram