Re: Recent unresolved issues: shallow clone

On Fri, 14 Apr 2006 02:31:36 -0700, Junio C Hamano wrote:
>   Shallow clones (Carl Worth).
> 
>   The experiment last round did not work out very well, but as
>   existing repositories get bigger, and more projects are
>   migrated from foreign SCM systems, this would go from a
>   would-be-nice-to-have to a must-have.
> 
>   I am beginning to think using "graft" to cauterize history
>   for this, while it technically would work, would not be so
>   helpful to users, so the design needs to be worked out again.

As context, here is some of what you mentioned in IRC:

>>	Suppose you have this:
>>
>>	A---B---C
>>	 \       \ 
>>	  D---E---F---G
>>	 
>>	and you made a shallow clone of C (because that is where the
>>	upstream master was when you made that clone).  Then the
>>	upstream updated the master branch tip to G.
>>
>>	The next update from upstream to your shallow clone would break.
>>	The upstream says: I have G at master.
>>	You say: I want G then.  By the way, I have C.
>>
>>	What it means to tell the other end "I have X" is to promise
>>	that you have X and _everything_ behind it.  So the upstream
>>	would send objects necessary to complete D, E, F and G for
>>	"somebody who already has A and B".  As a consequence, you
>>	would not see A nor B.
>>
>>	Even if the only thing you are interested in is to be in sync
>>	with the tip of the upstream, you can end up with an
>>	incomplete tree for G, if some of the blobs or trees contained
>>	in G already exist in A or B.  They are not sent -- because
>>	you told the upstream that you have everything necessary to
>>	get to C.

So that's an argument against using a cauterizing graft for the
shallow clone of C. Saying "I have C" when I only have a cauterized C
(its tree, but none of the commits behind it) breaks the promise the
existing protocol attaches to "have".
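
A toy model makes the failure concrete. This is a hypothetical sketch, not git's actual code: objects are plain strings instead of SHA-1 ids, and the upstream is assumed to send exactly the objects reachable from the "want" commit minus those reachable from the "have" commit. Blob x lives in the trees of A and B (and G) but not in C's tree, so claiming "have C" suppresses it.

```python
# Toy model of want/have negotiation (hypothetical names, not git internals).

def reachable(roots, graph):
    """Return every object reachable from roots, roots included."""
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(graph.get(obj, []))  # blobs have no children
    return seen

def fetch(graph, want, have):
    """Objects upstream sends: 'have X' claims X and everything behind it."""
    return reachable(want, graph) - reachable(have, graph)

# commit -> [tree, parents...]; tree -> [blobs...]
GRAPH = {
    "A": ["tA"], "B": ["tB", "A"], "C": ["tC", "B"],
    "D": ["tD", "A"], "E": ["tE", "D"], "F": ["tF", "E", "C"],
    "G": ["tG", "F"],
    "tA": ["x"], "tB": ["x", "y"], "tC": ["y", "z"],
    "tD": ["x"], "tE": ["x"], "tF": ["x", "z"], "tG": ["x", "z", "w"],
}

shallow = {"C", "tC", "y", "z"}        # cauterized C: commit, tree, blobs only
pack = fetch(GRAPH, ["G"], ["C"])      # "I want G. By the way, I have C."
local = shallow | pack
missing_from_G_tree = reachable(["tG"], GRAPH) - local
print(sorted(missing_from_G_tree))     # ['x']
```

Besides the missing blob, A and B never arrive either, matching the "you would not see A nor B" remark.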

I also read over some of your discussion of extending the protocol
with a new "shallow" extension.

I'm wondering whether shallow clone support could be achieved
through a simpler tweak to the protocol semantics (with no change to
protocol syntax) that would avoid the problem above. Specifically,
for shallow operations, could we just hold the same "want" and "have"
conversation with tree objects rather than commit objects?

So, in the scenario above, the original shallow clone of C would be:

	Want C->tree, have nothing.

and the later shallow update to G would be:

	Want G->tree, have C->tree
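
The same toy model, run at the tree level, suggests why this avoids the problem: "have C->tree" only promises the objects reachable from that one tree, so anything G's tree needs that C's tree lacks is always sent, while shared subtrees are still skipped. Again a hypothetical sketch with made-up object names, reducing the negotiation to a set difference.

```python
# Tree-level want/have in the toy model (hypothetical names).

def reachable(roots, graph):
    """Return every object reachable from roots, roots included."""
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(graph.get(obj, []))
    return seen

def fetch(graph, want, have):
    """'have T' now claims only tree T and what it contains."""
    return reachable(want, graph) - reachable(have, graph)

TREES = {
    "tC": ["sub", "y"],        # C's tree
    "tG": ["sub", "x", "w"],   # G's tree; x is absent from C's tree
    "sub": ["z"],              # subtree shared by both
}

local = fetch(TREES, ["tC"], [])        # Want C->tree, have nothing.
pack = fetch(TREES, ["tG"], ["tC"])     # Want G->tree, have C->tree.
local |= pack
print(sorted(pack))                     # ['tG', 'w', 'x']
```

Here x is sent because the client never claimed it, and the shared subtree sub is not resent.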

A shallow clone would then need a final step that creates a new
parentless commit object, so that there's something for a ref under
refs/heads to point at. (Or, rather than being parentless, such
commits could perhaps be chained together, one per update?)
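
For that final step, a commit object is just a short text body hashed with a type-and-size header. The sketch below assumes git's loose-object format (SHA-1 of "<type> <size>\0" followed by the body). The helper names, identity and timestamps are hypothetical placeholders, and the empty tree stands in for C->tree and G->tree. In practice the existing git commit-tree plumbing (with -p for an optional parent) builds and stores such a commit.

```python
import hashlib
from typing import Optional

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree id

def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 git assigns to an object: sha1(b'<type> <size>\\0' + body)."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def synthetic_commit(tree_id: str, parent_id: Optional[str],
                     ident: str, timestamp: int, message: str) -> bytes:
    """Hypothetical helper: body of a commit pointing at tree_id.
    parent_id=None gives a parentless commit; passing the previous
    synthetic commit chains shallow updates together."""
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        lines.append(f"parent {parent_id}")
    lines.append(f"author {ident} {timestamp} +0000")
    lines.append(f"committer {ident} {timestamp} +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()

IDENT = "Shallow Clone <shallow@example.invalid>"  # placeholder identity
clone = synthetic_commit(EMPTY_TREE, None, IDENT, 1145000000,
                         "shallow clone of C")
clone_id = git_object_id("commit", clone)
update = synthetic_commit(EMPTY_TREE, clone_id, IDENT, 1145100000,
                          "shallow update to G")
update_id = git_object_id("commit", update)
print(update.decode())
```

Chaining keeps a local record of each update at the cost of commits that exist nowhere upstream; parentless commits avoid that but lose the sequence.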

I admit that this would result in a rather atypical kind of
repository, but it would contain plenty of valid trees and blobs, so
it should conceptually be fairly easy to promote such a thing to a
full repository.

But, even without any tool support for promotion, the ability to do
shallow clone and shallow updates would still provide a useful
capability [*].

-Carl

[*] For reference, what I'm looking for here is a way to justify
providing git support for jhbuild, which is a tool used by testers of
GNOME and other software to efficiently track the latest development
of an arbitrarily large number of packages. It's currently primarily a
CVS-based thing. Switching to git would be a huge win for the
incremental updates, but would currently cause quite a hit for the
first clone.


