Carl Worth <cworth@xxxxxxxxxx> writes: > On Fri, 14 Apr 2006 02:31:36 -0700, Junio C Hamano wrote: >> I am beginning to think using "graft" to cauterize history >> for this, while it technically would work, would not be so >> helpful to users, so the design needs to be worked out again. > > As context, here is some of what you mentioned in IRC: > >>> Suppose you have this: >>> >>> A---B---C >>> \ \ >>> D---E---F---G >>> >>> and you made a shallow clone of C (because that is where the >>> upstream master was when you made that clone). Then the >>> upstream updated the master branch tip to G. >>> >>> The next update from upstream to your shallow clone would break. >>> The upstream says: I have G at master. >>> You say: I want G then. By the way, I have C. >>> >>> What it means to tell the other end "I have X" is to promise >>> that you have X and _everything_ behind it. So the upstream >>> would send objects necessary to complete D, E, F and G for >>> "somebody who already have A and B". As a consequence, you >>> would not see A nor B. >>> >>> Even if the only thing you are interested in is to be in sync >>> with the tip of the upstream, you can end up with an >>> incomplete tree for G, if some of the blobs or trees contained >>> in G already exist in A or B. They are not sent -- because >>> you told the upstream that you have everything necessary to >>> get to C. > > So that's an argument against using a cauterizing graft for the > shallow clone of C. It definitely confuses the existing protocol to > say "I have C" if I have only a cauterized C, (its tree only, but none > of the commits that should be backing C). That's what I meant by "graft technically works but is inconvenient". Maybe after the update to G happens (which means you now have C, F, G but not A B D E commits), the client side could enumerate commits on "rev-list ^C G" and cauterize the ones with missing parents (in this case, F does not have one of its parents). While doing this would help keeping the resulting commit ancestry sane, it does not solve the problem of missing blobs and trees. See below. > So, in the scenario above, the original shallow clone of C would be: > > Want C->tree, have nothing. > > and the later shallow update to G would be: > > Want G->tree, have C->tree When you ask for G, you do not know what G^{tree} is, so that is fantasy without a protocol extention. To solve the missing blobs/trees problem we would probably need a protocol extention that says it wants to receive enough data to complete trees and blobs associated with the commits being sent _without_ assuming the recipient has any trees or blobs other than what are contained in "have" commits. Then after such a successful transfer, missing parents of commits listed in "rev-list ^C G" are the ones from the side branch, so the client can cauterize them (F in the above example) appropriately without bothering the server. However, I think this "do not assume I have any trees behind the commits I explicitly say I have" must be an option, because it makes the resulting transfer unnecessarily more expensive for normal uses. A fetch of the Linux kernel once a day would update about a couple of hundered commits, each of which touches only 3 paths on average (so that would be 600 files out of 18,000 file tree. When side-branch merges are involved, usually many things in G (and F) are unchanged since either A or C, but the extention we are discussing forbids reusing what are found in A (it still allows reusing what are found in C). > A final step of a shallow clone would then require creating a new > parent-less commit object so that there's something to point refs/head > at, (or maybe rather than being parentless, they could be chained > together with each update?). Rewriting commit objects transferred to the cloner is something you would _not_ want to do (e.g. rewriting F commits to say it has only one parent C). The history based on that would diverge from parents and would become unmergeable. It is cleaner to just make a new graft entry to say "As far as this repository is concerned, F has one parent C". Shallowness of the repository and its slightly different view of history is a local matter. - : send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html