Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and

Linus Torvalds <torvalds@xxxxxxxx> · Sat, 29 Apr 2006 10:35:35 -0700 (PDT)

On Sat, 29 Apr 2006, linux@xxxxxxxxxxx wrote:
> 
> Well, the only reason that you need ANY commit in the repository is
> because it's part of history, and comparing it with other versions is
> meaningful.  So what trees, not already in the ancestry graph of a
> given commit, are useful to compare to?  In particular, useful for some
> automated process; manual comparisons can always be done manually.
> 
> Nothing's jumping out at me.  Any suggestions?

The only thing that I've ever wondered about is the "base commit of a 
merge".

Now, the thing is, we can always compute it. That's true _iff_ we've 
merged using the standard merge mechanism, but it wasn't always true 
historically (eg the original merges were computed with the original 
"git-merge-base" algorithm, which just picked the _first_ merge base it 
would find, while these days we use multiple ones for criss-cross merges).
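
A minimal sketch of the difference between "the first merge base found" and "all merge bases", assuming history is given as a plain dict from commit name to its list of parents. The names `ancestors`, `merge_bases` and `history` are made up for illustration and are not git's actual implementation:

```python
def ancestors(parents, commit):
    """Return `commit` plus every commit reachable from it through parents."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def merge_bases(parents, a, b):
    """Return the common ancestors of a and b that no other common ancestor contains."""
    common = ancestors(parents, a) & ancestors(parents, b)
    bases = set()
    for candidate in common:
        # A candidate is redundant if it is already in the history of
        # another common ancestor.
        redundant = any(
            candidate != other and candidate in ancestors(parents, other)
            for other in common
        )
        if not redundant:
            bases.add(candidate)
    return bases


# Hypothetical criss-cross history: X and Y were each merged into both sides.
history = {
    "root": [],
    "X": ["root"],
    "Y": ["root"],
    "A": ["X", "Y"],
    "B": ["Y", "X"],
}

simple = merge_bases(history, "X", "Y")
criss_cross = merge_bases(history, "A", "B")
```

In the criss-cross case there are two equally good bases, so an algorithm that stops at the first one it finds has silently thrown information away.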

So I would not totally object if a merge algorithm added a

	merge-base <sha1>

notation. But while it _could_ be just a "note merge-base <sha1>", it 
should _not_ be a "link <sha1> merge-base".

Let me explain why I think there are differences between those three 
options, and why I actually think that two of them are "valid" ideas, 
while the third one is not.

 - Case 1: the

	merge-base <sha1>

   is a "valid" idea (where there might of course be more than one <sha1>, 
   and possibly more than one "merge-base" line: you'd have to have some 
   rule for what happens for a recursive merge), although it has the 
   generally big down-side of being redundant information in all current 
   setups.

   It's redundant, but at the same time it's information that in _theory_ 
   might not be redundant, because I can see a situation where a merge was 
   forced by manually specifying a merge base (eg a special merge like the 
   original "gitk" merge, merging two initially unrelated projects 
   together).

   In theory. So it could be real information for a merge commit. And we'd 
   enforce some kind of real semantics for it - and it would have a really 
   solid technical meaning: assuming we define the multi-merge-base 
   semantics properly it would NEVER have any question about "what are 
   best practices?" or "what does this mean?".

   So this "case 1" actually has technical consequences, but you can, for 
   example, actually _check_ them. You can make fsck literally complain if 
   the merge base doesn't make sense. There's a clear "technical 
   violation", which might not be entirely trivial to figure out, but 
   thanks to it having a good meaning and a strict definition, it's 
   _there_.
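
To illustrate what "fsck can complain" could look like, here is a rough sketch under one assumed rule: a recorded merge base must exist and must be an ancestor of every parent of the merge. That rule is an assumption made for the example (the mail does not define one, and a manually forced base between unrelated projects would need a different rule), and `reachable` and `check_merge_base` are hypothetical names:

```python
def reachable(parents, start):
    """Return `start` plus every commit reachable from it through parents."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def check_merge_base(parents, commit, recorded_bases):
    """Return a list of complaints about the recorded bases; empty means OK."""
    problems = []
    heads = parents.get(commit, [])
    if len(heads) < 2:
        problems.append(f"{commit}: merge-base recorded on a non-merge commit")
        return problems
    for base in recorded_bases:
        if base not in parents:
            problems.append(f"{commit}: merge-base {base} is missing")
            continue
        for head in heads:
            # Assumed rule: the base has to be in the history of every parent.
            if base not in reachable(parents, head):
                problems.append(
                    f"{commit}: merge-base {base} is not an ancestor of {head}"
                )
    return problems


# Hypothetical history: M merges A and B, which both descend from root.
history = {"root": [], "A": ["root"], "B": ["root"], "M": ["A", "B"], "stray": []}

ok = check_merge_base(history, "M", ["root"])
bad = check_merge_base(history, "M", ["stray"])
```

The point is only that a strictly defined header gives a checker something concrete to verify.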

Now, in all honesty, I don't think "case 1" is a _good_ thing to do. I'm 
just saying that I wouldn't be as upset about it as I've been over this 
"link" discussion. The reason I think "case 1" sucks is simply that I 
think you can in _practice_ get all the benefits much better with "case 
2", even if that one doesn't imply any actual git semantics:

 - Case 2: the

	note merge-base <sha1>

   thing is _also_ a perfectly valid idea, because now it's also very 
   well-defined: the "note" part tells you that git doesn't actually 
   impose any semantics what-so-ever on it, so it's really just a comment, 
   and as in case 1 above, once you see it as a comment, the _meaning_ of 
   it is immediately clear. It's literally just a note from the merge 
   algorithm saying "I used this as a merge base".

   The "note" syntax actually has a huge advantage. When you see it as a 
   comment from the merge algorithm, you immediately think it might also 
   be a good idea to add a few other notes. So for a merge commit, the 
   lines

	note merge-algorithm recursive
	note merge-conflicts none
	note merge-base <sha1>

   all make total sense. It's telling you what the algorithm used was, and 
   that it didn't need any manual fixups. It's also telling you that none 
   of this has _any_ impact what-so-ever from a "git semantics" angle, and 
   that this is nothing but a note for anybody who starts digging into it.
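
A small sketch of how a reader of commit objects might treat such lines, assuming a hypothetical "note <name> <value>" header sitting among the usual "key value" headers before the blank line. No such header exists in git; `parse_commit` and the sample text are invented for the example, and the `<...>` values are placeholders rather than real hashes:

```python
def parse_commit(text):
    """Split a commit into (headers, notes, message).

    Headers are kept as ordered (key, value) pairs. Lines starting with the
    hypothetical "note" key are collected separately and never interpreted.
    """
    header_text, _, message = text.partition("\n\n")
    headers = []
    notes = {}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "note":
            name, _, note_value = value.partition(" ")
            notes.setdefault(name, []).append(note_value)
        else:
            headers.append((key, value))
    return headers, notes, message


sample = (
    "tree <tree-sha1>\n"
    "parent <parent-1-sha1>\n"
    "parent <parent-2-sha1>\n"
    "note merge-algorithm recursive\n"
    "note merge-conflicts none\n"
    "note merge-base <base-sha1>\n"
    "\n"
    "Merge branch 'topic'\n"
)

headers, notes, message = parse_commit(sample)
parent_lines = [value for key, value in headers if key == "parent"]
```

Nothing that walks history needs to look at `notes` at all, which is exactly what makes the notes harmless.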

So now I've shown _two_ examples of some kind of header that I think 
actually makes sense, and that I would not argue against on those grounds. 
Especially the "note" thing I think is fine. So why, oh why, do I hate the 
"link" thing so much?

 - Case 3: the

	link <sha1> merge-base

   thing is a horrible and nasty thing that we should never ever support.

   Why? Because it's literally designed to both have some semantic meaning 
   ("git will fetch the <sha1> and use it for connectivity analysis") 
   _and_ at the same time the whole syntax is designed to _not_ have any 
   real meaning ("you can have any kind of link, and I don't know what 
   it actually means from a conceptual standpoint").

   So it has a meaning from an _implementation_ angle, but at the same 
   time it does not have a "higher cause". That is EVIL. When they say 
   "The road to hell is paved with good intentions", the implication there 
   is not that good intentions are bad per se, but that you should 
   understand that there are "Unintended Consequences".

   And if you cannot limit the thing to a very _specific_ higher-level 
   meaning, you by definition will have those "unintended consequences".
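
A rough sketch of the problem, assuming a hypothetical commit representation with a "links" list of (target, kind) pairs. The names and the data layout are invented for illustration. The connectivity walk has to follow every link, yet it has no way to interpret what the link kind means:

```python
def required_objects(commits, start):
    """Return every commit a fetch of `start` would have to bring along."""
    needed = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        commit = commits[name]
        stack.extend(commit.get("parents", []))
        # The walker cannot interpret the link kind; it can only follow the
        # hash, so any link of any kind drags its target (and that target's
        # own history) into the set.
        stack.extend(target for target, _kind in commit.get("links", []))
    return needed


# Hypothetical repository: B carries a link whose meaning only its author knows.
commits = {
    "root": {},
    "A": {"parents": ["root"]},
    "old-history": {},
    "B": {"parents": ["A"], "links": [("old-history", "whatever-the-tool-wanted")]},
}

without_link = required_objects(commits, "A")
with_link = required_objects(commits, "B")
```

The implementation-level consequence (more objects to fetch and keep connected) is the same whatever the link was supposed to mean.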

In short, the difference between three headers that on the face of it say 
exactly the same thing: "merge-base <sha1>", "note merge-base <sha1>", and 
"link <sha1> merge-base" is not that they have different syntax (hey, even 
the syntax itself is almost identical), but exactly the fact that they 
have different implications and _meaning_.

Two of the three have no unintended consequences. One ("note") has no 
technical "consequences" at _all_, by definition. The other ("merge-base") 
has no "unintended" technical consequences at all, because it's thought 
through, and has been fully defined.

The third? "unintended consequences". It doesn't have a clear definition 
("It's cool. You can use it for any link you want"). So pretty much BY 
DESIGN, it's set up so that you don't know what the consequences of it 
will be for a project.

And that's why "case 3" is bad. Even though it looks very much like the 
two other ones.

			Linus