Re: Merge seems to get confused by (reverted) cherry-picks

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 3 Sep 2008 08:26:10 -0700 (PDT)

On Wed, 3 Sep 2008, Björn Steinbrink wrote:
> 
> OK, so that basically means "if you cherry-pick, you better make sure
> that you don't have to revert or just get your fine-toothed comb ready
> when you merge later", right? Any advice on how to deal with such a
> situation?

Well, it's not actually even really related to cherry-picks in particular, 
although yes, cherry-picks and reverts are perhaps the simplest case to 
explain.

At a more fundamental level, it's about git _not_ caring about any 
individual commit in the history, and only caring about the "big picture". 

In this, git is fairly consistent - the same way that git never cares 
about any individual _file_, but always merges the whole tree, it also 
never really cares about any individual _commit_, but always the whole 
history.

So it really doesn't matter if one commit undid another - the only thing 
that matters as far as git merging is concerned is what the final set of 
changes were from the common base point.

Now, there's nothing to say that git _couldn't_ try to look at individual 
commits when deciding how to merge, but I actually think it's 
fundamentally wrong to do so, for pretty much the same reason I think it's 
fundamentally wrong to try to encode rename information.

The fact is, should it really matter whether something was "reverted", or 
whether multiple gradual changes made it go back to what it used to be? 
Git says no, and considers the two to be totally equivalent in the end, 
in exactly the same way that git doesn't care whether you first created a 
new file and later deleted a similar old file, or whether you renamed it. 

The only thing that matters is the end result.

So if you have one branch that first does "A", and then does "revert A", 
as far as git is concerned, that branch didn't do anything at all when it 
comes to data.

So when you merge it with another branch that does "A" too, the end result 
is that the merged contents will have A. That's fairly easy to understand 
if you just think of git as caring about the whole end result and just 
doing a three-way merge at the end points (which is what it does), but I 
would also like to explain why I think it's fundamentally the _right_ 
thing to do, not just an "implementation detail because it's simpler".
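
To make the end-point view concrete, here is a minimal sketch in Python. It 
is not git's actual code: the function name merge_trees and the file name 
driver.c are made up for illustration, and it merges whole files where git 
merges line-level changes inside files. It assumes each tree is just a 
mapping from path to content, and it only ever looks at three snapshots: 
the common base and the two branch tips.

```python
def merge_trees(base, ours, theirs):
    """Three-way merge of whole-tree snapshots (dict: path -> content).

    Only the three end states are consulted; no intermediate commits.
    Hypothetical helper for illustration, not part of git.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:          # both sides ended up in the same state
            result = o
        elif o == b:        # only "theirs" differs from the base
            result = t
        elif t == b:        # only "ours" differs from the base
            result = o
        else:               # both differ from the base, in different ways
            conflicts.append(path)
            continue
        if result is not None:   # None means the path no longer exists
            merged[path] = result
    return merged, conflicts


base = {"driver.c": "original"}

# Branch X applied A and then reverted it: its end state equals the base.
branch_x = {"driver.c": "original"}

# Branch Y applied A and kept it.
branch_y = {"driver.c": "original + A"}

merged, conflicts = merge_trees(base, branch_x, branch_y)
print(merged, conflicts)
```

Because branch X ends up identical to the base, the sketch sees only branch 
Y's change, so the merged tree contains A and nothing conflicts.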

When one branch does the "revert A", does that really mean that A was bad? 
No. It could mean that A was "unnecessary" or "not quite ready". The 
revert, after all, was literally done in the context of _that_ branch, and 
the reasons may well have been totally private to that branch. Git doesn't 
know, and git shouldn't care - the only thing that should matter is the 
end result.

To really hammer this point in, let's say that "A" was really doing two 
different things - A1 and A2 - to two different files. And let's further 
say that it had been cherry-picked because the one branch needed just a 
part of it - A1. And then later, in a fit of cleanup madness, that branch 
undid A2, because it really didn't need it.

When you merge the two branches, what do you expect? You could argue that 
you would expect A2 to be undone in the original branch too. It was, after 
all, a partial revert. But I think you'll agree that it's not at all 
"obvious" any more.

In fact, I will argue that it would be horribly _wrong_, because the 
branch that undid part of A could have done it two different ways:

 - it could do the "cherry-pick A" as one commit and the "undo A2" as 
   another (as I implied above)

 - but it could equally well have done a "cherry-pick just part of A", and 
   done it as just one commit (perhaps because it noticed that A2 didn't 
   even _compile_ within the context of that branch, and did an '--amend' 
   to fix it up rather than create a new commit).

See? Shouldn't both really act the same? Should it really make a 
difference to what git does if there was a cherry-pick and a partial 
revert, or a partial cherry-pick? Should _how_ you do something matter 
more than the end result? HELL NO!
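
The two routes above can be sketched as follows. This is an illustrative 
Python model under stated assumptions, not git itself: a tree is a dict 
from path to content, and commit, merge_path, file1 and file2 are 
hypothetical names. It shows both histories landing on the same snapshot, 
and a merge that consults only end states giving the same answer for either.

```python
def commit(tree, changes):
    """Return a new snapshot with `changes` (path -> content) applied."""
    snapshot = dict(tree)
    snapshot.update(changes)
    return snapshot


def merge_path(base, ours, theirs):
    """Three-way rule for a single path; raises on a true conflict."""
    if ours == theirs or theirs == base:
        return ours
    if ours == base:
        return theirs
    raise ValueError("conflict")


base = {"file1": "old", "file2": "old"}
A1 = {"file1": "new"}   # the part of A the other branch wanted
A2 = {"file2": "new"}   # the part it did not need

# Original branch: has all of A.
original = commit(base, {**A1, **A2})

# Route 1: cherry-pick A, then undo A2 in a second commit.
two_commits = commit(commit(base, {**A1, **A2}), {"file2": base["file2"]})

# Route 2: cherry-pick only the A1 part, as a single commit.
one_commit = commit(base, A1)

# Both routes reach exactly the same state.
same_state = two_commits == one_commit

# Merging either route with the original branch looks only at end states.
merged_1 = {p: merge_path(base[p], two_commits[p], original[p]) for p in base}
merged_2 = {p: merge_path(base[p], one_commit[p], original[p]) for p in base}
print(same_state, merged_1, merged_2)
```

In this model the merge keeps A2 in both cases, since the branch that 
dropped it ends up matching the base for that file.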

And is the "partial revert" really any different from the "total revert"?

Now, if you're a math person, think of the "limit behavior" as A2 
approaches all of A. In the limit, the end result is that you reverted 
_all_ of A. Should that limit case be fundamentally different from the case 
of A2 being just a _part_ of A? What should the logic be? What if all of A 
was reverted except for the whitespace cleanups that it did (almost by 
mistake)?

So in the end, the answer is that git doesn't care about individual 
commits, because caring about individual commits is totally crazy. Git 
_remembers_ them, of course, and it can _show_ you them when you merge, 
but the actual end result depends purely on the *state* of the merge (and 
the "big picture" of the history, ie where the branches join etc), not on 
the small details of how you got to that state.

And that is fundamentally the only sane thing to do.

Here's one final thought to leave you with:

 - what if the other branch had decided that instead of reverting it, it 
   could just do a "git rebase -i" _without_ it, because that other branch 
   had never been exposed to anybody else?

See? The "how you got to some state" really must be immaterial in a stable 
merge strategy. I realize that I'm at odds with some SCM people on this, 
but I'm ok with that, because I also realize that all those other SCM 
people are just _stupid_.

			Linus