Re: inexplicable failure to merge recursively across cherry-picks

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Thu, 11 Oct 2007 15:33:04 -0700 (PDT)

On Fri, 12 Oct 2007, Sam Vilain wrote:
> 
> 1. do a --cherry-pick rev-list on just the file being merged and see if
> all the changes on one side disappear, in which case just take the result.
> 
> 2. see if the files were identical at some point, in which case use a
> new merge base for that file based on the changes since that revision.
> 
> I actually thought #2 was already the way recursive worked!

Actually, I think both of these are fundamentally wrong.

The reason is that you talk about "the file".

Anything that is based on per-file heuristics is going to mean that you 
use a history that is not necessarily compatible with the _other_ files in 
the project.

I agree 100% that per-file history information is going to find more 
things to merge automatically. But the point I was trying to make was that 
"automatic merges" aren't always *good*. 

I realize that pretty much every single theoretical merge algorithm out 
there tries to make merges happen automatically as much as possible, but 
they all en dup having strange issues.

For example, take a patch that cherry-picked into mainline from a 
development branch, but that partly depended on some support-feature that 
wasn't in mainline yet. So then there is another patch that removes part 
of that patch from mainline. So mainline is fine.

Now, three months later, the development branch is stable, and is fully 
merged. What happens?

Git will largely get this right. Git will look at the last *global* common 
base, and will just look at the contents, and do a reasonable job. Yes, 
there will probably be conflicts (because both the development branch and 
the mainline ended up touching the same parts of the files thanks to the 
cherry-pick, and yet mainline has some added hacks on top to disable it), 
but on the whole that's exactly what you want!

(Alternatively, maybe the "remove the part that wasn't supported yet" 
ended up meaning that that particular part of the patch was excised 
entirely from mainline, and there was no conflict at all, and git just 
merged the new stuff from the development branch cleanly! So I'm not 
saying that it *has* to conflict, I'm just saying that it might have).

In other words: git always "does the right thing". Assuming both branches 
are stable and working, git does a very reasonable thing. It's obviously 
not always the thing people may *want*, but it's guaranteed to be a 
reasonable and simple guess, and there's no way it's "too clever for its 
own good, and just screwed the pooch entirely".

In contrast, your suggested merge strategy would be HORRIBLY BROKEN!

Why? Because it doesn't look at the *common* history to the project, it 
looks at some per-file state that is totally bogus and has no relevance. 
Think it through: what happens if there were files with the same content 
(because of the cherry-pick), and then the file history for one of the 
branches was later changed to disable something because the support for it 
wasn't in the "whole history"?

Right: the final merge will contain that change! Because there as a time 
where the file was identical (the cherry-pick), so you're taking all the 
later changes to that file (the undo)!

Notice? Totally the wrong thing to do!

So this is a classic case of trying to make "easier" merges, but where the 
whole approach is totally broken! You simply MUST NOT add logic like that. 
It's a lot better to give a conflict, than to try to be "clever", and 
silently do the wrong thing.

Yes, you can be really stupid, and silently do the wrong thing too, but if 
you're stupid, at least the "silent wrong thing" is never really subtle, 
it's pretty much guaranteed to easily explainable. And the good news is 
that you didn't have a complicated and fragile algorithm just to get the 
wrong answer.

(Put another way: if you are always going to have situations where you get 
the wrong answer, you'd better take the simple and stupid algorithm, 
because people are more likely to then be able to _predict_ that wrong 
answer and are thus more ready to handle it!)

So being clever really is the wrong thing to do. And using history that 
isn't global and true history (ie just looking at one file, and deciding 
that matching that one file "means" something) is fundamnetally broken.

In fact, in general, individual pieces of history are totally worthless. 
The fact that some individual change was done in one branch doesn't really 
tell you *anything*. The reason that change was done may be implied by all 
the previous changes, or conversely, later changes may have undone the 
change, so any merge algorithm that starts to look at individual commits 
is likely to be pure and utter crap - exactly because it's starting to 
make decisions based on local information that may not be valid in the big 
picture.

(Where the "big picture" may be either about "space" - other files - or 
about "time" - other commits, that simply mean that the individual changes 
of one commit are meaningless on their own).

Btw, one thing to note is how well the simple and stupid git merge 
strategy works. It turns out that doing things with the "big picture" 
model actually does work really well. People think that they need 
"finegrained history" to make good merges, but I think most people who 
have actually done a fair number of merges with git have noticed that it's 
actually pretty dang painless.

But to be honest, there are cases where git isn't being very helpful. In 
particular, I think there *are* things that git could probably be more 
helpful with, but looking at local history is not one of them, I think.

So here are some suggestions on things I think we could improve on:

 - I think it would be wonderful to have a helper tool for handling 
   conflicts. 

   In particular, while I don't think per-file history is good for 
   resolving conflicts *automatically*, I actually do think that per-file 
   history can be a good way to *manually* resolve conflicts.

   In other words, it you have a conflict, I think it would be wonderful 
   to have some git-gui-like thing that can show the history (with 
   patches) for that file, and basically combine a three-way graphical 
   merge *with* some per-commit information where you can say "choose the 
   thing that that commit did for this conflict".

   So I think git already has tools to help resolve conflicts, and I 
   personally love doing "gitk --merge" or "git log -p --merge" when a 
   conflict does happen, but I think some smart GUI person could do 
   something even much better!

   And notice how I think that it's *really* wrong to use per-file history 
   automatically, but that I think it's not wrong at all to use it when 
   there's a human that says "ok, obviously pick that case". Things that 
   are horrible when they cause subtle and automatic resolves can be very 
   good when they cause subtle resolves that a human looked at!

 - I suspect we have issues with common whitespace changes, where we again 
   could probably help people resolve whitespace changes etc better. 

   Again, I don't think those are necessarily things you want to do 
   automatically, but I know from personal experience that handling things 
   like one side having done a re-indent can be *really* annoying, just 
   because you end up doing tons of mindless stuff when you fix up all the 
   totally idiotic and usually trivial conflicts.

.. and I'm sure there are other things we could do better too, but the 
above two are things that while they haven't happened for me for the 
kernel (probably because we have learnt how to not cause them over the 
years), I've seen them in other places.

And yes, the above two suggestions fall solidly in the "conflicts aren't 
bad per se, but you want to make the tool really help you resolve them!" 
camp. 

			Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html