Re: [PATCH] xdiff: add xdl_merge() (was: (unknown))

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Wed, 22 Nov 2006 10:29:55 +0100 (CET)

Hi,

On Wed, 22 Nov 2006, Jakub Narebski wrote:

> Johannes Schindelin wrote:
> 
> > [PATCH] xdiff: add xdl_merge()
> 
> Shouldn't this be in the subject of the message?

Did I mention that I was tired and hungry?

> > This new function implements the functionality of RCS merge, but 
> > in-memory. It returns < 0 on error, otherwise the number of conflicts.
> 
> Only RCS merge, or can you implement the whole diff3 (from GNU diffutils)
> functionality with that?

As I am interested only in the in-memory merge, only RCS merge. Which 
feature would you be interested in? An ed script? :-)

> > Finding the conflicting lines can be a very expensive task. You can
> > control the eagerness of this algorithm:
> > 
> > - a level value of 0 means that all overlapping changes are treated
> >   as conflicts,
> > - a value of 1 means that if the overlapping changes are identical,
> >   it is not treated as a conflict.
> > - If you set level to 2, overlapping changes will be analyzed, so that
> >   almost identical changes will not result in huge conflicts. Rather,
> >   only the conflicting lines will be shown inside conflict markers.
> > 
> > With each increasing level, the algorithm gets slower, but more accurate.
> > Note that the code for level 2 depends on the simple definition of
> > mmfile_t specific to git, and therefore it will be harder to port that
> > to LibXDiff.
> 
> How does its performance compare with RCS merge/GNU diff3?

Speedwise, I have no clue. It was enough work for a day.

Accuracywise: often I sent a patch (series) which was in my current git 
tree (no topic branch), and Junio did some minor adjustments. I _hated_ 
the fact that RCS merge marked _all_ overlapping changes as conflicts, 
even when there was just a minor correction here and there. And "git diff 
--ours" does not help at all.

Here is where my implementation should help. With level 2, it will look 
again at these conflicting regions, and only output the actual differences 
as conflicts.
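
To make the three levels concrete, here is a minimal Python sketch of how
one overlapping region might be handled at each level. It is not the code
from the patch (which is C and works on xdiff's own structures); the names
refine_overlap() and count_conflicts() are made up for illustration, and
level 2 is simplified to trimming common leading and trailing lines
instead of re-diffing the two sides.

```python
def refine_overlap(ours, theirs, level):
    """Split one overlapping region into 'common' and 'conflict' segments.

    ours / theirs are lists of lines, one per side. Hypothetical helper,
    not part of xdiff.
    """
    if level not in (0, 1, 2):
        raise ValueError("level must be 0, 1 or 2")

    # Level 0: every overlapping change is a conflict, no questions asked.
    if level == 0:
        return [("conflict", ours, theirs)]

    # Level 1 and up: identical changes on both sides are not a conflict.
    if ours == theirs:
        return [("common", ours)]
    if level == 1:
        return [("conflict", ours, theirs)]

    # Level 2 (simplified): keep only the lines that really differ inside
    # the conflict by stripping common leading and trailing lines.
    limit = min(len(ours), len(theirs))
    head = 0
    while head < limit and ours[head] == theirs[head]:
        head += 1
    tail = 0
    while tail < limit - head and ours[-1 - tail] == theirs[-1 - tail]:
        tail += 1

    segments = []
    if head:
        segments.append(("common", ours[:head]))
    segments.append(("conflict",
                     ours[head:len(ours) - tail],
                     theirs[head:len(theirs) - tail]))
    if tail:
        segments.append(("common", ours[len(ours) - tail:]))
    return segments


def count_conflicts(segments):
    """Mirror the 'number of conflicts' part of the return convention."""
    return sum(1 for seg in segments if seg[0] == "conflict")
```

With two sides that differ in a single middle line, level 0 and level 1
report the whole region as one conflict, while level 2 reports only the
differing line and keeps the surrounding lines outside the markers.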

> It is really nice to have that. Bram Cohen (of Codeville, an SCM built
> around a sophisticated merge algorithm) wrote about recursive three-way
> merge in http://revctrl.org/CrissCrossMerge
> 
>    Recursive three-way merge _usually_ provides the right answer, however
>    there are some edge cases. For example, conflict markers can be matched
>    incorrectly, because they aren't given any special semantic meaning for
>    the merge algorithm, and are simply treated as lines. In particular,
>    there are (somewhat complicated) cases where the conflict markers of two
>    unrelated conflicts get matched against each other, even though the
>    content sections of them are totally unrelated.
> 
> I'm not sure if he has specific examples or if it is just theoretical
> talk, but having a built-in merge would certainly help the recursive
> merge strategy (and perhaps also git-rerere).

It should be easy to construct such an example. However, the relevance in 
practice is about zero.

Git was built from the beginning to merge as well as possible, but not
perfectly. There is no such thing as a perfect merge algorithm. You will
always be able to construct cases which are mismerged.

Thus, git takes the pragmatic approach and stops "early": merges work
99% of the time, and in 99% of the remaining 1% the merge will fail so
that you know you have to fix it manually. (Take these numbers with a
grain of salt, please.) The advantage of stopping there is that we can
make it really fast.

You could probably raise the 99% to 99.5% by implementing a "rebasing
merge", i.e. cherry-picking the commits of the branch to be merged one by
one and committing only at the end (if there has not been any conflict).
Obviously, this is as slow as Parnell's pitch[1].
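
As a rough illustration of that "rebasing merge" idea, here is a toy
Python sketch. It assumes a deliberately naive model that is not how git
stores anything: a tree is a dict from path to content, and a commit is a
dict from path to an (old, new) pair. cherry_pick() and rebasing_merge()
are hypothetical names. The point is only the control flow: replay every
commit on a scratch copy, and hand back a result only if none of them
conflicted.

```python
def cherry_pick(tree, commit):
    """Apply one commit to a copy of tree; return None on conflict.

    tree:   dict mapping path -> content (toy model)
    commit: dict mapping path -> (old_content, new_content)
    """
    result = dict(tree)
    for path, (old, new) in commit.items():
        current = result.get(path)
        if current == new:
            continue        # change is already present, nothing to do
        if current != old:
            return None     # our side changed it differently: conflict
        result[path] = new
    return result


def rebasing_merge(tree, commits):
    """Replay commits one by one; 'commit' only if all of them applied."""
    work = tree
    for commit in commits:
        work = cherry_pick(work, commit)
        if work is None:
            return None     # stop at the first conflict, tree untouched
    return work
```

Every commit of the branch has to be replayed in turn, which is where the
slowness comes from compared to a single three-way merge of the two tips.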

As for git-rerere: I could not use it everywhere, because of some Perl 
dependencies which I could not compile on some platforms. However, IMHO 
git-rerere does not necessarily depend on merge being available in libgit.

Ciao,
Dscho

[1] http://www.physics.uq.edu.au/pitchdrop/pitchdrop.shtml