Re: feature suggestion: optimize common parts for checkout --conflict=diff3

Jeff King <peff@xxxxxxxx> · Thu, 7 Mar 2013 13:01:57 -0500

On Thu, Mar 07, 2013 at 09:26:05AM -0800, Junio C Hamano wrote:

> Without thinking about it too deeply,...
> 
> I think the "RCS merge" _could_ show it as "1234A<B=X>C<D=Y>E789"
> without losing any information (as it is already discarding what was
> in the original in the part that is affected by the conflict,
> i.e. "56 was there").

Right, I think that is sane, though we do not do that at this point.

> Let's think aloud how "diff3 -m" _should_ split this. The most
> straight-forward representation would be "1234<ABCDE|56=AXCYE>789",
> that is, where "56" was originally there, one side made it to
> "ABCDE" and the other "AXCYE".

Yes, that is what diff3 would do now (because it does not do any hunk
refinement at all), and should continue doing.
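A minimal sketch of that unrefined hunk, assuming each character stands for one line as in the notation above. It trims whatever base, ours and theirs share at both ends and puts everything else into a single <ours|base=theirs> hunk. The helper names are hypothetical; this is an illustration of the notation, not Git's code.

```python
def common_prefix_len(*seqs):
    """Count leading positions where every sequence has the same element."""
    n = 0
    for items in zip(*seqs):
        if len(set(items)) != 1:
            break
        n += 1
    return n


def diff3_shorthand(base, ours, theirs):
    """Render one unrefined conflict as prefix<ours|base=theirs>suffix."""
    p = common_prefix_len(base, ours, theirs)
    # The common suffix must not overlap the common prefix.
    limit = min(len(base), len(ours), len(theirs)) - p
    s = min(common_prefix_len(base[::-1], ours[::-1], theirs[::-1]), limit)
    b_end, o_end, t_end = len(base) - s, len(ours) - s, len(theirs) - s
    return (
        f"{base[:p]}<{ours[p:o_end]}|{base[p:b_end]}={theirs[p:t_end]}>"
        f"{base[b_end:]}"
    )


print(diff3_shorthand("123456789", "1234ABCDE789", "1234AXCYE789"))
# 1234<ABCDE|56=AXCYE>789
```

No refinement happens here: the whole "56" preimage stays attached to one hunk.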

> You could make it "1234<AB|5=AX><C|=C><DE|6=YE>789", and that is
> technically correct (what was in the shared original for the
> conflicted part is 5 and then 6), but the representation pretends
> to know more than the available information supports, which may be
> somewhat misleading.  All three of these are equally plausible splits
> of the original "56":
> 
> 	1234<AB|=AX><C|=C><DE|56=YE>789
> 	1234<AB|5=AX><C|=C><DE|6=YE>789
> 	1234<AB|56=AX><C|=C><DE|=YE>789
> 
> and picking one over the others would be a mere heuristic.  All three
> are technically correct representations and it is just a matter of
> which one is the easiest to understand.  So, this is the kind of
> "misleading but not incorrect".

Yes, I agree it is a heuristic about which part of a split hunk to place
deleted preimage lines in. Conceptually, I'm OK with that; the point of
zdiff3 is to try to make the conflict easier to read by eliminating
possibly uninteresting parts. It doesn't have to be right all the time;
it just has to be useful most of the time. But it's not clear how often
that would hold in real life.
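To make the heuristic concrete, here is a small sketch (hypothetical helper, same one-character-per-line notation) that produces every way of handing the deleted base lines to the first and last split hunks. For the "56" example it yields exactly the three candidates Junio lists; the inputs alone give no reason to prefer any one of them.

```python
def candidate_splits(prefix, base, left, middle, right, suffix):
    """Every way to hand the deleted base lines to the two outer split hunks.

    left and right are (ours, theirs) pairs; middle is the text both sides
    agree on, shown with an empty base as <middle|=middle>.
    """
    results = []
    for cut in range(len(base) + 1):
        results.append(
            f"{prefix}<{left[0]}|{base[:cut]}={left[1]}>"
            f"<{middle}|={middle}>"
            f"<{right[0]}|{base[cut:]}={right[1]}>{suffix}"
        )
    return results


for line in candidate_splits("1234", "56", ("AB", "AX"), "C", ("DE", "YE"), "789"):
    print(line)
```

With k deleted preimage lines and two outer hunks there are k+1 candidates, so any rule that picks one (for example "always the first hunk") is a choice layered on top of the merge result.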

I think this is somewhat of a moot point, though. We do not do this
splitting now. If we later learn to do it, there is nothing to say that
zdiff3 would have to adopt it also; it could stop at a lower
zealous-level than the regular merge markers. I think I'd want to
experiment with it and see some real-world examples before making a
decision on that.

> In all these cases, the middle part would look like this:
> 
> 	<<<<<<< ours
> 	C
> 	||||||| base
> 	=======
> 	C
> 	>>>>>>> theirs
> 
> in order to honor the explicit "I want to view all three versions to
> examine the situation" aka "--conflict=diff3" option.  We cannot
> reduce it to just "C".  That would make it "not just misleading but
> actively wrong".

I'm not sure I agree. In this output (which does the zealous
simplification, the splitting, and arbitrarily assigns deleted preimage
to the first of the split hunks):

  1234A<B|56=X>C<D|=Y>E789

I do not see the promotion of C to "already resolved, you cannot tell if
it was really in the preimage or not" as any more or less misleading or
wrong than that of A or E.  It is no more misleading than what the
merge-marker case would do, which would be:

  1234A<B=X>C<D=Y>E789
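A rough sketch of that splitting, using Python's difflib.SequenceMatcher as a stand-in for Git's xdiff (a swapped-in technique, not how Git does it). It splits one conflict hunk wherever ours and theirs agree. Without a base it emits merge-style <ours=theirs> hunks; with a base it emits diff3-style hunks and, as in the output discussed above, arbitrarily gives the whole preimage to the first split hunk. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def zealous_split(prefix, ours, theirs, suffix, base=None):
    """Split one conflict hunk wherever ours and theirs agree.

    base=None gives <ours=theirs> hunks; otherwise <ours|base=theirs> hunks,
    with the entire base assigned to the first conflicting piece.
    """
    parts = [prefix]
    remaining_base = base
    matcher = SequenceMatcher(None, ours, theirs, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Text both sides agree on is promoted out of the conflict.
            parts.append(ours[i1:i2])
        elif remaining_base is None:
            parts.append(f"<{ours[i1:i2]}={theirs[j1:j2]}>")
        else:
            parts.append(f"<{ours[i1:i2]}|{remaining_base}={theirs[j1:j2]}>")
            remaining_base = ""  # later pieces get an empty preimage
    parts.append(suffix)
    return "".join(parts)


print(zealous_split("1234", "ABCDE", "AXCYE", "789"))
# 1234A<B=X>C<D=Y>E789
print(zealous_split("1234", "ABCDE", "AXCYE", "789", base="56"))
# 1234A<B|56=X>C<D|=Y>E789
```

The second output makes the problem visible: "56" lands in the B/X hunk only because it came first, and the D/Y hunk shows an empty base.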

What seems wrong to me is the arbitrary choice about how to distribute
the preimage lines. In this example, it is not a big deal for the
heuristic to be wrong; you can see both of the hunks. But if C is long,
and you do not even see D=Y while resolving B=X, the preimage shown
there may be nonsensical.

But again, we don't do this splitting now. So I don't think it's
something that should make or break a decision to have zdiff3. Without
the splitting, I can see it being quite useful. I'm going to carry the
patch in my tree for a while and try using it in practice.

-Peff