Re: [PATCH 2/7] merge-ort: add ability to record conflict messages in a file

Elijah Newren <newren@xxxxxxxxx> · Tue, 28 Sep 2021 23:25:20 -0700

On Tue, Sep 28, 2021 at 3:29 PM Jeff King <peff@xxxxxxxx> wrote:
>
> On Tue, Aug 31, 2021 at 02:26:35AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > There are several considerations here:
> >   * We have to pick file(s) where we write these conflict messages to
> >   * We want to make it as clear as possible that it's not a real file
> >     but a set of messages about another file
> >   * We want conflict messages about a file to appear near the file in
> >     question in a diff, preferably immediately preceding the file in
> >     question
> >   * Related to the above, per-file conflict messages are preferred
> >     over lumping all conflict messages into one big file
> >
> > To achieve the above:
> >   * We put the conflict messages for $filename in
> >       $filename[0:-1] + " " + $filename[-1] + ".conflict_msg"
> >     or, in words, we insert a space before the final character of
> >     the filename and then also add ".conflict_msg" at the end.
>
> It took me a minute to understand the space thing. I thought at first it
> was about avoiding conflicts with existing names (and while it might
> help in practice, it's not a guarantee). But I think it's about the
> "appear preceding the file" goal. The space sorts before any other
> printable character in the final position.

Yeah, it's all about the ordering.  I guess it helps slightly with
conflict avoidance, but I cannot rely on it; I have to check for
colliding files and potentially tweak the filename further.

> That's...simultaneously clever and gross. My biggest complaint is that
> the space looks like a bug in the output.

Junio had basically the same reaction[*].  :-)

[*] https://lore.kernel.org/git/xmqqk0k0qkmv.fsf@gitster.g/

> Using another character like "." might not be too bad, as it's also
> fairly early in the ascii table. But it's really this "do it before the
> last character" thing that is key to getting the ordering right.
>
> Just brainstorming some alternatives:
>
>  - we have diff.orderFile, etc. Could we stuff this data into a less
>    confusing name (even just "$filename.conflict_msg"), and then provide
>    a custom ordering to the diff code? I think it could be done by
>    generating a static ordering ahead of time, but it might even just be
>    possible to tell diffcore_order() to take the ".conflict_msg"
>    extension into account in its comparison function.

I can't just go on the ".conflict_msg" extension.  As you noted above,
this scheme is not sufficient for avoiding collisions.  So I need to
append extra "cruft" to the name in the case of collisions -- meaning
we can't special case on just that extension.

I also don't like how diff.orderFile provides a global ordering of the
files listed, rather than providing some scheme for relative
orderings.  That'd either force me to precompute the diff to determine
all the files that were different so I can list _all_ of them, or put
up with the fact that the files with non-content conflicts will be
listed very first in the output, even if their name is
'zee-last-file.c' -- surprising users at the output ordering.

This also means that if the user had their own ordering defined, then
I'm overriding it and messing up their ordering, which might be
problematic.

So, I'm not so sure about this solution; it feels like it introduces
bigger holes than the ugly space character it is fixing.

>  - there can be other non-diff data between the individual segments. For
>    example, "patch" will skip over non-diff lines. And certainly in Git
>    we have our own custom headers. I'm wondering if we could attach
>    these annotations to the diff-pair somehow, and then show something
>    like:
>
>      diff --git a/foo.c b/foo.c
>      index 1234abcd..5678cdef 100644
>      conflict modify/delete foo.c

A couple things here...

First, I'm not so sure I like the abbreviation here.  Just knowing
"modify/delete" might be enough in some cases, but I'd rather have the
full messages that would have been printed to the console, e.g.:

CONFLICT (modify/delete): foo.c deleted in HASH1 (SHORT
SUMMARY1) and modified in HASH2 (SHORT SUMMARY2).  Version HASH2
(SHORT SUMMARY2) of foo.c left in tree.

because I think the commit references are useful context.  That extra
context might be of little use for many modify/delete conflicts, but
is much more important for conflicts involving renames; e.g.
"rename/rename" is much less useful than being able to know the
original name of the file and with which parent commit each filename
is associated.  So, that raises the question: could we pack all that
information from the full conflict notice into these conflict
header(s)?  (And do we have to special case the code to print it all
on one line when doing the remerge-diff since the diff output needs
them to be one-line headers?)

Second, what about when there are multiple non-content conflict types
for a single file, e.g. rename/delete + rename/add + modify/delete +
mode conflict + unmergeable binary?  (Yes, I think it's possible for
one path to have all five of those: (1) source file deleted on one
side, renamed on the other, (2) rename target on one side matches new
file added on other side of history, (3) renamed file also had its
contents modified, thus modify vs. delete, (4) added file on other
side of history had a different mode, (5) added file on other side of
history is a binary.)  Do we just use multiple conflict headers?

Third, what about the cases where there is no diff, just conflict
headers?  (I suspect many modify/delete or rename/delete or binary
files may end up in such a situation.)

I don't think any of those are deal breakers, but it means more work,
and maybe also other forms of ugliness.

>      --- a/foo.c
>      +++ b/foo.c
>      @@ some actual diff starts here @@
>
> Obviously such a thing can't really be applied. But then you wouldn't
> want to apply the addition of "my.file e.conflict_msg" either.

Nit: "my.fil e.conflict_msg", not "my.file e.conflict_msg" (the 'e' in
'file' is not repeated; otherwise the auxiliary file wouldn't sort
before its companion file).

> I dunno. The latter especially is definitely more work, and requires a
> bit more cooperation between the merge and diff code. In particular, you
> can't just feed a straight tree to the diff anymore. We have to hold
> back the annotations, and then apply them to the resulting diff. But I
> think the output is much more pleasing to the eye.

It's certainly an interesting idea.  It's a lot more work: not being
able to feed a straight tree to the diff means piping things through
several different layers (merge -> log -> diff, and possibly multiple
diff layers); we may need special handling for files that have only
conflict headers and no content differences; the conflict headers
could be comically long; and it's all essentially for what is a rather
uncommon case anyway.  But, on the plus side, it does avoid the rather
ugly space.

I'll have to think about it.