Re: [PATCH 08/12] merge-ort: provide a merge_get_conflicted_files() helper function

Elijah Newren <newren@xxxxxxxxx> · Fri, 25 Feb 2022 22:53:28 -0800

Hi Dscho,

On Fri, Feb 25, 2022 at 8:31 AM Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
>
> Hi Elijah,
>
> On Tue, 22 Feb 2022, Elijah Newren wrote:
>
> > Sidenote: Do you lump in binary merge conflicts with "non-semantic
> > merge conflicts"?  You would by your definition, but I'm not sure it
> > matches.
> >
> > I tend to call things either content-based conflicts or path-based
> > conflicts, where content-based usually means textual-based but also
> > includes merges of binaries.
>
> I like "content-based conflicts".
>
> And no, I had not even thought about binary merge conflicts yet...
>
> > On Mon, Feb 21, 2022 at 2:46 AM Johannes Schindelin
> > <Johannes.Schindelin@xxxxxx> wrote:
> > >
> > > Concretely: while I am not currently aware of any web UI that allows
> > > to resolve simple rename/rename conflicts, it is easily conceivable
> > > how to implement such a thing. When that happens, we will need to be
> > > able to teach the server-side code to discern between the cases that
> > > can be handled in the web UI (trivial merge conflicts, trivial
> > > rename/rename conflicts) as compared to scenarios where the conflicts
> > > are just too complex.
> >
> > Um, I'm really worried about attempting to make the conflict notices
> > machine parseable.  I don't like that idea at all, and I even tried to
> > rule that out already with my wording:
> > """
> > In all cases, the
> > <Informational messages> section has the necessary info, though it is
> > not designed to be machine parseable.
> > """
> > though maybe I should have been even more explicit.  The restrictions
> > that those messages be stable is too rigid, I think.  I also think
> > they're a poor way to communicate information to a higher level tool.
> > I would much rather us add some kind of additional return data
> > structures from merge ort and use them if we want extra info.
>
> Okay.
>
> I thought that we could keep the `CONFLICT (<type>)` constant enough to
> serve as such a machine-parseable thing.

That _probably_ is, but I thought you wanted to parse all N paths
embedded in the message after that part as well?

> And then presenting
> `<path>NUL<message>NUL` could have served my use case well...

Would it?  Wouldn't you need something more like

<number-of-paths>NUL<path1>NUL<path2>NUL<pathN>NUL<stable-short-type-description>NUL<message>NUL
 ?

I mean, if rename/rename is what you want to handle, there are three
paths in that message.  And you need to know all three paths in order
to combine the relevant parts of the <Conflicted File Info> section
together.

(Also, while we're at it, I decided to throw a stable short-type
description string (e.g. "CONFLICT (rename/rename)") in there, which
will _probably_ be the first part of the message string but still
allow us to change the message string later if we want.)

Also, we'd want those parsing this information to keep in mind that:
  * Any given conflict can affect multiple paths
  * Any path can be part of multiple conflicts
  * (The above two items imply a potentially many-to-many relationship
between paths and conflicts)
  * Paths listed in these logical conflicts may not correspond to a
file in the index (they could be a directory, or file that was in a
previous version)
  * Some of these "logical conflicts" are not actually conflicts but
just notices (e.g. "auto-merging" or "submodule updated" or "WARNING"
or "<submodules are weird>" messages)

and we'd have to do some work to make sure the paths in the given
messages lined up with the files actually recorded in the index (e.g.
with distinct types we rename both files to avoid the collision, but
print the conflict notice for the original path rather than the new
paths)

[...]
> > But I'm going to reserve the right in merge-ort to modify, add, or
> > delete any of those messages passed to path_msg(), which might wreak
> > havoc on your attempts to parse those strings.  I think they're a bad
> > form for communicating information to a script or program, and trying
> > to transform them into such risks making them suboptimal at
> > communicating info to humans.  These messages should optimize the
> > latter, and if we want something for the former, it should probably be
> > a new independent bit of info.
>
> Makes sense.
>
> So we need something in addition to those messages.

Yes.  Does the proposal above sound like it'd cover your needs?  If
so, we'd probably need to go through all the callers to path_msg() and
either add an immediate call to another function immediately
afterwards that stores this additional information or somehow change
the path_msg() call itself to somehow take an additional arbitrary
list of arguments representing the paths and short-desc we want to
store somewhere.