Re: machine-parsable git-merge-tree messages (was: [PATCH 08/12] merge-ort: provide a merge_get_conflicted_files() helper function)

Elijah Newren <newren@xxxxxxxxx> · Mon, 28 Feb 2022 19:49:20 -0800

On Mon, Feb 28, 2022 at 1:27 AM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
> On Tue, Feb 22 2022, Elijah Newren wrote:
>
[...]
> > I don't see how this helps solve the problem Dscho was bringing up at
> > all.  Your reference to "the path" means you've missed his whole
> > complaint -- that with more complex conflicts (renames, directory/file
> > conflicts resolved via moving the file out of the way, mode conflicts
> > resolved by moving both files out of the way, etc) there are multiple
> > paths involved and he's trying to determine what those paths are.
> > He's particularly focusing on rename/rename cases where a single path
> > was renamed differently by the two sides of history (which results in
> > a conflict message only being associated with the path from the merge
> > base in order to avoid repeating the same message 2-3 times, but that
> > one message has three distinct paths embedded in the string).
> >
> > Also, the additional paths are not part of the API to path_msg; they are
> > merely embedded in a string.  (And, in case it bears repeating: as
> > mentioned elsewhere, we cannot assume there will only be one
> > path_msg() call per path, and we at least currently can't assume that
> > each path_msg() call is for a separate logical conflict; there might
> > be two for a single "conflict".)
> >
> > I agree that parsing these meant-for-human-consumption (and not
> > promised to be stable) messages is not a good way to go, but
> > pretending the current API has enough info to answer his questions
> > isn't right either.
>
> The intent here wasn't to present a complete solution, but to reply to
> the part of Johannes's E-Mail that mentions, e.g., "and I would be loathe
> to switch _all_ callers to do the quoting themselves.".
>
> I.e. it's a POC for passing this data further up the stack. The issue
> you mention with the renaming case could/should be handled by having
> whatever handles the varargs accept those N arguments; the POC doesn't
> handle it.
>
> But in any case, needing to convert "28 calls to [path_msg()]" doesn't
> seem like it's required.

The problem isn't that it's an incomplete solution, it's that AFAICT,
the user's stated problem is not aided at all by this POC.  Passing
existing data further up the stack cannot solve the problem if the
data being passed is insufficient for the problem at hand.

Perhaps you have some clever solution to get the extra information,
though?  Could you elaborate on how path_msg() could handle its
varargs differently to deduce which of those correspond to paths?  The
only way I can see how to do that, short of modifying all 28 callers
of path_msg() to pass those paths as additional information, is hacks
like parsing the (non-stable, not-meant-for-machine-parseability)
format string.

(Getting the paths would get us most of the way to a solution, though
it's still incomplete.  But it's the relevant bit under discussion
here.)

> But obviously we wouldn't want to use trace2 as a plumbing layer for
> message passing; we could, however, format the same data in a similar way,
> especially in the context of a discussion about filenames with odd
> characters in them (some of which JSON is inherently incapable of
> encoding).
>
> >> I think that would be particularly useful in conjunction with the
> >> --format changes I was proposing for this (and hacked up a WIP patch
> >> for). You could just have a similar --format-messages or whatever.
> >>
> >> Then you could pick \0\0 as a delimiter for your "main" --format, and
> >> "\0" as the delimiter for your --format-messages, and thus be able to
> >> parse N-nested \0-delimited content.
> >
> > To be honest, the --format stuff is sounding a little bit like a
> > solution in search of a problem.
>
> Opinions on this obviously differ, and I'm not going to pick this as my
> particular hill to die on :)
>
> But I do think it's the other way around, in that hardcoded output
> formats are a problem requiring solutions.

I might be more convinced if folks tried to address how to output
things _after_ we had determined *what* things should be output.  If
we don't have sufficient information to solve what users want,
discussing how we format the information we do have cannot help solve
the actual problems.  It might be useful as an add-on later, but
discussing it first is putting the cart before the horse.