Re: machine-parsable git-merge-tree messages (was: [PATCH 08/12] merge-ort: provide a merge_get_conflicted_files() helper function)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 21, 2022 at 6:37 AM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
> On Mon, Feb 21 2022, Johannes Schindelin wrote:
>
> [I sent out an empty reply to this earlier by mistake, sorry about that]
>
> > [...]
> > Which brings me to the next concern: since `merge-tree` is a low-level
> > tool meant to be called by programs rather than humans, we need to make
> > sure that those messages remain machine-parseable, even if they contain
> > file names.
> >
> > Concretely: while I am not currently aware of any web UI that allows to
> > resolve simple rename/rename conflicts, it is easily conceivable how to
> > implement such a thing. When that happens, we will need to be able to
> > teach the server-side code to discern between the cases that can be
> > handled in the web UI (trivial merge conflicts, trivial rename/rename
> > conflicts) as compared to scenarios where the conflicts are just too
> > complex.
> >
> > Here's an excerpt from t4301:
> >
> > -- snip --
> > Auto-merging greeting
> > CONFLICT (content): Merge conflict in greeting
> > Auto-merging numbers
> > CONFLICT (file/directory): directory in the way of whatever from side1; moving it to whatever~side1 instead.
> > CONFLICT (modify/delete): whatever~side1 deleted in side2 and modified in side1.  Version side1 of whatever~side1 left in tree.
> > -- snap --
> >
> > This is the complete set of messages provided in the `test conflict
> > notices and such` test case.
> >
> > I immediately notice that every line contains at least one file name.
> > Looking at https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1899, it
> > does not seem as if the file names are quoted:
> >
> >               path_msg(opt, path, 1, _("Auto-merging %s"), path);
> >
> > (where `path` is used verbatim in a call to `merge_3way()` before that,
> > i.e. it must not have been quoted)
> >
> > I would like to register a wish to ensure that file names with special
> > characters (such as most notably line-feed characters) are quoted in these
> > messages, so that a simple server-side parser can handle messages starting
> > with `Auto-merging` and with `CONFLICT (content): Merge conflict in `, and
> > "throw the hands up in the air" if any other message prefix is seen.
> >
> > Do you think we can switch to `sq_quote_buf_pretty()` for these messages?
> > For the `Auto-merging` one, it would be trivial, but I fear that we will
> > have to work a bit on the `path_msg()` function
> > (https://github.com/git/git/blob/v2.35.1/merge-ort.c#L630-L649) because it
> > accepts a variable list of arguments without any clue whether the
> > arguments refer to paths or not. (And I would be loathe to switch _all_
> > callers to do the quoting themselves.)
> >
> > I see 28 calls to that function, and at least a couple that pass not only
> > a path but also an OID (e.g.
> > https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1611-L1613).
> >
> > We could of course be sloppy and pass even OIDs through
> > `sq_quote_buf_pretty()` in `path_msg()`, knowing that there won't be any
> > special characters in them, but it gets more complicated e.g. in
> > https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1648-L1651, where we
> > pass an `strbuf` that contains a somewhat free-form commit message.
> >
> > I guess we could still pass those through `sq_quote_buf_pretty()`, even if
> > they are not paths, to ensure that there are no special characters in the
> > machine-parseable lines.
> >
> > What do you think?
>
> That sounds like a rather nasty hack, this is too, but demonstrates that
> we can pretty easily extract this in a machine-readable format with just
> a few lines now):
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 8a5f201d190..a906881f9b3 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -633,7 +633,7 @@ static void path_msg(struct merge_options *opt,
>                      int omittable_hint, /* skippable under --remerge-diff */
>                      const char *fmt, ...)
>  {
> -       va_list ap;
> +       va_list ap, cp;
>         struct strbuf *sb, *dest;
>         struct strbuf tmp = STRBUF_INIT;
>
> @@ -650,7 +650,9 @@ static void path_msg(struct merge_options *opt,
>
>         dest = (opt->record_conflict_msgs_as_headers ? &tmp : sb);
>
> +       va_copy(cp, ap);
>         va_start(ap, fmt);
> +
>         if (opt->priv->call_depth) {
>                 strbuf_addchars(dest, ' ', 2);
>                 strbuf_addstr(dest, "From inner merge:");
> @@ -659,6 +661,15 @@ static void path_msg(struct merge_options *opt,
>         strbuf_vaddf(dest, fmt, ap);
>         va_end(ap);
>
> +       va_start(cp, fmt);
> +       trace2_region_enter_printf("merge", "conflict/path", opt->repo, "%s", path);
> +       trace2_region_leave("merge", "conflict/path", opt->repo);
> +       trace2_region_enter_printf("merge", "conflict/fmt", opt->repo, "%s", fmt);
> +       trace2_region_leave("merge", "conflict/fmt", opt->repo);
> +       trace2_region_enter_printf_va("merge", "conflict/msg", opt->repo, fmt, cp);
> +       trace2_region_leave("merge", "conflict/msg", opt->repo);
> +       va_end(cp);
> +
>         if (opt->record_conflict_msgs_as_headers) {
>                 int i_sb = 0, i_tmp = 0;
>
> You can run that with one of the tests added in this series to get the
> output as JSON, e.g.:
>
>      GIT_TRACE2_EVENT=/dev/stderr GIT_TRACE2_EVENT_NESTING=10 ~/g/git/git merge-tree --write-tree --no-messages --name-only --messages side1 side2 2>&1|jq -r .| grep '"msg"'
>       "msg": "whatever~side1"
>       "msg": "CONFLICT (file/directory): directory in the way of %s from %s; moving it to %s instead."
>       "msg": "CONFLICT (file/directory): directory in the way of whatever from side1; moving it to whatever~side1 instead."
>       "msg": "whatever~side1"
>       "msg": "CONFLICT (modify/delete): %s deleted in %s and modified in %s.  Version %s of %s left in tree."
>       "msg": "CONFLICT (modify/delete): whatever~side1 deleted in side2 and modified in side1.  Version side1 of whatever~side1 left in tree."
>       "msg": "numbers"
>       "msg": "Auto-merging %s"
>       "msg": "Auto-merging numbers"
>       "msg": "greeting"
>       "msg": "Auto-merging %s"
>       "msg": "Auto-merging greeting"
>       "msg": "greeting"
>       "msg": "CONFLICT (%s): Merge conflict in %s"
>       "msg": "CONFLICT (content): Merge conflict in greeting"
>
> A "proper" fix for this doesn't sound too hard, we'd just instrument the
> path_msg() function to pass along some "message category", see
> e.g. unpack_plumbing_errors in unpack-trees.c for one example of such a
> thing, or the "enum fsck_msg_id".
>
> Then we'd just allow you to emit any of the sprintf() format itself, or
> the expanded version, the path, or an id like "CONFLICT:file/directory"
> or "auto-merging" etc.

I don't see how this helps solve the problem Dscho was bringing up at
all.  Your reference to "the path" means you've missed his whole
complaint -- that with more complex conflicts (renames, directory/file
conflicts resolved via moving the file out of the way, mode conflicts
resolved by moving both files out of the way, etc) there are multiple
paths involved and he's trying to determine what those paths are.
He's particularly focusing on rename/rename cases where a single path
was renamed differently by the two sides of history (which results in
a conflict message only being associated with the path from the merge
base in order to avoid repeating the same message 2-3 times, but that
one message has three distinct paths embedded in the string).

Also, the additional paths is not part of the API to path_msg; it's
merely embedded in a string.  (And, in case it bears repeating: as
mentioned elsewhere, we cannot assume there will only be one
path_msg() call per path, and we at least currently can't assume that
each path_msg() call is for a separate logical conflict; there might
be two for a single "conflict".)

I agree that parsing these meant-for-human-consumption (and not
promised to be stable) messages is not a good way to go, but
pretending the current API has enough info to answer his questions
isn't right either.

> I think that would be particularly useful in conjuction with the
> --format changes I was proposing for this (and hacked up a WIP patch
> for). You could just have a similar --format-messages or whatever.
>
> Then you could pick \0\0 as a delimiter for your "main" --format, and
> "\0" as the delimiter for your --format-messages, and thus be able to
> parse N-nested \0-delimited content.

To be honest, the --format stuff is sounding a little bit like a
solution in search of a problem.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux