machine-parsable git-merge-tree messages (was: [PATCH 08/12] merge-ort: provide a merge_get_conflicted_files() helper function)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 21 2022, Johannes Schindelin wrote:

[I sent out an empty reply to this earlier by mistake, sorry about that]

> [...]
> Which brings me to the next concern: since `merge-tree` is a low-level
> tool meant to be called by programs rather than humans, we need to make
> sure that those messages remain machine-parseable, even if they contain
> file names.
>
> Concretely: while I am not currently aware of any web UI that allows to
> resolve simple rename/rename conflicts, it is easily conceivable how to
> implement such a thing. When that happens, we will need to be able to
> teach the server-side code to discern between the cases that can be
> handled in the web UI (trivial merge conflicts, trivial rename/rename
> conflicts) as compared to scenarios where the conflicts are just too
> complex.
>
> Here's an excerpt from t4301:
>
> -- snip --
> Auto-merging greeting
> CONFLICT (content): Merge conflict in greeting
> Auto-merging numbers
> CONFLICT (file/directory): directory in the way of whatever from side1; moving it to whatever~side1 instead.
> CONFLICT (modify/delete): whatever~side1 deleted in side2 and modified in side1.  Version side1 of whatever~side1 left in tree.
> -- snap --
>
> This is the complete set of messages provided in the `test conflict
> notices and such` test case.
>
> I immediately notice that every line contains at least one file name.
> Looking at https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1899, it
> does not seem as if the file names are quoted:
>
> 		path_msg(opt, path, 1, _("Auto-merging %s"), path);
>
> (where `path` is used verbatim in a call to `merge_3way()` before that,
> i.e. it must not have been quoted)
>
> I would like to register a wish to ensure that file names with special
> characters (such as most notably line-feed characters) are quoted in these
> messages, so that a simple server-side parser can handle messages starting
> with `Auto-merging` and with `CONFLICT (content): Merge conflict in `, and
> "throw the hands up in the air" if any other message prefix is seen.
>
> Do you think we can switch to `sq_quote_buf_pretty()` for these messages?
> For the `Auto-merging` one, it would be trivial, but I fear that we will
> have to work a bit on the `path_msg()` function
> (https://github.com/git/git/blob/v2.35.1/merge-ort.c#L630-L649) because it
> accepts a variable list of arguments without any clue whether the
> arguments refer to paths or not. (And I would be loathe to switch _all_
> callers to do the quoting themselves.)
>
> I see 28 calls to that function, and at least a couple that pass not only
> a path but also an OID (e.g.
> https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1611-L1613).
>
> We could of course be sloppy and pass even OIDs through
> `sq_quote_buf_pretty()` in `path_msg()`, knowing that there won't be any
> special characters in them, but it gets more complicated e.g. in
> https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1648-L1651, where we
> pass an `strbuf` that contains a somewhat free-form commit message.
>
> I guess we could still pass those through `sq_quote_buf_pretty()`, even if
> they are not paths, to ensure that there are no special characters in the
> machine-parseable lines.
>
> What do you think?

That sounds like a rather nasty hack, this is too, but demonstrates that
we can pretty easily extract this in a machine-readable format with just
a few lines now):

diff --git a/merge-ort.c b/merge-ort.c
index 8a5f201d190..a906881f9b3 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -633,7 +633,7 @@ static void path_msg(struct merge_options *opt,
 		     int omittable_hint, /* skippable under --remerge-diff */
 		     const char *fmt, ...)
 {
-	va_list ap;
+	va_list ap, cp;
 	struct strbuf *sb, *dest;
 	struct strbuf tmp = STRBUF_INIT;
 
@@ -650,7 +650,9 @@ static void path_msg(struct merge_options *opt,
 
 	dest = (opt->record_conflict_msgs_as_headers ? &tmp : sb);
 
+	va_copy(cp, ap);
 	va_start(ap, fmt);
+
 	if (opt->priv->call_depth) {
 		strbuf_addchars(dest, ' ', 2);
 		strbuf_addstr(dest, "From inner merge:");
@@ -659,6 +661,15 @@ static void path_msg(struct merge_options *opt,
 	strbuf_vaddf(dest, fmt, ap);
 	va_end(ap);
 
+	va_start(cp, fmt);
+	trace2_region_enter_printf("merge", "conflict/path", opt->repo, "%s", path);
+	trace2_region_leave("merge", "conflict/path", opt->repo);
+	trace2_region_enter_printf("merge", "conflict/fmt", opt->repo, "%s", fmt);
+	trace2_region_leave("merge", "conflict/fmt", opt->repo);
+	trace2_region_enter_printf_va("merge", "conflict/msg", opt->repo, fmt, cp);
+	trace2_region_leave("merge", "conflict/msg", opt->repo);
+	va_end(cp);
+
 	if (opt->record_conflict_msgs_as_headers) {
 		int i_sb = 0, i_tmp = 0;
 
You can run that with one of the tests added in this series to get the
output as JSON, e.g.:

     GIT_TRACE2_EVENT=/dev/stderr GIT_TRACE2_EVENT_NESTING=10 ~/g/git/git merge-tree --write-tree --no-messages --name-only --messages side1 side2 2>&1|jq -r .| grep '"msg"'
      "msg": "whatever~side1"
      "msg": "CONFLICT (file/directory): directory in the way of %s from %s; moving it to %s instead."
      "msg": "CONFLICT (file/directory): directory in the way of whatever from side1; moving it to whatever~side1 instead."
      "msg": "whatever~side1"
      "msg": "CONFLICT (modify/delete): %s deleted in %s and modified in %s.  Version %s of %s left in tree."
      "msg": "CONFLICT (modify/delete): whatever~side1 deleted in side2 and modified in side1.  Version side1 of whatever~side1 left in tree."
      "msg": "numbers"
      "msg": "Auto-merging %s"
      "msg": "Auto-merging numbers"
      "msg": "greeting"
      "msg": "Auto-merging %s"
      "msg": "Auto-merging greeting"
      "msg": "greeting"
      "msg": "CONFLICT (%s): Merge conflict in %s"
      "msg": "CONFLICT (content): Merge conflict in greeting"

A "proper" fix for this doesn't sound too hard, we'd just instrument the
path_msg() function to pass along some "message category", see
e.g. unpack_plumbing_errors in unpack-trees.c for one example of such a
thing, or the "enum fsck_msg_id".

Then we'd just allow you to emit any of the sprintf() format itself, or
the expanded version, the path, or an id like "CONFLICT:file/directory"
or "auto-merging" etc.

I think that would be particularly useful in conjuction with the
--format changes I was proposing for this (and hacked up a WIP patch
for). You could just have a similar --format-messages or whatever.

Then you could pick \0\0 as a delimiter for your "main" --format, and
"\0" as the delimiter for your --format-messages, and thus be able to
parse N-nested \0-delimited content.








[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux