On Mon, Feb 21 2022, Johannes Schindelin wrote: [I sent out an empty reply to this earlier by mistake, sorry about that] > [...] > Which brings me to the next concern: since `merge-tree` is a low-level > tool meant to be called by programs rather than humans, we need to make > sure that those messages remain machine-parseable, even if they contain > file names. > > Concretely: while I am not currently aware of any web UI that allows to > resolve simple rename/rename conflicts, it is easily conceivable how to > implement such a thing. When that happens, we will need to be able to > teach the server-side code to discern between the cases that can be > handled in the web UI (trivial merge conflicts, trivial rename/rename > conflicts) as compared to scenarios where the conflicts are just too > complex. > > Here's an excerpt from t4301: > > -- snip -- > Auto-merging greeting > CONFLICT (content): Merge conflict in greeting > Auto-merging numbers > CONFLICT (file/directory): directory in the way of whatever from side1; moving it to whatever~side1 instead. > CONFLICT (modify/delete): whatever~side1 deleted in side2 and modified in side1. Version side1 of whatever~side1 left in tree. > -- snap -- > > This is the complete set of messages provided in the `test conflict > notices and such` test case. > > I immediately notice that every line contains at least one file name. > Looking at https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1899, it > does not seem as if the file names are quoted: > > path_msg(opt, path, 1, _("Auto-merging %s"), path); > > (where `path` is used verbatim in a call to `merge_3way()` before that, > i.e. it must not have been quoted) > > I would like to register a wish to ensure that file names with special > characters (such as most notably line-feed characters) are quoted in these > messages, so that a simple server-side parser can handle messages starting > with `Auto-merging` and with `CONFLICT (content): Merge conflict in `, and > "throw the hands up in the air" if any other message prefix is seen. > > Do you think we can switch to `sq_quote_buf_pretty()` for these messages? > For the `Auto-merging` one, it would be trivial, but I fear that we will > have to work a bit on the `path_msg()` function > (https://github.com/git/git/blob/v2.35.1/merge-ort.c#L630-L649) because it > accepts a variable list of arguments without any clue whether the > arguments refer to paths or not. (And I would be loathe to switch _all_ > callers to do the quoting themselves.) > > I see 28 calls to that function, and at least a couple that pass not only > a path but also an OID (e.g. > https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1611-L1613). > > We could of course be sloppy and pass even OIDs through > `sq_quote_buf_pretty()` in `path_msg()`, knowing that there won't be any > special characters in them, but it gets more complicated e.g. in > https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1648-L1651, where we > pass an `strbuf` that contains a somewhat free-form commit message. > > I guess we could still pass those through `sq_quote_buf_pretty()`, even if > they are not paths, to ensure that there are no special characters in the > machine-parseable lines. > > What do you think? That sounds like a rather nasty hack, this is too, but demonstrates that we can pretty easily extract this in a machine-readable format with just a few lines now): diff --git a/merge-ort.c b/merge-ort.c index 8a5f201d190..a906881f9b3 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -633,7 +633,7 @@ static void path_msg(struct merge_options *opt, int omittable_hint, /* skippable under --remerge-diff */ const char *fmt, ...) { - va_list ap; + va_list ap, cp; struct strbuf *sb, *dest; struct strbuf tmp = STRBUF_INIT; @@ -650,7 +650,9 @@ static void path_msg(struct merge_options *opt, dest = (opt->record_conflict_msgs_as_headers ? &tmp : sb); + va_copy(cp, ap); va_start(ap, fmt); + if (opt->priv->call_depth) { strbuf_addchars(dest, ' ', 2); strbuf_addstr(dest, "From inner merge:"); @@ -659,6 +661,15 @@ static void path_msg(struct merge_options *opt, strbuf_vaddf(dest, fmt, ap); va_end(ap); + va_start(cp, fmt); + trace2_region_enter_printf("merge", "conflict/path", opt->repo, "%s", path); + trace2_region_leave("merge", "conflict/path", opt->repo); + trace2_region_enter_printf("merge", "conflict/fmt", opt->repo, "%s", fmt); + trace2_region_leave("merge", "conflict/fmt", opt->repo); + trace2_region_enter_printf_va("merge", "conflict/msg", opt->repo, fmt, cp); + trace2_region_leave("merge", "conflict/msg", opt->repo); + va_end(cp); + if (opt->record_conflict_msgs_as_headers) { int i_sb = 0, i_tmp = 0; You can run that with one of the tests added in this series to get the output as JSON, e.g.: GIT_TRACE2_EVENT=/dev/stderr GIT_TRACE2_EVENT_NESTING=10 ~/g/git/git merge-tree --write-tree --no-messages --name-only --messages side1 side2 2>&1|jq -r .| grep '"msg"' "msg": "whatever~side1" "msg": "CONFLICT (file/directory): directory in the way of %s from %s; moving it to %s instead." "msg": "CONFLICT (file/directory): directory in the way of whatever from side1; moving it to whatever~side1 instead." "msg": "whatever~side1" "msg": "CONFLICT (modify/delete): %s deleted in %s and modified in %s. Version %s of %s left in tree." "msg": "CONFLICT (modify/delete): whatever~side1 deleted in side2 and modified in side1. Version side1 of whatever~side1 left in tree." "msg": "numbers" "msg": "Auto-merging %s" "msg": "Auto-merging numbers" "msg": "greeting" "msg": "Auto-merging %s" "msg": "Auto-merging greeting" "msg": "greeting" "msg": "CONFLICT (%s): Merge conflict in %s" "msg": "CONFLICT (content): Merge conflict in greeting" A "proper" fix for this doesn't sound too hard, we'd just instrument the path_msg() function to pass along some "message category", see e.g. unpack_plumbing_errors in unpack-trees.c for one example of such a thing, or the "enum fsck_msg_id". Then we'd just allow you to emit any of the sprintf() format itself, or the expanded version, the path, or an id like "CONFLICT:file/directory" or "auto-merging" etc. I think that would be particularly useful in conjuction with the --format changes I was proposing for this (and hacked up a WIP patch for). You could just have a similar --format-messages or whatever. Then you could pick \0\0 as a delimiter for your "main" --format, and "\0" as the delimiter for your --format-messages, and thus be able to parse N-nested \0-delimited content.