Toon Claes <toon@xxxxxxxxx> writes: > Since it's supported to have NUL-delimited input, introduced in > db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`, > 2022-07-22), it's possible to pass paths that contain newlines. It has been a while since I saw this patch the last time, and it did not immediately click to me how "pass paths" relates to passing object names, which is what "--batch-check" takes. Perhaps "cat-file --batch-check" may be fed object names of blobs and trees as "<commit>:<path>", and with its "-z" option, it is possible to feed <commit> or <path> with newlines in it. or something? > This > works great when the object is found, but when it's not, the input path > is returned in the error message. Because this can contain newlines, the > error message might get spread over multiple lines, making it harder to > machine-parse this error message. Good description. I may suggest "the input path is returned" -> "the input is shown verbatim" because <path> is not the only thing that can contain LF. E.g. $ git show -s 'HEAD^{/introduced in > db9d67}' can find the commit resulting from this patch, so $ printf "%s\0" 'HEAD^{/introduced in > db9d67}:builtin/cat-file.cc' | > git cat-file --batch-check -z would be an input record with newline in it, that has no newline in the path. > With this change, the input is quote-formatted in the error message, if > needed. This ensures the error message is always on a single line and > makes parsing the error more straightforward. Drop "With this change, ..." and give a command to the codebase to c-quote the object name in the output, e.g. C-quote the object name from the input in the error message as needed, to ensure that the error message is on a single line and ... The other side of the coin, however, is that an existing project that is sane enough not to have a path with LF in it, but is not sane enough to avoid a path with double-quote in it, would now stop being able to parse the error message for a missing path. It is a regression, and we may argue that it is not a large enough regression to block progress given by this patch, but if it is not common enough to have funny characters in the paths then we wouldn't be seeing this patch in the first place. So I would prefer to see that we at least admit that we are deliberately making this change with a known regression. Perhaps add something like Note that if a project already parses the error message correctly because it does not have paths with newlines, this is a breaking change if it has paths with characters that need c-quoting (like double quotes and backslashes) that are not newlines. We consider that this is a small enough price to pay to allow newlines in paths because ... and fill the "because ..." part is sensible. I am not filling the "because" part, simply because I do not offhand see any good excuse to rob Peter to pay Paul in this case. > @@ -470,8 +471,17 @@ static void batch_object_write(const char *obj_name, > &data->oid, &data->info, > OBJECT_INFO_LOOKUP_REPLACE); > if (ret < 0) { > + struct strbuf quoted = STRBUF_INIT; > + > + if (opt->nul_terminated && > + obj_name) { > + quote_c_style(obj_name, "ed, NULL, 0); > + obj_name = quoted.buf; > + } > + > printf("%s missing\n", > obj_name ? obj_name : oid_to_hex(&data->oid)); > + strbuf_release("ed); > fflush(stdout); > return; > } Interesting. When running "--batch-all-objects", we do not have obj_name, and we do not have anything to "quote", but the fallback label is the full hexadecimal object name, and we do not have to worry about line breaks. OK. > @@ -518,6 +528,13 @@ static void batch_one_object(const char *obj_name, > result = get_oid_with_context(the_repository, obj_name, > flags, &data->oid, &ctx); > if (result != FOUND) { > + struct strbuf quoted = STRBUF_INIT; > + > + if (opt->nul_terminated) { > + quote_c_style(obj_name, "ed, NULL, 0); > + obj_name = quoted.buf; > + } > + > switch (result) { > case MISSING_OBJECT: > printf("%s missing\n", obj_name); > @@ -542,6 +559,8 @@ static void batch_one_object(const char *obj_name, > result); > break; > } > + > + strbuf_release("ed); > fflush(stdout); > return; > } This is the side that runs without "--batch-all-objects" and always has non-null obj_name, so it looks a bit different from the other hunk and is more straight-forward. Looking good. Thanks.