Re: [PATCH v5 1/1] cat-file: quote-format name in error when using -z

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Toon Claes <toon@xxxxxxxxx> writes:

> Since it's supported to have NUL-delimited input, introduced in
> db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
> 2022-07-22), it's possible to pass paths that contain newlines.

It has been a while since I saw this patch the last time, and it did
not immediately click to me how "pass paths" relates to passing
object names, which is what "--batch-check" takes.  Perhaps

    "cat-file --batch-check" may be fed object names of blobs and
    trees as "<commit>:<path>", and with its "-z" option, it is
    possible to feed <commit> or <path> with newlines in it.

or something?

> This
> works great when the object is found, but when it's not, the input path
> is returned in the error message. Because this can contain newlines, the
> error message might get spread over multiple lines, making it harder to
> machine-parse this error message.

Good description.  I may suggest

    "the input path is returned" -> "the input is shown verbatim"

because <path> is not the only thing that can contain LF.  E.g.

    $ git show -s 'HEAD^{/introduced in
    > db9d67}'

can find the commit resulting from this patch, so

    $ printf "%s\0" 'HEAD^{/introduced in
    > db9d67}:builtin/cat-file.cc' |
    > git cat-file --batch-check -z 

would be an input record with newline in it, that has no newline in
the path.

> With this change, the input is quote-formatted in the error message, if
> needed. This ensures the error message is always on a single line and
> makes parsing the error more straightforward.

Drop "With this change, ..." and give a command to the codebase to
c-quote the object name in the output, e.g.

    C-quote the object name from the input in the error message as
    needed, to ensure that the error message is on a single line and
    ...

The other side of the coin, however, is that an existing project
that is sane enough not to have a path with LF in it, but is not
sane enough to avoid a path with double-quote in it, would now stop
being able to parse the error message for a missing path.

It is a regression, and we may argue that it is not a large enough
regression to block progress given by this patch, but if it is not
common enough to have funny characters in the paths then we wouldn't
be seeing this patch in the first place.  So I would prefer to see
that we at least admit that we are deliberately making this change
with a known regression.  Perhaps add something like

    Note that if a project already parses the error message
    correctly because it does not have paths with newlines, this is
    a breaking change if it has paths with characters that need
    c-quoting (like double quotes and backslashes) that are not
    newlines.  We consider that this is a small enough price to pay
    to allow newlines in paths because ...

and fill the "because ..." part is sensible.  I am not filling the
"because" part, simply because I do not offhand see any good excuse
to rob Peter to pay Paul in this case.

> @@ -470,8 +471,17 @@ static void batch_object_write(const char *obj_name,
>  						       &data->oid, &data->info,
>  						       OBJECT_INFO_LOOKUP_REPLACE);
>  		if (ret < 0) {
> +			struct strbuf quoted = STRBUF_INIT;
> +
> +			if (opt->nul_terminated &&
> +			    obj_name) {
> +				quote_c_style(obj_name, &quoted, NULL, 0);
> +				obj_name = quoted.buf;
> +			}
> +
>  			printf("%s missing\n",
>  			       obj_name ? obj_name : oid_to_hex(&data->oid));
> +			strbuf_release(&quoted);
>  			fflush(stdout);
>  			return;
>  		}

Interesting.

When running "--batch-all-objects", we do not have obj_name, and we
do not have anything to "quote", but the fallback label is the full
hexadecimal object name, and we do not have to worry about line
breaks.

OK.

> @@ -518,6 +528,13 @@ static void batch_one_object(const char *obj_name,
>  	result = get_oid_with_context(the_repository, obj_name,
>  				      flags, &data->oid, &ctx);
>  	if (result != FOUND) {
> +		struct strbuf quoted = STRBUF_INIT;
> +
> +		if (opt->nul_terminated) {
> +			quote_c_style(obj_name, &quoted, NULL, 0);
> +			obj_name = quoted.buf;
> +		}
> +
>  		switch (result) {
>  		case MISSING_OBJECT:
>  			printf("%s missing\n", obj_name);
> @@ -542,6 +559,8 @@ static void batch_one_object(const char *obj_name,
>  			       result);
>  			break;
>  		}
> +
> +		strbuf_release(&quoted);
>  		fflush(stdout);
>  		return;
>  	}

This is the side that runs without "--batch-all-objects" and always
has non-null obj_name, so it looks a bit different from the other
hunk and is more straight-forward.  Looking good.

Thanks.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux