Re: [PATCH 1/1] cat-file: quote-format name in error when using -z

Toon Claes <toon@to1.studio> · Mon, 12 Dec 2022 12:34:46 +0100

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Phillip Wood <phillip.wood123@xxxxxxxxx> writes:
>
>> Without "-z" you cannot pass object names that contain newlines so not
>> quoting the output does not cause a problem. We could start quoting
>> the object name without "-z" but we'd be changing the output without a
>> huge benefit.
>
> That's fair.  The next question is from a devil's advocate:
> is switching to the full cquote the best thing to do?
>
> If we were using the full cquote from the very beginning, of course
> it is, simply because that is what is used in all other places in
> Git.  Using the full cquote does mean a LF byte will be protected
> (i.e. instead of shown literally in the middle of other letters
> around LF, "other\nletters around LF" would be shown), but pathnames
> with backslashes and double quotes in them that have been shown
> without problems would be shown differently and will break existing
> parsers, which are written lazily with the assumption that they are
> perfectly happy to be "simple" thanks to not having to deal with LF
> (because in their environment a path with LF in it does not matter).
>
> A bit safer thing to do is to replace LF (and not any other bytes
> that would be quoted with full cquote) in the path given in these
> messages with something else (like NUL to truncate the output
> there).

So the object name "HEAD:other\nletters around LF" would give the error
message "HEAD:other missing"? The same error message would also occur
when the user does not provide -z, so I think that might be confusing.
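
To make the ambiguity concrete, here is a minimal sketch in Python. It
only models the suggestion of cutting the reported name at the first LF;
the helper names are hypothetical and nothing here is taken from the
cat-file code.

```python
# Hypothetical model of "truncate the reported name at the first LF".
# This is an illustration only, not how git cat-file is implemented.

def truncate_at_lf(name: bytes) -> bytes:
    """Return the part of the name that precedes the first LF byte."""
    return name.split(b"\n", 1)[0]


def missing_line_truncated(name: bytes) -> str:
    """Build the '<name> missing' message from the truncated name."""
    return truncate_at_lf(name).decode("latin-1") + " missing"


with_lf = b"HEAD:other\nletters around LF"  # only expressible with -z
plain = b"HEAD:other"                       # expressible with or without -z

# Both requests end up with the same message, so a reader of the output
# cannot tell which name was actually asked for.
same = missing_line_truncated(with_lf) == missing_line_truncated(plain)
```

Under this assumption both names report "HEAD:other missing".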

> As these answers are given in order, the object names are
> not absolutely needed to identify and match up the input and the
> output, and properly written parsers would be prepared to see a
> response with an object name that it did not request and handle it
> sanely, such a change may not break such a parser for a path with
> any byte that are modified with full cquote.

Yes, the answers are returned in order, so personally I don't care too
much about the format of the returned object name. I would even be fine
with a generic error message that omits the input name, for example
"object missing", as long as it's clear that the requested object was
not found.

For your information, there is an extreme edge case where a user could
fake an object, and that's something we want to avoid as well. For
example, the command (line break included):

printf "aabbccddeeff00112233445566778899aabbccdd blob 26
this object is not" | git cat-file --batch -z

Would print:

aabbccddeeff00112233445566778899aabbccdd blob 26
this object is not missing

This is perfectly valid git-cat-file output. Luckily, I don't see any
way this can be abused. In general, I think it's a good idea not to
return the input as-is in any situation. We could replace only newlines,
but cquoting already sanitizes the input, so why not use that?
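
As a rough illustration of why quoting defuses the crafted name, here is
a sketch in Python. It is a simplified approximation of C-style quoting
written for this mail, not Git's actual cquote implementation, and the
function names are made up.

```python
# Simplified approximation of C-style quoting (assumption: this is NOT
# Git's real implementation, only enough of the idea to show the effect).
_ESCAPES = {
    0x07: "\\a", 0x08: "\\b", 0x09: "\\t", 0x0A: "\\n",
    0x0B: "\\v", 0x0C: "\\f", 0x0D: "\\r",
    0x22: '\\"', 0x5C: "\\\\",
}


def cquote_sketch(name: bytes) -> str:
    """Escape special bytes; wrap in double quotes only if anything changed."""
    out = []
    needs_quoting = False
    for byte in name:
        if byte in _ESCAPES:
            out.append(_ESCAPES[byte])
            needs_quoting = True
        elif byte < 0x20 or byte >= 0x7F:
            out.append("\\%03o" % byte)  # other bytes as three-digit octal
            needs_quoting = True
        else:
            out.append(chr(byte))
    body = "".join(out)
    return '"' + body + '"' if needs_quoting else body


def missing_line(name: bytes, quote: bool) -> str:
    """Build the '<name> missing' message, with or without quoting."""
    shown = cquote_sketch(name) if quote else name.decode("latin-1")
    return shown + " missing"


crafted = (b"aabbccddeeff00112233445566778899aabbccdd blob 26\n"
           b"this object is not")

raw = missing_line(crafted, quote=False)    # spans two lines, looks like a blob
quoted = missing_line(crafted, quote=True)  # stays on a single line
```

With the raw name, the second output line is the 26-byte string "this
object is not missing", which is what makes the fake look like a real
blob. With quoting, the LF becomes the two characters `\n` and the whole
message stays on one line, so it can no longer be mistaken for a header
followed by contents.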

> The above is with a devil's advocate hat on, and I do not care too
> much, as I do not think butchering backslash with full cquote would
> not hurt even existing Windows users (if "HEAD:t\README.txt" named
> the same blob as "HEAD:t/README.txt" on a platform, doubling the
> backslashes in the output would have made quite a lot of damage, but
> I do not think we allow backslashes to name tree paths).

> By the way, there is another use of obj_name in batch_object_write()
> that can show whatever byte in it literally to the output.

Ah, thanks! I will include that in the next version, once we reach a
consensus on when and what to cquote.

--
Toon