Phillip Wood <phillip.wood123@xxxxxxxxx> writes:

> On 12/05/2023 17:57, Junio C Hamano wrote:
>> Toon Claes <toon@xxxxxxxxx> writes:
>> Stepping back a bit, how big a problem is this in real life? It
>> certainly is possible to create a pathname with funny byte values in
>> it, and in some environments, letters like single-quote that are
>> considered cumbersome to handle by those who are used to CLI
>> programs may be commonplace. But a path with newline? Or any
>> control character for that matter? And this is not even the primary
>> output from the program but is an error message for consumption by
>> humans, no?
>>
>> I am wondering if it is simpler to just declare that the paths
>> output in error messages have certain bytes, probably all control
>> characters other than HT, replaced with a dot, and tell the users
>> not to rely on the pathnames being intact if they contain funny
>> bytes in them.
>
> We could only c-quote the name when it contains a control character
> other than HT. That way names containing double quotes and backslashes
> are unchanged but it will still be possible to parse the path from the
> error message. If we're going to munge the name we might as well use
> our standard quoting rather than some ad-hoc scheme.

In the above suggestion, I gave up and no longer aim to do
"quoting". A more appropriate word for the approach is "redacting".
The message essentially is: If you use truly problematic bytes in
your path, they are redacted (so do not use them if it hurts).

This is because I am not sure how "names containing dq and bs are
unchanged" can be done without ambiguity. If I see a message that
comes out of this:

    printf("%s missing\n", obj_name);

and it looks like

    "a\nb" missing

how do I tell if it is complaining about the object the user named
with a three-byte string (i.e. lowercase-A, newline, lowercase-B),
or a six-byte string (i.e. dq, lowercase-A, bs, lowercase-N,
lowercase-B, dq)?
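To make the collision concrete, here is a minimal sketch of the
"quote only when a control character other than HT is present, and
leave dq and bs alone" rule. The names is_funny() and partial_quote()
are made up for illustration and are not existing git functions;
counting DEL as a control byte and using octal escapes for bytes other
than newline are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true for control bytes other than HT (DEL included by assumption). */
static int is_funny(unsigned char c)
{
    return (c < 0x20 && c != '\t') || c == 0x7f;
}

/*
 * Hypothetical sketch of the partial c-quote rule: a name without any
 * funny byte is copied verbatim; otherwise it is wrapped in double
 * quotes and only the funny bytes are escaped, while dq and backslash
 * are deliberately left untouched.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int partial_quote(const char *name, char *out, size_t outsz)
{
    size_t i, n = strlen(name), o = 0;
    int funny = 0;

    assert(out != NULL);
    for (i = 0; i < n; i++)
        funny |= is_funny((unsigned char)name[i]);

    if (!funny) {
        if (n + 1 > outsz)
            return -1;
        memcpy(out, name, n + 1);
        return 0;
    }

    if (outsz < 3)
        return -1;
    out[o++] = '"';
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)name[i];

        /* keep room for a 4-byte escape, the closing quote and the NUL */
        if (o + 6 > outsz)
            return -1;
        if (c == '\n') {
            out[o++] = '\\';
            out[o++] = 'n';
        } else if (is_funny(c)) {
            o += (size_t)snprintf(out + o, 5, "\\%03o", (unsigned)c);
        } else {
            out[o++] = (char)c;
        }
    }
    out[o++] = '"';
    out[o] = '\0';
    return 0;
}
```

Feeding this sketch the three-byte name and the six-byte name from
the example yields the same six bytes of output in both cases, which
is exactly the ambiguity in question.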
If we were forbidding '"' to appear in a refname, then we could take
advantage of the fact that the name of an object inside a tree at a
funny path would not start with '"', to disambiguate. For the
three- and six-byte string cases above, the formatting function will
give these messages (referred to as "sample output" below):

    "master:a\nb" missing
    master:"a\nb" missing

because of your "we do not exactly do our standard c-quote; we
exempt dq and bs from the bytes to be quoted" rule.

But it still feels a bit misleading. This codepath may have the
whole objectname as a single string so that c-quoting the entire
"<commit> <colon> <path>" inside a single c-quoted string that
begins with a dq is easy, but not all codepaths are lucky and some
may have to show <commit> and <path> separately, concatenated with
<colon> at the outermost output layer, which means that the second
one from the sample output may still mean the path with three-byte
name in the tree of 'master' commit.

And worse yet, because

    git branch '"master'

is possible (even though nobody sane would do that), the "treat the
string as c-quoted only if the object name as a whole begins with a
dq" disambiguation idea would not work. The first one from the
sample output could be the blob at the path with a five-byte string
name (i.e. lowercase-A, bs, lowercase-N, lowercase-B, dq) in the
tree of the commit at the tip of branch with seven-byte string name
(i.e. dq followed by 'master').

So, I dunno.
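For comparison, the "redacting" alternative suggested earlier in this
message could look roughly like the sketch below. The function name
redact_name() is hypothetical and not an existing git function, and
treating DEL as a control byte is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: copy `name` into `out`, replacing every control
 * byte other than HT with a dot.  Double quotes and backslashes pass
 * through untouched, and nothing is ever wrapped in quotes.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int redact_name(const char *name, char *out, size_t outsz)
{
    size_t i, n = strlen(name);

    assert(out != NULL);
    if (n + 1 > outsz)
        return -1;
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)name[i];

        /* DEL (0x7f) is redacted too; that is an assumption of this sketch */
        if ((c < 0x20 && c != '\t') || c == 0x7f)
            out[i] = '.';
        else
            out[i] = (char)c;
    }
    out[n] = '\0';
    return 0;
}
```

The output is lossy by design: a name containing a newline and a name
containing a literal dot at the same position print identically, which
is why users would be told not to rely on such pathnames being intact.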