Junio C Hamano <gitster@xxxxxxxxx> wrote on Wed, May 26, 2021 at 3:06 PM:
>
> ZheNing Hu <adlternative@xxxxxxxxx> writes:
>
> > $ printf '%b' "name='a\\\0b\\\0c'\necho -e \"\$name\"" | sh | od -c
> > 0000000 a \0 b \0 c \n
> > 0000006
>
> This is wrong.  In the above, the variable name does not have a NUL
> in it.  It instead has 2-byte sequence "\" and "0" in it, and you are
> letting "echo -e" to convert it into binary, which is not portable
> at all.
>

Indeed, I was wrong: the variable name does not contain '\0'.

> I'd suggest instead to declare that some host languages, like shell,
> are not binary-clean and either refuse to process atoms whose values
> have NUL in them.  Silently truncating strings at NUL or stripping
> NULs in strings are also acceptable options if clearly documented.
> Claiming that we stuff binaries into variables of the host language,
> while not doing so and instead assigning a quoted form, is not good.
>

Makes sense. We either truncate or reject.

> I have not thought about Python3 very much.  For the purpose of most
> %(placeholders), it is vastly more preferable to use str (i.e. text
> sequence type) as opposed to bytes, as you do not have to .decode()
> to use the resulting "string", but even for things like %(refname),
> it is not technically kosher to assume that the contents are UTF-8
> encoded text, as filenames used to represent refnames are merely a
> sequence of bytes except NUL, but for consistency with binary gunk,
> we might have to emit everything as bytes.  I dunno.
>
> > In shell or python2/3, we can replace '\0' with "\\0".
>
> Not for shell.  We should declare that it is not supported to feed
> binary to shell.

So it seems we are adopting a "reject" strategy.

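To make the "reject" option concrete, here is a minimal Python sketch. It is not what git does internally; the names sq_quote_or_reject and BinaryValueError are hypothetical, and the quoting rule is just the common POSIX one (wrap in single quotes, write each embedded ' as '\''). The point is only that a NUL is refused up front instead of being smuggled in as a quoted "\0".

```python
class BinaryValueError(ValueError):
    """Raised when a value cannot be held in a shell variable."""


def sq_quote_or_reject(value: bytes) -> bytes:
    """Single-quote `value` for a POSIX shell, refusing NUL bytes.

    Hypothetical helper, for illustration only.
    """
    if b"\0" in value:
        # Shell variables cannot carry NUL, so refuse instead of
        # emitting an escaped form that only looks like binary.
        raise BinaryValueError("value contains NUL; not quoting for shell")
    # Inside single quotes only the quote itself is special:
    # close the quote, emit an escaped quote, then reopen.
    return b"'" + value.replace(b"'", b"'\\''") + b"'"
```

A caller would catch BinaryValueError and report the offending atom. Truncating at the first NUL would be the other documented option.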
$ git hash-object a.out -w | xargs git update-ref refs/myblobs/aoutblob

$ git for-each-ref --format="name=%(raw)" refs/myblobs/aoutblob --python | python2
  File "<stdin>", line 1
SyntaxError: Non-ASCII character '\x8b' in file <stdin> on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

$ git for-each-ref --format="name=%(raw)" refs/myblobs/aoutblob --python | python3
SyntaxError: Non-UTF-8 code starting with '\x8b' in file <stdin> on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

It seems that --python also needs to "reject", whether for python2 or
python3. What about Tcl and Perl?

$ cat a.out | od >1.od
$ git for-each-ref --format="set name %(raw)
puts -nonewline \$name" refs/myblobs/aoutblob --tcl | tclsh | od >2.od
$ diff 1.od 2.od | head
7,12c7,12
< 0000140 114303 000002 000000 000000 141400 001230 000000 000000
< 0000160 000000 000010 000000 000000 000000 000003 000000 000004
< 0000200 000000 001430 000000 000000 000000 001430 000000 000000
< 0000220 000000 001430 000000 000000 000000 000034 000000 000000
< 0000240 000000 000034 000000 000000 000000 000001 000000 000000
< 0000260 000000 000001 000000 000004 000000 000000 000000 000000
---
> 0000140 001330 000000 000000 000000 001330 000000 000000 000000
> 0000160 000010 000000 000000 000000 000003 000000 000004 000000
> 0000200 001430 000000 000000 000000 001430 000000 000000 000000
> 0000220 001430 000000 000000 000000 000034 000000 000000 000000
> 0000240 000034 000000 000000 000000 000001 000000 000000 000000
> 0000260 000001 000000 000004 000000 000000 000000 000000 000000

It seems that once the contents of a.out pass through Tcl, the output
is very different...

But,

$ cat a.out | od >1.od
$ git for-each-ref --format="\$name= %(raw); print \"\$name\"" refs/myblobs/aoutblob --perl | perl | od >6.od
$ diff 1.od 6.od

There was no difference this time, so Perl is OK...

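The Python 3 failure can be reproduced without git at all. The sketch below (loads_as_python_source is a hypothetical name) only asks the interpreter whether a byte string is acceptable as program text, which is the step that fails when raw blob bytes are pasted into a quoted literal.

```python
def loads_as_python_source(source: bytes) -> bool:
    """Return True if Python 3 accepts `source` as program text."""
    try:
        compile(source, "<stdin>", "exec")
    except (SyntaxError, ValueError):
        # Bytes that are not valid UTF-8 (with no encoding declared),
        # or an embedded NUL, make the source itself unreadable.
        return False
    return True


print(loads_as_python_source(b"name='abc'\n"))
print(loads_as_python_source(b"name='\x8b'\n"))
```

So the problem is not in how the value is quoted: the script is rejected before any assignment runs.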
Of the host languages tested here, only Perl is currently binary-safe
in the sense we care about. So, for consistency, I think we had better
reject binary data for all of the languages together.

The clear definition of this rejection strategy is that %(raw) and
--language (--shell, --python, --tcl or --perl) cannot be used at the
same time. If binary data is passed into a variable of the host
language, the host language may mis-parse the escaped value.

Thanks.
--
ZheNing Hu