Re: [PATCH 0/3] [GSOC][RFC] ref-filter: add contents:raw atom

On Wed, May 26, 2021 at 3:06 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> ZheNing Hu <adlternative@xxxxxxxxx> writes:
>
> > $ printf '%b' "name='a\\\0b\\\0c'\necho -e \"\$name\"" | sh | od -c
> > 0000000   a  \0   b  \0   c  \n
> > 0000006
>
> This is wrong.  In the above, the variable name does not have a NUL
> in it.  It instead has the 2-byte sequence "\" and "0" in it, and you
> are letting "echo -e" convert it into binary, which is not portable
> at all.
>

Indeed, I was wrong: the variable "name" does not contain a NUL.
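
To make the distinction concrete, here is a minimal Python sketch. Python
is used only for illustration, and the variable names are mine, not from
the patch. It compares the bytes the shell variable really holds with what
"echo -e" produces once it expands the escapes:

```python
# Bytes the shell variable actually holds: 'a', '\', '0', 'b', '\', '0', 'c'.
quoted = b"a\\0b\\0c"

# Roughly what "echo -e" does: turn each two-byte "\0" escape into one NUL.
expanded = quoted.replace(b"\\0", b"\0")


def describe(data):
    """Summarize the length of data and whether it contains a NUL byte."""
    return "%d bytes, contains NUL: %s" % (len(data), b"\0" in data)


print("variable:", describe(quoted))
print("echo -e :", describe(expanded))
```

The NUL only appears after the escape expansion, so the variable itself
never held binary data.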

> I'd suggest instead to declare that some host languages, like shell,
> are not binary-clean, and to refuse to process atoms whose values
> have NUL in them.  Silently truncating strings at NUL or stripping
> NULs in strings are also acceptable options if clearly documented.
> Claiming that we stuff binaries into variables of the host language,
> while not doing so and instead assigning a quoted form, is not good.
>

Makes sense. We either truncate or reject.

> I have not thought about Python3 very much.  For the purpose of most
> %(placeholders), it is vastly more preferable to use str (i.e. text
> sequence type) as opposed to bytes, as you do not have to .decode()
> to use the resulting "string", but even for things like %(refname),
> it is not technically kosher to assume that the contents are UTF-8
> encoded text, as filenames used to represent refnames are merely a
> sequence of bytes except NUL, but for consistency with binary gunk,
> we might have to emit everything as bytes.  I dunno.
>
> > In shell or python2/3, we can replace '\0' with "\\0".
>
> Not for shell.  We should declare that it is not supported to feed
> binary to shell.

So it seems that we should adopt the "reject" strategy.

$ git hash-object a.out -w | xargs git update-ref refs/myblobs/aoutblob
$ git for-each-ref --format="name=%(raw)" refs/myblobs/aoutblob --python | python2
  File "<stdin>", line 1
SyntaxError: Non-ASCII character '\x8b' in file <stdin> on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

$ git for-each-ref --format="name=%(raw)" refs/myblobs/aoutblob --python | python3
SyntaxError: Non-UTF-8 code starting with '\x8b' in file <stdin> on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
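
The Python 3 failure can be reproduced without git. The following is a
minimal sketch, assuming the emitted script simply contains the raw byte
0x8b inside a quoted string; the "script" value is a hypothetical stand-in
for the real --python output, and try_compile() is a helper name I made up:

```python
# Hypothetical stand-in for what --python would emit when the value of
# %(raw) contains a byte (0x8b) that is not valid UTF-8.
script = b"name='\x8b'\n"


def try_compile(source):
    """Return None if source compiles, otherwise the error that was raised."""
    try:
        compile(source, "<stdin>", "exec")
    except (SyntaxError, ValueError) as err:
        return err
    return None


print("binary script rejected:", try_compile(script) is not None)
print("ASCII script rejected: ", try_compile(b"name='abc'\n") is not None)
```

The interpreter refuses the source before any assignment happens, so the
host language never gets a chance to see the binary value.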

It seems that --python also needs to "reject", for both python2 and python3.
What about tcl and perl?

$ cat a.out | od >1.od
$ git for-each-ref --format="set name %(raw)
puts -nonewline \$name" refs/myblobs/aoutblob --tcl | tclsh | od > 2.od
$ diff 1.od 2.od | head
7,12c7,12
< 0000140 114303 000002 000000 000000 141400 001230 000000 000000
< 0000160 000000 000010 000000 000000 000000 000003 000000 000004
< 0000200 000000 001430 000000 000000 000000 001430 000000 000000
< 0000220 000000 001430 000000 000000 000000 000034 000000 000000
< 0000240 000000 000034 000000 000000 000000 000001 000000 000000
< 0000260 000000 000001 000000 000004 000000 000000 000000 000000
---
> 0000140 001330 000000 000000 000000 001330 000000 000000 000000
> 0000160 000010 000000 000000 000000 000003 000000 000004 000000
> 0000200 001430 000000 000000 000000 001430 000000 000000 000000
> 0000220 001430 000000 000000 000000 000034 000000 000000 000000
> 0000240 000034 000000 000000 000000 000001 000000 000000 000000
> 0000260 000001 000000 000004 000000 000000 000000 000000 000000

It seems that when the contents of a.out are passed through Tcl, the
output differs significantly from the input.

But,

$ cat a.out | od >1.od
$ git for-each-ref --format="\$name= %(raw);
print \"\$name\"" refs/myblobs/aoutblob --perl | perl | od >6.od
$ diff 1.od 6.od

There was no difference this time, so Perl is fine.
Of the host languages tested here, only Perl currently handles the
binary data cleanly.

So I think we had better reject all of the languages together, for
consistency. Stated precisely, the rejection strategy is that %(raw) and a
language option (--shell, --python, --tcl, --perl) cannot be used at the
same time. If binary data is assigned to a variable in the host language,
the host language may fail to parse or unescape it correctly.
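
As a rough illustration of that rule, here is a minimal Python sketch.
All names here (check_format, RAW_ATOM, LANGUAGE_OPTIONS) are hypothetical,
the real check would live in git's C code, and whether %(raw) takes
modifiers such as "%(raw:...)" is an assumption of the sketch:

```python
import re

# Quoting options accepted by "git for-each-ref".
LANGUAGE_OPTIONS = ("--shell", "--python", "--tcl", "--perl")

# Matches %(raw) and, as an assumption, %(raw:<modifier>).
RAW_ATOM = re.compile(r"%\(raw(:[^)]*)?\)")


def check_format(fmt, options):
    """Raise ValueError if %(raw) is combined with a language option."""
    used = [opt for opt in options if opt in LANGUAGE_OPTIONS]
    if used and RAW_ATOM.search(fmt):
        raise ValueError("%%(raw) cannot be used with %s" % used[0])


check_format("name=%(raw)", [])              # fine: no quoting requested
check_format("name=%(refname)", ["--perl"])  # fine: no raw atom
try:
    check_format("name=%(raw)", ["--python"])
except ValueError as err:
    print("rejected:", err)
```

Rejecting up front keeps the behavior identical across all four languages,
even though Perl happened to cope with the binary data in the test above.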

Thanks.
--
ZheNing Hu



