Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> 于2021年6月17日周四 下午3:16写道: > > > The raw data of blob, tree objects may contain '\0', but most of > > the logic in `ref-filter` depends on the output of the atom being > > text (specifically, no embedded NULs in it). > > > > E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()` > > add the data to the buffer. The raw data of a tree object is > > `100644 one\0...`, only the `100644 one` will be added to the buffer, > > which is incorrect. > > > > Therefore, add a new member in `struct atom_value`: `s_size`, which > > can record raw object size, it can help us add raw object data to > > the buffer or compare two buffers which contain raw object data. > > Most of the functions that deal with this already use a strbuf in some > way, before we had a const char *, now there's a size_t to go along with > it, why not simply use a strbuf in the struct for the data? You'll then > get the size and \0 handling for free, and any functions to deal with > conversion can stick to the strbuf API, there seems to be a lot of back > and forth now. > Yes, strbuf is a suitable choice when using <str,len> pair. But if replace v->s with strbuf, the possible changes will be larger. > > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`, > > `--tcl`, `--perl` because if our binary raw data is passed to a variable > > in the host language, the host language may not support arbitrary binary > > data in the variables of its string type. > > Perl at least deals with that just fine, and to the extent that it > doesn't any new problems here would have nothing to do with \0 being in > the data. Perl doesn't have a notion of "binary has \0 in it", it always > supports \0, it has a notion of "is it utf-8 or not?", so any encoding > problems wouldn't be new. I'd think that the same would be true of > Python, but I'm not sure. > Not python safe. See [1]. Regarding the perl language, I support Junio's point of view: it can be re-supported in the future. > > > +test_expect_success 'basic atom: refs/tags/testtag *raw' ' > > + git cat-file commit refs/tags/testtag^{} >expected && > > + git for-each-ref --format="%(*raw)" refs/tags/testtag >actual && > > + sanitize_pgp <expected >expected.clean && > > + sanitize_pgp <actual >actual.clean && > > + echo "" >>expected.clean && > > Just "echo" will do, ditto for the rest. Also odd to go back and forth > between populating expected.clean & actual.clean. > Are you saying that sanitize_pgp is not needed? > > > +test_expect_success 'set up refs pointing to binary blob' ' > > + printf "a\0b\0c" >blob1 && > > + printf "a\0c\0b" >blob2 && > > + printf "\0a\0b\0c" >blob3 && > > + printf "abc" >blob4 && > > + printf "\0 \0 \0 " >blob5 && > > + printf "\0 \0a\0 " >blob6 && > > + printf " " >blob7 && > > + >blob8 && > > + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && > > + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && > > + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && > > + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && > > + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && > > + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && > > + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && > > + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 > > Hrm, xargs just to avoid: > > git update-ref ... $(git hash-object) ? > I didn’t think about it, just for convenience. > > +test_expect_success '%(raw) with --python must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --python > > +' > > + > > +test_expect_success '%(raw) with --tcl must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --tcl > > +' > > + > > +test_expect_success '%(raw) with --perl must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --perl > > +' > > + > > +test_expect_success '%(raw) with --shell must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --shell > > +' > > + > > +test_expect_success '%(raw) with --shell and --sort=raw must failed' ' > > + test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell > > +' > > s/must failed/must fail/, but see question above about encoding in these > languages... [1]: https://lore.kernel.org/git/CAOLTT8QR_GRm4TYk0E_eazQ+unVQODc-3L+b4V5JUN5jtZR8uA@xxxxxxxxxxxxxx/ Thanks for a review. -- ZheNing Hu