On Thu, Jun 17 2021, ZheNing Hu wrote: > Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> 于2021年6月17日周四 下午3:16写道: >> >> > The raw data of blob, tree objects may contain '\0', but most of >> > the logic in `ref-filter` depends on the output of the atom being >> > text (specifically, no embedded NULs in it). >> > >> > E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()` >> > add the data to the buffer. The raw data of a tree object is >> > `100644 one\0...`, only the `100644 one` will be added to the buffer, >> > which is incorrect. >> > >> > Therefore, add a new member in `struct atom_value`: `s_size`, which >> > can record raw object size, it can help us add raw object data to >> > the buffer or compare two buffers which contain raw object data. >> >> Most of the functions that deal with this already use a strbuf in some >> way, before we had a const char *, now there's a size_t to go along with >> it, why not simply use a strbuf in the struct for the data? You'll then >> get the size and \0 handling for free, and any functions to deal with >> conversion can stick to the strbuf API, there seems to be a lot of back >> and forth now. >> > > Yes, strbuf is a suitable choice when using <str,len> pair. > But if replace v->s with strbuf, the possible changes will be larger. I for one would like to see it done that way, those changes are usually easy to read. Also it seems a large part of 2/8 is extra new code because we didn't do that, e.g. getting length differently if something is a strbuf or not, passing char*/size_t pairs to new functions etc. >> > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`, >> > `--tcl`, `--perl` because if our binary raw data is passed to a variable >> > in the host language, the host language may not support arbitrary binary >> > data in the variables of its string type. >> >> Perl at least deals with that just fine, and to the extent that it >> doesn't any new problems here would have nothing to do with \0 being in >> the data. Perl doesn't have a notion of "binary has \0 in it", it always >> supports \0, it has a notion of "is it utf-8 or not?", so any encoding >> problems wouldn't be new. I'd think that the same would be true of >> Python, but I'm not sure. >> > > Not python safe. See [1]. > Regarding the perl language, I support Junio's point of view: it can be > re-supported in the future. Ah, I'd missed that. Anyway, if it's easy it seems you discovered that Perl deals with it correctly, so we could just have it support this. >> >> > +test_expect_success 'basic atom: refs/tags/testtag *raw' ' >> > + git cat-file commit refs/tags/testtag^{} >expected && >> > + git for-each-ref --format="%(*raw)" refs/tags/testtag >actual && >> > + sanitize_pgp <expected >expected.clean && >> > + sanitize_pgp <actual >actual.clean && >> > + echo "" >>expected.clean && >> >> Just "echo" will do, ditto for the rest. Also odd to go back and forth >> between populating expected.clean & actual.clean. >> > > Are you saying that sanitize_pgp is not needed? No that instead of: echo "" >x You can do: echo >x And also that going back and forth between populating different files is confusing, i.e. this: echo a >x echo c >y echo b >>x is better as: echo a >x echo b >>x echo c >y >> >> > +test_expect_success 'set up refs pointing to binary blob' ' >> > + printf "a\0b\0c" >blob1 && >> > + printf "a\0c\0b" >blob2 && >> > + printf "\0a\0b\0c" >blob3 && >> > + printf "abc" >blob4 && >> > + printf "\0 \0 \0 " >blob5 && >> > + printf "\0 \0a\0 " >blob6 && >> > + printf " " >blob7 && >> > + >blob8 && >> > + git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 && >> > + git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 && >> > + git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 && >> > + git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 && >> > + git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 && >> > + git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 && >> > + git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 && >> > + git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8 >> >> Hrm, xargs just to avoid: >> >> git update-ref ... $(git hash-object) ? >> > > I didn’t think about it, just for convenience. *nod*, Junio had a good suggestion. >> > +test_expect_success '%(raw) with --python must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --python >> > +' >> > + >> > +test_expect_success '%(raw) with --tcl must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --tcl >> > +' >> > + >> > +test_expect_success '%(raw) with --perl must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --perl >> > +' >> > + >> > +test_expect_success '%(raw) with --shell must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --shell >> > +' >> > + >> > +test_expect_success '%(raw) with --shell and --sort=raw must failed' ' >> > + test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell >> > +' >> >> s/must failed/must fail/, but see question above about encoding in these >> languages... > > > [1]: https://lore.kernel.org/git/CAOLTT8QR_GRm4TYk0E_eazQ+unVQODc-3L+b4V5JUN5jtZR8uA@xxxxxxxxxxxxxx/ > > Thanks for a review.