Re: [PATCH 2/8] [GSOC] ref-filter: add %(raw) atom

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Thu, 17 Jun 2021 09:10:34 +0200

On Sat, Jun 12 2021, ZheNing Hu via GitGitGadget wrote:

> From: ZheNing Hu <adlternative@xxxxxxxxx>
>
> Add new formatting option `%(raw)`, which will print the raw
> object data without any changes. It will help further to migrate
> all cat-file formatting logic from cat-file to ref-filter.

Nice goal and feature to have.

> The raw data of blob, tree objects may contain '\0', but most of
> the logic in `ref-filter` depends on the output of the atom being
> text (specifically, no embedded NULs in it).
>
> E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
> add the data to the buffer. The raw data of a tree object is
> `100644 one\0...`, only the `100644 one` will be added to the buffer,
> which is incorrect.
>
> Therefore, add a new member in `struct atom_value`: `s_size`, which
> can record raw object size, it can help us add raw object data to
> the buffer or compare two buffers which contain raw object data.

Most of the functions that deal with this already use a strbuf in some
way, before we had a const char *, now there's a size_t to go along with
it, why not simply use a strbuf in the struct for the data? You'll then
get the size and \0 handling for free, and any functions to deal with
conversion can stick to the strbuf API, there seems to be a lot of back
and forth now.

> Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
> `--tcl`, `--perl` because if our binary raw data is passed to a variable
> in the host language, the host language may not support arbitrary binary
> data in the variables of its string type.

Perl at least deals with that just fine, and to the extent that it
doesn't any new problems here would have nothing to do with \0 being in
the data. Perl doesn't have a notion of "binary has \0 in it", it always
supports \0, it has a notion of "is it utf-8 or not?", so any encoding
problems wouldn't be new. I'd think that the same would be true of
Python, but I'm not sure.

> +test_expect_success 'basic atom: refs/tags/testtag *raw' '
> +	git cat-file commit refs/tags/testtag^{} >expected &&
> +	git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
> +	sanitize_pgp <expected >expected.clean &&
> +	sanitize_pgp <actual >actual.clean &&
> +	echo "" >>expected.clean &&

Just "echo" will do, ditto for the rest. Also odd to go back and forth
between populating expected.clean & actual.clean.

> +test_expect_success 'set up refs pointing to binary blob' '
> +	printf "a\0b\0c" >blob1 &&
> +	printf "a\0c\0b" >blob2 &&
> +	printf "\0a\0b\0c" >blob3 &&
> +	printf "abc" >blob4 &&
> +	printf "\0 \0 \0 " >blob5 &&
> +	printf "\0 \0a\0 " >blob6 &&
> +	printf "  " >blob7 &&
> +	>blob8 &&
> +	git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
> +	git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
> +	git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
> +	git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
> +	git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
> +	git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
> +	git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
> +	git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8

Hrm, xargs just to avoid:

    git update-ref ... $(git hash-object) ?

> +test_expect_success '%(raw) with --python must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --python
> +'
> +
> +test_expect_success '%(raw) with --tcl must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --tcl
> +'
> +
> +test_expect_success '%(raw) with --perl must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --perl
> +'
> +
> +test_expect_success '%(raw) with --shell must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --shell
> +'
> +
> +test_expect_success '%(raw) with --shell and --sort=raw must failed' '
> +	test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
> +'

s/must failed/must fail/, but see question above about encoding in these
languages...