Re: [PATCH 05/10] git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 26 Feb 2025 05:26:04 -0800

Jeff King <peff@xxxxxxxx> writes:

> But these do not correspond when we see Z_NEED_DICT! Zlib consumes the
> bytes from the input buffer but it does not increment total_in. And so
> we hit the BUG("total_in mismatch") call.
>
> There are a few options here:
>
>   - We could ditch that BUG() check. It is making too many assumptions
>     about how zlib updates these values. But it does have value in most
>     cases as a sanity check on the values we're copying.
>
>   - We could skip the zlib_post_call() entirely when we see Z_NEED_DICT.
>     We know that it's hard error for us, so we should just send the
>     status up the stack and let the caller bail.
>
>     The downside is that if we ever did want to support dictionaries,
>     we couldn't (the git_zstream will be out of sync, since we never
>     copied its values back from the z_stream).
>
>   - We could continue to call zlib_post_call(), but skip just that BUG()
>     check if the status is Z_NEED_DICT. This keeps git_inflate() as a
>     thin wrapper around inflate(), and would let us later support
>     dictionaries for some calls if we wanted to.
>
> This patch uses the third approach. It seems like the least-surprising
> thing to keep git_inflate() a close to inflate() as possible. And while
> it makes the diff a bit larger (since we have to pass the status down to
> to the zlib_post_call() function), it's a static local function, and
> every caller by definition will have just made a zlib call (and so will
> have a status integer).

Ouch.  I am not sure if I would have made the choice you guys made
if I were writing this fix, but that is mostly because I do not
anticipate that we would ever support NEED_DICT and the other two
options seem simpler, and certainly not because I have any concrete
reason to oppose the third approach.

> +test_expect_success 'object reading handles zlib dictionary' - <<\EOT
> +	echo 'content that will be recompressed' >file &&
> +	blob=$(git hash-object -w file) &&
> +	objpath=.git/objects/$(test_oid_to_path "$blob") &&
> +
> +	# Recompress a loose object using a precomputed zlib dictionary.
> +	# This was originally done with:
> +	#
> +	#  perl -MCompress::Raw::Zlib -e '
> +	#    binmode STDIN;
> +	#    binmode STDOUT;
> +	#    my $data = do { local $/; <STDIN> };
> +	#    my $in = new Compress::Raw::Zlib::Inflate;
> +	#    my $de = new Compress::Raw::Zlib::Deflate(
> +	#      -Dictionary => "anything"
> +	#    );
> +	#    $in->inflate($data, $raw);
> +	#    $de->deflate($raw, $out);
> +	#    print $out;
> +	#  ' <obj.bak >$objpath
> +	#
> +	# but we do not want to require the perl module for all test runs (nor
> +	# carry a custom t/helper program that uses zlib features we don't
> +	# otherwise care about).
> +	mv "$objpath" obj.bak &&
> +	test_when_finished 'mv obj.bak "$objpath"' &&
> +	printf '\170\273\017\112\003\143' >$objpath &&
> +
> +	test_must_fail git cat-file blob $blob 2>err &&
> +	test_grep 'error: inflate: needs dictionary' err
> +EOT
> +

Wheee.  I guess an ugly and down-to-bit-sequence test is much better
than having no test at all ;-)

We'd want to say >"$objpath" to adhere to CodingGuidelines, but
I guess those particular versions of bash has not been seen in the
wild for quite some time and we may want to loosen the rule there.

Anyway, looking good.  Thanks.