On 12.12.23 at 21:01, Jeff King wrote:
> On Tue, Dec 12, 2023 at 06:04:55PM +0100, René Scharfe wrote:
>
>> Subject: [PATCH] t6300: avoid hard-coding object sizes
>>
>> f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
>> hard-coded the expected object sizes.  Coincidentally the sizes of a
>> commit and a tag are the same with zlib at the default compression
>> level.
>>
>> 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
>> encoded the sizes as a single value, which coincidentally also works
>> with sha256.
>>
>> Different compression libraries like zlib-ng may arrive at different
>> values.  Get them from the file system instead of hard-coding them to
>> make switching the compression library (or changing the compression
>> level) easier.
>
> Yeah, this is definitely the right solution here. I'm surprised the
> hard-coded values didn't cause problems before now. ;)
>
> The patch looks good to me, but a few small comments:
>
>> +test_object_file_size () {
>> +	oid=$(git rev-parse "$1")
>> +	path=".git/objects/$(test_oid_to_path $oid)"
>> +	test_file_size "$path"
>> +}
>
> Here we're assuming the objects are loose. I think that's probably OK
> (and certainly the test will notice if that changes).
>
> We're covering the formatting code paths along with the underlying
> implementation that fills in object_info->disk_sizep for loose objects.
> Which I think is plenty for this particular script, which is about
> for-each-ref.
>
> It would be nice to have coverage of the packed_object_info() code
> path, though. Back when it was added in a4ac106178 (cat-file: add
> %(objectsize:disk) format atom, 2013-07-10), I cowardly punted on this,
> writing:
>
>   This patch does not include any tests, as the exact numbers
>   returned are volatile and subject to zlib and packing
>   decisions. We cannot even reliably guarantee that the
>   on-disk size is smaller than the object content (though in
>   general this should be the case for non-trivial objects).
>
> I don't think it's that big a deal, but I guess we could do something
> like:
>
>   prev=
>   git show-index <$pack_idx |
>   sort -n |
>   grep -A1 $oid |
>   while read ofs oid csum
>   do
>	test -n "$prev" && echo "$((ofs - prev))"
>	prev=$ofs
>   done
>
> It feels a little redundant with what Git is doing under the hood, but
> at least is exercising the code (and we're using the idx directly, so
> we're confirming that the revindex is right).

A generic object size function based on both methods could live in the
test lib and be used for e.g. cat-file tests as well.  Getting such a
function polished and library-worthy is probably more work than I
naively imagine, however -- due to our shunning of pipes alone.

> Anyway, that is all way beyond the scope of your patch, but I wonder if
> it's worth doing on top.
>
>> @@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main
>>  test_atom head push:strip=-1 main
>>  test_atom head objecttype commit
>>  test_atom head objectsize $((131 + hexlen))
>> -test_atom head objectsize:disk $disklen
>> +test_atom head objectsize:disk $(test_object_file_size refs/heads/main)
>
> These test_object_file_size calls are happening outside of any
> test_expect_* block, so we'd miss failing exit codes (and also the
> helper is not &&-chained), and any stderr would leak to the output.
> That's probably OK in practice, though (if something goes wrong then
> the expected value output will be bogus and the test itself will fail).

Right.  We could also set variables during setup, though, to put
readers' minds at rest.

René
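P.S.: As a rough illustration of the offset-difference idea quoted above (not part of the patch under review), the subtraction can be pushed into awk, avoiding the pipe-into-while construct the test suite shuns.  The index entries below are made-up stand-ins for real `git show-index <pack.idx` output:

```shell
# Illustrative sketch only: compute a packed object's on-disk size as
# the gap between its idx offset and the next offset.  Fake sample data
# in show-index format (offset oid crc) stands in for a real pack idx.
target_oid=1111111111111111111111111111111111111111

cat >index-entries <<-\EOF
12 1111111111111111111111111111111111111111 aaaaaaaa
300 2222222222222222222222222222222222222222 bbbbbbbb
570 3333333333333333333333333333333333333333 cccccccc
EOF

sort -n index-entries >sorted-entries &&
awk -v oid="$target_oid" '
	prev_matched { print $1 - prev; exit }
	{ prev = $1; prev_matched = ($2 == oid) }
' sorted-entries
# prints 288 (300 - 12)
```

For the object at the highest offset there is no next entry, so a real helper would additionally need the end of the data section (pack file size minus the trailing checksum), which is one more reason a library-worthy version is more work than it first appears.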