Re: [PATCH 2/1] test-lib-functions: add object size functions

René Scharfe <l.s.r@xxxxxx> · Tue, 19 Dec 2023 17:42:39 +0100

Am 14.12.23 um 21:59 schrieb Jeff King:
> On Wed, Dec 13, 2023 at 01:28:56PM +0100, René Scharfe wrote:
>
>> Add test_object_size and its helpers test_loose_object_size and
>> test_packed_object_size, which allow determining the size of a Git
>> object using only the low-level Git commands rev-parse and show-index.
>>
>> Use it in t6300 to replace the bare-bones function test_object_file_size
>> as a motivating example.  There it provides the expected output of the
>> high-level Git command for-each-ref.
>
> This adds a packed-object function, but I doubt anybody actually calls
> it. If we're going to do that, it's probably worth adding some tests for
> "cat-file --batch-check" or similar.

Yes, and I was assuming that someone else would be eager to add such
tests. *ahem*

> At which point I wonder if rather than having a function for a single
> object, we are better off just testing the result of:
>
>   git cat-file --batch-all-objects --unordered --batch-check='%(objectsize:disk)'
>
> against a single post-processed "show-index" invocation.

Sure, we might want to optimize for bulk-processing and possibly end up
only checking the size of single objects in t6300, making new library
functions unnecessary.

When dumping size information of multiple objects, it's probably a good
idea to include "%(objectname)" as well in the format.

You'd need one show-index call for each .idx file.  A simple test would
only have a single one; a library function might need to handle multiple
packs.

>> So how about this?  I'm a bit nervous about all the rules about output
>> descriptors and error propagation and whatnot in the test library, but
>> this implementation seems simple enough and might be useful in more than
>> one test.  No idea how to add support for alternate object directories,
>> but I doubt we'll ever need it.
>
> I'm not sure that we need to do anything special with output
> redirection. Shouldn't these functions just send errors to stderr as
> usual? If they are run inside a test_expect block, that goes to
> descriptor 4 (which is either /dev/null or the original stderr,
> depending on whether "-v" was used).

Good point.  My bad excuse is that I copied the redirection to fd 4 from
test_grep.

>> +	git show-index <"$idx" |
>> +	awk -v oid="$oid" -v end="$end" '
>> +		$2 == oid {start = $1}
>> +		{offsets[$1] = 1}
>> +		END {
>> +			if (!start || start >= end)
>> +				exit 1
>> +			for (o in offsets)
>> +				if (start < o && o < end)
>> +					end = o
>> +			print end - start
>> +		}
>> +	' && return 0
>
> I was confused at first, because I didn't see any sorting happening. But
> if I understand correctly, you're just looking for the smallest "end"
> that comes after the start of the object we're looking for. Which I
> think works.

Yes, calculating the minimum offset suffices when handling a single
object -- no sorting required.  For bulk mode we'd better sort, of
course:

	git show-index <"$idx" |
	sort -n |
	awk -v end="$end" '
		NR > 1 {print oid, $1 - start}
		{start = $1; oid = $2}
		END {print oid, end - start}
	'

No idea how to make such a thing robust against malformed or truncated
output from show-index, but perhaps that's not necessary, depending on
how the result is used.

René