Am 14.12.23 um 21:59 schrieb Jeff King: > On Wed, Dec 13, 2023 at 01:28:56PM +0100, René Scharfe wrote: > >> Add test_object_size and its helpers test_loose_object_size and >> test_packed_object_size, which allow determining the size of a Git >> object using only the low-level Git commands rev-parse and show-index. >> >> Use it in t6300 to replace the bare-bones function test_object_file_size >> as a motivating example. There it provides the expected output of the >> high-level Git command for-each-ref. > > This adds a packed-object function, but I doubt anybody actually calls > it. If we're going to do that, it's probably worth adding some tests for > "cat-file --batch-check" or similar. Yes, and I was assuming that someone else would be eager to add such tests. *ahem* > At which point I wonder if rather than having a function for a single > object, we are better off just testing the result of: > > git cat-file --batch-all-objects --unordered --batch-check='%(objectsize:disk)' > > against a single post-processed "show-index" invocation. Sure, we might want to optimize for bulk-processing and possibly end up only checking the size of single objects in t6300, making new library functions unnecessary. When dumping size information of multiple objects, it's probably a good idea to include "%(objectname)" as well in the format. You'd need one show-index call for each .idx file. A simple test would only have a single one; a library function might need to handle multiple packs. >> So how about this? I'm a bit nervous about all the rules about output >> descriptors and error propagation and whatnot in the test library, but >> this implementation seems simple enough and might be useful in more than >> one test. No idea how to add support for alternate object directories, >> but I doubt we'll ever need it. > > I'm not sure that we need to do anything special with output > redirection. Shouldn't these functions just send errors to stderr as > usual? If they are run inside a test_expect block, that goes to > descriptor 4 (which is either /dev/null or the original stderr, > depending on whether "-v" was used). Good point. My bad excuse is that I copied the redirection to fd 4 from test_grep. >> + git show-index <"$idx" | >> + awk -v oid="$oid" -v end="$end" ' >> + $2 == oid {start = $1} >> + {offsets[$1] = 1} >> + END { >> + if (!start || start >= end) >> + exit 1 >> + for (o in offsets) >> + if (start < o && o < end) >> + end = o >> + print end - start >> + } >> + ' && return 0 > > I was confused at first, because I didn't see any sorting happening. But > if I understand correctly, you're just looking for the smallest "end" > that comes after the start of the object we're looking for. Which I > think works. Yes, calculating the minimum offset suffices when handling a single object -- no sorting required. For bulk mode we'd better sort, of course: git show-index <"$idx" | sort -n | awk -v end="$end" ' NR > 1 {print oid, $1 - start} {start = $1; oid = $2} END {print oid, end - start} ' No idea how to make such a thing robust against malformed or truncated output from show-index, but perhaps that's not necessary, depending on how the result is used. René