Am 21.12.23 um 10:47 schrieb Jeff King: > On Tue, Dec 19, 2023 at 05:42:39PM +0100, René Scharfe wrote: > >>> This adds a packed-object function, but I doubt anybody actually calls >>> it. If we're going to do that, it's probably worth adding some tests for >>> "cat-file --batch-check" or similar. >> >> Yes, and I was assuming that someone else would be eager to add such >> tests. *ahem* > > :P OK, here it is. This can be its own topic, or go on top of the > rs/t6300-compressed-size-fix branch. Great, thank you! > -- >8 -- > Subject: [PATCH] t1006: add tests for %(objectsize:disk) > > Back when we added this placeholder in a4ac106178 (cat-file: add > %(objectsize:disk) format atom, 2013-07-10), there were no tests, > claiming "[...]the exact numbers returned are volatile and subject to > zlib and packing decisions". > > But we can use a little shell hackery to get the expected numbers > ourselves. To a certain degree this is just re-implementing what Git is > doing under the hood, but it is still worth doing. It makes sure we > exercise the %(objectsize:disk) code at all, and having the two > implementations agree gives us more confidence. > > Note that our shell code assumes that no object appears twice (either in > two packs, or as both loose and packed), as then the results really are > undefined. That's OK for our purposes, and the test will notice if that > assumption is violated (the shell version would produce duplicate lines > that Git's output does not have). > > Helped-by: René Scharfe <l.s.r@xxxxxx> > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > I stole a bit of your awk. You can tell because I'd have written it in > perl. ;) I think we can do it even in shell, especially if... > > t/t1006-cat-file.sh | 34 ++++++++++++++++++++++++++++++++++ > 1 file changed, 34 insertions(+) > > diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh > index 271c5e4fd3..21915be308 100755 > --- a/t/t1006-cat-file.sh > +++ b/t/t1006-cat-file.sh > @@ -1100,6 +1100,40 @@ test_expect_success 'cat-file --batch="batman" with --batch-all-objects will wor > cmp expect actual > ' > > +test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' ' > + # our state has both loose and packed objects, > + # so find both for our expected output > + { > + find .git/objects/?? -type f | > + awk -F/ "{ print \$0, \$3\$4 }" | > + while read path oid > + do > + size=$(test_file_size "$path") && > + echo "$oid $size" || > + return 1 > + done && > + rawsz=$(test_oid rawsz) && > + find .git/objects/pack -name "*.idx" | > + while read idx > + do > + git show-index <"$idx" >idx.raw && > + sort -n <idx.raw >idx.sorted && > + packsz=$(test_file_size "${idx%.idx}.pack") && > + end=$((packsz - rawsz)) && > + awk -v end="$end" " > + NR > 1 { print oid, \$1 - start } > + { start = \$1; oid = \$2 } > + END { print oid, end - start } > + " idx.sorted || ... we stop slicing the data against the grain. Let's reverse the order (sort -r), then we don't need to carry the oid forward: sort -nr <idx.raw >idx.sorted && packsz=$(test_file_size "${idx%.idx}.pack") && end=$((packsz - rawsz)) && awk -v end="$end" " { print \$2, end - \$1; end = \$1 } " idx.sorted || And at that point it should be easy to use a shell loop instead of awk: while read start oid rest do size=$((end - start)) && end=$start && echo "$oid $size" || return 1 done <idx.sorted > + return 1 > + done > + } >expect.raw && > + sort <expect.raw >expect && The reversal above becomes irrelevant with that line, so the result in expect stays the same. Should we deduplicate here, like cat-file does (i.e. use "sort -u")? Having the same object in multiple places for whatever reason would not be a cause for reporting an error in this test, I would think. > + git cat-file --batch-all-objects \ > + --batch-check="%(objectname) %(objectsize:disk)" >actual && > + test_cmp expect actual > +' > + > test_expect_success 'set up replacement object' ' > orig=$(git rev-parse HEAD) && > git cat-file commit $orig >orig && One more thing: We can do the work of the first awk invocation in the already existing loop as well: > +test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' ' > + # our state has both loose and packed objects, > + # so find both for our expected output > + { > + find .git/objects/?? -type f | > + awk -F/ "{ print \$0, \$3\$4 }" | > + while read path oid > + do > + size=$(test_file_size "$path") && > + echo "$oid $size" || > + return 1 > + done && ... but the substitutions are a bit awkward: find .git/objects/?? -type f | while read path do basename=${path##*/} && dirname=${path%/$basename} && oid="${dirname#.git/objects/}${basename}" && size=$(test_file_size "$path") && echo "$oid $size" || return 1 done && The avoided awk invocation might be worth the trouble, though. René