Re: [PATCH] t1006: add tests for %(objectsize:disk)

Jeff King <peff@xxxxxxxx> · Thu, 21 Dec 2023 16:30:34 -0500

On Thu, Dec 21, 2023 at 01:19:53PM +0100, René Scharfe wrote:

> I think we can do it even in shell, especially if...
> [...]

Yeah, your conversion looks accurate. I do wonder if it is worth golfing
further, though. If it were a process invocation per object, I'd
definitely say the efficiency gain is worth it. But dropping one process
from the whole test isn't that exciting either way.

> (sort -r), then we don't need to carry the oid forward:
> 
> 			sort -nr <idx.raw >idx.sorted &&
> 			packsz=$(test_file_size "${idx%.idx}.pack") &&
> 			end=$((packsz - rawsz)) &&
> 			awk -v end="$end" "
> 			  { print \$2, end - \$1; end = \$1 }
> 			" idx.sorted ||
> 
> And at that point it should be easy to use a shell loop instead of awk:
> 
> 			while read start oid rest
> 			do
> 				size=$((end - start)) &&
> 				end=$start &&
> 				echo "$oid $size" ||
> 				return 1
> 			done <idx.sorted

The one thing I do like is that we don't have to escape anything inside
an awk program that is forced to use double-quotes. ;)

> Should we deduplicate here, like cat-file does (i.e. use "sort -u")?
> Having the same object in multiple places for whatever reason would not
> be a cause for reporting an error in this test, I would think.

No, for the reasons I said in the commit message: if an object exists in
multiple places the test is already potentially invalid, as Git does not
promise which version it will use. So it might work racily, or it might
work for now but be fragile. By not de-duplicating, we make sure the
test's assumption holds.

> One more thing: We can do the work of the first awk invocation in the
> already existing loop as well:
> [...]
> ... but the substitutions are a bit awkward:
> 
> 		find .git/objects/?? -type f |
> 		while read path
> 		do
> 			basename=${path##*/} &&
> 			dirname=${path%/$basename} &&
> 			oid="${dirname#.git/objects/}${basename}" &&
> 			size=$(test_file_size "$path") &&
> 			echo "$oid $size" ||
> 			return 1
> 		done &&
> 
> The avoided awk invocation might be worth the trouble, though.

Yeah, I briefly considered whether it would be possible in pure shell,
but didn't get very far before assuming it was going to be ugly. Thank
you for confirming. ;)

Again, if we were doing one awk per object, I'd try to avoid it. But
since we can cover all objects in a single pass, I think it's OK.

-Peff