On Thu, Dec 21, 2023 at 01:19:53PM +0100, René Scharfe wrote: > I think we can do it even in shell, especially if... > [...] Yeah, your conversion looks accurate. I do wonder if it is worth golfing further, though. If it were a process invocation per object, I'd definitely say the efficiency gain is worth it. But dropping one process from the whole test isn't that exciting either way. > (sort -r), then we don't need to carry the oid forward: > > sort -nr <idx.raw >idx.sorted && > packsz=$(test_file_size "${idx%.idx}.pack") && > end=$((packsz - rawsz)) && > awk -v end="$end" " > { print \$2, end - \$1; end = \$1 } > " idx.sorted || > > And at that point it should be easy to use a shell loop instead of awk: > > while read start oid rest > do > size=$((end - start)) && > end=$start && > echo "$oid $size" || > return 1 > done <idx.sorted The one thing I do like is that we don't have to escape anything inside an awk program that is forced to use double-quotes. ;) > Should we deduplicate here, like cat-file does (i.e. use "sort -u")? > Having the same object in multiple places for whatever reason would not > be a cause for reporting an error in this test, I would think. No, for the reasons I said in the commit message: if an object exists in multiple places the test is already potentially invalid, as Git does not promise which version it will use. So it might work racily, or it might work for now but be fragile. By not de-duplicating, we make sure the test's assumption holds. > One more thing: We can do the work of the first awk invocation in the > already existing loop as well: > [...] > ... but the substitutions are a bit awkward: > > find .git/objects/?? -type f | > while read path > do > basename=${path##*/} && > dirname=${path%/$basename} && > oid="${dirname#.git/objects/}${basename}" && > size=$(test_file_size "$path") && > echo "$oid $size" || > return 1 > done && > > The avoided awk invocation might be worth the trouble, though. Yeah, I briefly considered whether it would be possible in pure shell, but didn't get very far before assuming it was going to be ugly. Thank you for confirming. ;) Again, if we were doing one awk per object, I'd try to avoid it. But since we can cover all objects in a single pass, I think it's OK. -Peff