On 21.12.23 at 22:30, Jeff King wrote:
> On Thu, Dec 21, 2023 at 01:19:53PM +0100, René Scharfe wrote:
>
>> I think we can do it even in shell, especially if...
>> [...]
>
> Yeah, your conversion looks accurate. I do wonder if it is worth golfing
> further, though. If it were a process invocation per object, I'd
> definitely say the efficiency gain is worth it. But dropping one process
> from the whole test isn't that exciting either way.

Fair enough.

>
>> (sort -r), then we don't need to carry the oid forward:
>>
>>     sort -nr <idx.raw >idx.sorted &&
>>     packsz=$(test_file_size "${idx%.idx}.pack") &&
>>     end=$((packsz - rawsz)) &&
>>     awk -v end="$end" "
>>         { print \$2, end - \$1; end = \$1 }
>>     " idx.sorted ||
>>
>> And at that point it should be easy to use a shell loop instead of awk:
>>
>>     while read start oid rest
>>     do
>>         size=$((end - start)) &&
>>         end=$start &&
>>         echo "$oid $size" ||
>>         return 1
>>     done <idx.sorted
>
> The one thing I do like is that we don't have to escape anything inside
> an awk program that is forced to use double-quotes. ;)

For me it's processing the data in the "correct" order (descending, i.e.
starting at the end, which we have to calculate first anyway based on
the size).

>> Should we deduplicate here, like cat-file does (i.e. use "sort -u")?
>> Having the same object in multiple places for whatever reason would not
>> be a cause for reporting an error in this test, I would think.
>
> No, for the reasons I said in the commit message: if an object exists in
> multiple places the test is already potentially invalid, as Git does not
> promise which version it will use. So it might work racily, or it might
> work for now but be fragile. By not de-duplicating, we make sure the
> test's assumption holds.

Oh, skipped that paragraph. Still, I don't see how a duplicate object
would necessarily invalidate t1006.
The comment for the test "cat-file --batch-all-objects shows all
objects" a few lines above indicates that it's picky about the
provenance of objects, but it uses a separate repository. I can't infer
the same requirement for the root repo, but we already established that
I can't read.

Anyway, if someone finds a use for git repack without -d or git
unpack-objects or whatever else causes duplicates in the root repository
of t1006 then they can try to reverse your ban with concrete arguments.

René
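
For illustration only, here is a standalone sketch of the descending-order
loop quoted above, run on made-up data rather than a real pack index. The
function name object_sizes, the three sample "offset oid" lines, and the
sizes 220 and 20 are all assumptions for the example and do not come from
t1006.

```shell
# Hypothetical helper (not from t1006): reads "offset oid" lines, sorted by
# descending offset, on stdin and prints "oid size" for each one.
# $1 is the offset at which the object data ends.
object_sizes () {
	end=$1
	while read start oid rest
	do
		size=$((end - start)) &&
		end=$start &&
		echo "$oid $size" ||
		return 1
	done
}

# Made-up numbers: a 220-byte pack with a 20-byte trailer, so object
# data ends at offset 200.
packsz=220
rawsz=20
end=$((packsz - rawsz))

# Made-up index entries, deliberately unsorted; sort -nr puts the
# highest offset first so each size is "previous end minus this start".
printf '%s\n' '12 aaa' '150 bbb' '90 ccc' |
sort -nr |
object_sizes "$end"
# prints:
# bbb 50
# ccc 60
# aaa 78
```

Because the entries arrive highest offset first, the loop only has to
remember the previous start offset and never carries an oid forward.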