On Tue, Aug 25, 2020 at 11:11:45AM -0700, Jonathan Tan wrote:

> > There may be other cases that get better, though. A 3% increase here is
> > probably OK if we get something for it. But if our primary goal here is
> > increasing multithread efficiency, then we should be able to show some
> > benchmark that improves. :)
>
> Ah...good question. Cloning from
> https://fuchsia.googlesource.com/third_party/vulkan-cts (mentioned in
> patch 7), cd-ing to the pack dir, and running:
>
>   git index-pack --stdin -o foo <*.pack
>
> I got 8m2.878s with my patches and 12m6.365s without. But I ran this on
> a cloud virtual machine (what I have access to right now) so the numbers
> might look different on a dedicated machine.

Thanks, that's a much more interesting example. Here's what I get on my
8-core machine:

  5302.9: index-pack default number of threads   167.70(546.19+12.00)   83.69(585.61+6.95) -50.1%

So that's a considerable improvement. And hardly surprising given the
repository structure. I used the script below to show the size of the
delta families, and the vk-master ones really dominate in size and
object number (the biggest is 50GB in one delta family).

I also ran my PERF_EXTRA tests on them to see if it behaved differently
as the threads increased. Nope:

  5302.3: index-pack 0 threads                   434.13(425.90+8.16)
  5302.4: index-pack 1 threads                   428.65(421.82+6.77)
  5302.5: index-pack 2 threads                   224.05(424.13+6.21)
  5302.6: index-pack 4 threads                   125.43(457.68+5.77)
  5302.7: index-pack 8 threads                   82.60(579.10+7.78)
  5302.8: index-pack 16 threads                  82.89(1147.82+9.66)
  5302.9: index-pack default number of threads   83.91(576.92+8.52)

Still maxes out at the number of physical cores (not unexpected, but
that was the thing I was most curious about ;) ).

I may run it on the 40-core machine, too. It's possible that with the
new threading we're able to do better going past 20 threads. I doubt it,
because I think it's mostly a function of Git's locking granularity, but
it's worth checking.
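To put numbers on that scaling, the thread rows above can be turned into a speedup over the 1-thread run and a per-thread efficiency with a small filter. This is only a rough sketch, not something from Git's t/perf suite: the helper names `speedup_table` and `pct_change` are hypothetical, and the filter assumes each row has the shape shown above (thread count as the third field, `wall(user+sys)` as the fifth) with the 1-thread row coming before any larger thread count.

```shell
#!/bin/sh
# Rough sketch, not part of Git's t/perf suite. Hypothetical helper names.
#
# speedup_table: read "index-pack N threads" perf rows on stdin and print
# the speedup of each run over the 1-thread run plus per-thread efficiency.
# Assumes the row shape shown above, e.g.
#   5302.4: index-pack 1 threads 428.65(421.82+6.77)
# with the thread count in field 3, "wall(user+sys)" in field 5, and the
# 1-thread row appearing before any larger thread count.
speedup_table () {
	awk '
	# Only numeric thread counts above zero; this skips the "0 threads"
	# and "default number of threads" rows.
	$3 ~ /^[0-9]+$/ && $3 > 0 {
		# Numeric conversion keeps only the leading number, i.e. wall time.
		wall = $5 + 0
		if ($3 == 1)
			base = wall
		if (base > 0)
			printf "%2d threads: %7.2fs  speedup %.2fx  efficiency %3.0f%%\n",
				$3, wall, base / wall, 100 * base / wall / $3
	}'
}

# pct_change OLD NEW: relative change in percent, as in the last column
# of the two-column comparison above.
pct_change () {
	awk -v before="$1" -v after="$2" \
		'BEGIN { printf "%+.1f%%\n", 100 * (after - before) / before }'
}

speedup_table <<\EOF
5302.3: index-pack 0 threads 434.13(425.90+8.16)
5302.4: index-pack 1 threads 428.65(421.82+6.77)
5302.5: index-pack 2 threads 224.05(424.13+6.21)
5302.6: index-pack 4 threads 125.43(457.68+5.77)
5302.7: index-pack 8 threads 82.60(579.10+7.78)
5302.8: index-pack 16 threads 82.89(1147.82+9.66)
5302.9: index-pack default number of threads 83.91(576.92+8.52)
EOF
pct_change 167.70 83.69
```

On those numbers it reports roughly 1.91x at 2 threads, 3.42x at 4, 5.19x at 8 and 5.17x at 16, so per-thread efficiency falls from about 96% at 2 threads to about 65% at 8, and going to 16 buys nothing. `pct_change 167.70 83.69` reproduces the -50.1% from the default-threads comparison.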
-Peff

-- >8 --
#!/bin/sh

# script to output size, count, and filenames for each delta family

git rev-list --objects --all |
git cat-file --buffer \
	--batch-check='%(objectname) %(deltabase) %(objectsize) %(rest)' |
perl -alne '
	if ($F[1] =~ /[^0]/) {
		push @{$children{$F[1]}}, $F[0];
	} else {
		push @bases, $F[0];
	}
	$size{$F[0]} = $F[2];
	$name{$F[0]} = $F[3];

	END {
		sub add_to_component {
			my ($oid, $data) = @_;
			$data->{names}->{$name{$oid}}++;
			$data->{size} += $size{$oid};
			$data->{nr}++;
			add_to_component($_, $data) for @{$children{$oid}};
		}

		for my $b (@bases) {
			my $data = { size => 0, nr => 0, names => {} };
			add_to_component($b, $data);
			print join(" ",
				$data->{size},
				$data->{nr},
				sort keys(%{$data->{names}})
			), "\n";
		}
	}
' |
sort -rn