On 11/30/2021 8:49 AM, Han Xin wrote:
> On Tue, Nov 30, 2021 at 1:37 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:
>> $ hyperfine \
>>     --prepare 'rm -rf dest.git && git init --bare dest.git' \
>>     -n 'old' '~/_git/git-upstream/git -C dest.git unpack-objects <big.pack' \
>>     -n 'new' '~/_git/git/git -C dest.git unpack-objects <big.pack' \
>>     -n 'new (small threshold)' '~/_git/git/git -c core.bigfilethreshold=64k -C dest.git unpack-objects <big.pack'
>>
>> Benchmark 1: old
>>   Time (mean ± σ):     20.835 s ±  0.058 s    [User: 14.510 s, System: 6.284 s]
>>   Range (min … max):   20.741 s … 20.909 s    10 runs
>>
>> Benchmark 2: new
>>   Time (mean ± σ):     26.515 s ±  0.072 s    [User: 19.783 s, System: 6.696 s]
>>   Range (min … max):   26.419 s … 26.611 s    10 runs
>>
>> Benchmark 3: new (small threshold)
>>   Time (mean ± σ):     26.523 s ±  0.101 s    [User: 19.805 s, System: 6.680 s]
>>   Range (min … max):   26.416 s … 26.739 s    10 runs
>>
>> Summary
>>   'old' ran
>>     1.27 ± 0.00 times faster than 'new'
>>     1.27 ± 0.01 times faster than 'new (small threshold)'
>>
>> (Here, 'old' is testing a compiled version of the latest 'master'
>> branch, while 'new' has your patches applied on top.)
>>
>> Notice from this example that I had a pack with many small objects
>> (mostly commits and trees), and I see that this change introduces
>> significant overhead in this case.
>>
>> It would be nice to understand this overhead and fix it before taking
>> this change any further.
>>
>> Thanks,
>> -Stolee
>
> Can you show me the specific details of the repository you tested, so
> that I can analyze it further?

I used a pack-file from an internal repo. It happened to be using
partial clone, so here is a repro with the git/git repository after
cloning this way:

$ git clone --no-checkout --filter=blob:none https://github.com/git/git

(copy the large .pack from git/.git/objects/pack/ to big.pack)

$ hyperfine \
    --prepare 'rm -rf dest.git && git init --bare dest.git' \
    -n 'old' '~/_git/git-upstream/git -C dest.git unpack-objects <big.pack' \
    -n 'new' '~/_git/git/git -C dest.git unpack-objects <big.pack' \
    -n 'new (small threshold)' '~/_git/git/git -c core.bigfilethreshold=64k -C dest.git unpack-objects <big.pack'

Benchmark 1: old
  Time (mean ± σ):     82.748 s ±  0.445 s    [User: 50.512 s, System: 32.049 s]
  Range (min … max):   82.042 s … 83.587 s    10 runs

Benchmark 2: new
  Time (mean ± σ):     101.644 s ±  0.524 s    [User: 67.470 s, System: 34.047 s]
  Range (min … max):   100.866 s … 102.633 s    10 runs

Benchmark 3: new (small threshold)
  Time (mean ± σ):     101.093 s ±  0.269 s    [User: 67.404 s, System: 33.559 s]
  Range (min … max):   100.639 s … 101.375 s    10 runs

Summary
  'old' ran
    1.22 ± 0.01 times faster than 'new (small threshold)'
    1.23 ± 0.01 times faster than 'new'

I'm also able to repro this with a smaller repo (microsoft/scalar), so
the tests complete much faster:

$ hyperfine \
    --prepare 'rm -rf dest.git && git init --bare dest.git' \
    -n 'old' '~/_git/git-upstream/git -C dest.git unpack-objects <small.pack' \
    -n 'new' '~/_git/git/git -C dest.git unpack-objects <small.pack' \
    -n 'new (small threshold)' '~/_git/git/git -c core.bigfilethreshold=64k -C dest.git unpack-objects <small.pack'

Benchmark 1: old
  Time (mean ± σ):      3.295 s ±  0.023 s    [User: 1.063 s, System: 2.228 s]
  Range (min … max):    3.269 s …  3.351 s    10 runs

Benchmark 2: new
  Time (mean ± σ):      3.592 s ±  0.105 s    [User: 1.261 s, System: 2.328 s]
  Range (min … max):    3.378 s …  3.679 s    10 runs

Benchmark 3: new (small threshold)
  Time (mean ± σ):      3.584 s ±  0.144 s    [User: 1.241 s, System: 2.339 s]
  Range (min … max):    3.359 s …  3.747 s    10 runs

Summary
  'old' ran
    1.09 ± 0.04 times faster than 'new (small threshold)'
    1.09 ± 0.03 times faster than 'new'

It's not the same relative overhead, but it is still significant. These
pack-files contain (mostly) small objects and no large blobs.
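If you want to double-check that composition yourself, something like
this should summarize the object types in a pack (this assumes big.idx
sits next to big.pack; 'git index-pack' can generate it if missing):

$ git index-pack big.pack    # writes big.idx next to the pack
$ git verify-pack -v big.idx |
      awk '$2 ~ /^(commit|tree|blob|tag)$/ { print $2 }' |
      sort | uniq -c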
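To track down where the extra time goes, I'd probably start with a
rough comparison of CPU profiles, something like the following (this
assumes Linux 'perf' is available and both builds keep their symbols;
old.data/new.data are just scratch filenames):

$ rm -rf dest.git && git init --bare dest.git
$ perf record -g -o old.data -- \
      ~/_git/git-upstream/git -C dest.git unpack-objects <big.pack
$ rm -rf dest.git && git init --bare dest.git
$ perf record -g -o new.data -- \
      ~/_git/git/git -C dest.git unpack-objects <big.pack
$ perf diff old.data new.data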
I know these small-object packs are not the target of your efforts, but
it would be good to avoid a regression here.

Thanks,
-Stolee