On Tue, Nov 30, 2021 at 1:37 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:
>
> On 11/21/2021 10:32 PM, Han Xin wrote:
> > From: Han Xin <hanxin.hx@xxxxxxxxxxxxxxx>
> >
> > We used to call "get_data()" in "unpack_non_delta_entry()" to read the
> > entire contents of a blob object, no matter how big it is. This
> > implementation may consume all the memory and cause OOM.
> >
> > By implementing a zstream version of the input_stream interface, we can
> > use a small fixed buffer for "unpack_non_delta_entry()".
> >
> > However, unpacking non-delta objects from a stream instead of from an
> > entire buffer has about a 10% performance penalty. Therefore, only
> > unpack objects larger than the "big_file_threshold" in zstream. See the
> > following benchmarks:
> >
> >     $ hyperfine \
> >         --prepare 'rm -rf dest.git && git init --bare dest.git' \
> >         'git -C dest.git unpack-objects <binary_320M.pack'
> >     Benchmark 1: git -C dest.git unpack-objects <binary_320M.pack
> >       Time (mean ± σ):     10.029 s ±  0.270 s    [User: 8.265 s, System: 1.522 s]
> >       Range (min … max):    9.786 s … 10.603 s    10 runs
> >
> >     $ hyperfine \
> >         --prepare 'rm -rf dest.git && git init --bare dest.git' \
> >         'git -c core.bigFileThreshold=2m -C dest.git unpack-objects <binary_320M.pack'
> >     Benchmark 1: git -c core.bigFileThreshold=2m -C dest.git unpack-objects <binary_320M.pack
> >       Time (mean ± σ):     10.859 s ±  0.774 s    [User: 8.813 s, System: 1.898 s]
> >       Range (min … max):    9.884 s … 12.192 s    10 runs
>
> It seems that you want us to compare this pair of results, and hyperfine
> can assist with that by including multiple benchmarks (with labels,
> using '-n') as follows:
>
>     $ hyperfine \
>         --prepare 'rm -rf dest.git && git init --bare dest.git' \
>         -n 'old' '~/_git/git-upstream/git -C dest.git unpack-objects <big.pack' \
>         -n 'new' '~/_git/git/git -C dest.git unpack-objects <big.pack' \
>         -n 'new (small threshold)' '~/_git/git/git -c core.bigfilethreshold=64k -C dest.git unpack-objects <big.pack'
>
>     Benchmark 1: old
>       Time (mean ± σ):     20.835 s ±  0.058 s    [User: 14.510 s, System: 6.284 s]
>       Range (min … max):   20.741 s … 20.909 s    10 runs
>
>     Benchmark 2: new
>       Time (mean ± σ):     26.515 s ±  0.072 s    [User: 19.783 s, System: 6.696 s]
>       Range (min … max):   26.419 s … 26.611 s    10 runs
>
>     Benchmark 3: new (small threshold)
>       Time (mean ± σ):     26.523 s ±  0.101 s    [User: 19.805 s, System: 6.680 s]
>       Range (min … max):   26.416 s … 26.739 s    10 runs
>
>     Summary
>       'old' ran
>         1.27 ± 0.00 times faster than 'new'
>         1.27 ± 0.01 times faster than 'new (small threshold)'
>
> (Here, 'old' is a build of the latest 'master' branch, while 'new' has
> your patches applied on top.)
>
> Notice that in this example I used a pack with many small objects
> (mostly commits and trees), and this change introduces significant
> overhead in that case.
>
> It would be nice to understand this overhead and fix it before taking
> this change any further.
>
> Thanks,
> -Stolee

Could you share the details of the repository you tested, so that I can
analyze it further?
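For context, the idea of the patch is that "unpack_non_delta_entry()"
inflates a large blob through a small fixed buffer instead of holding the
whole decompressed object in memory at once. Conceptually the streaming
path boils down to a loop like the following sketch (a standalone zlib
example, not the actual patch code; the buffer size and function name are
only illustrative):

#include <stdio.h>
#include <zlib.h>

/* Illustrative fixed buffer size; the real size in the patch may differ. */
#define CHUNK 16384

/*
 * Inflate 'in' to 'out' using only two CHUNK-sized buffers, so peak
 * memory stays bounded no matter how large the deflated object is.
 */
static int inflate_stream(FILE *in, FILE *out)
{
	unsigned char inbuf[CHUNK], outbuf[CHUNK];
	z_stream strm = { 0 };
	int ret = Z_OK;

	if (inflateInit(&strm) != Z_OK)
		return -1;

	do {
		strm.avail_in = fread(inbuf, 1, CHUNK, in);
		if (ferror(in) || !strm.avail_in)
			break;
		strm.next_in = inbuf;

		/* Drain all output produced from this input chunk. */
		do {
			size_t n;

			strm.avail_out = CHUNK;
			strm.next_out = outbuf;
			ret = inflate(&strm, Z_NO_FLUSH);
			if (ret == Z_NEED_DICT || ret == Z_DATA_ERROR ||
			    ret == Z_MEM_ERROR || ret == Z_STREAM_ERROR)
				goto done;
			n = CHUNK - strm.avail_out;
			if (fwrite(outbuf, 1, n, out) != n)
				goto done;
		} while (strm.avail_out == 0);
	} while (ret != Z_STREAM_END);

done:
	inflateEnd(&strm);
	return ret == Z_STREAM_END ? 0 : -1;
}

int main(void)
{
	return inflate_stream(stdin, stdout) ? 1 : 0;
}

The point is that memory use is bounded by the two fixed buffers
regardless of the object size, at the cost of some per-chunk call
overhead, which is why the patch only takes this path for objects above
core.bigFileThreshold.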
I tested with this repository, but could not reproduce the problem:

Unpacking objects: 100% (18345/18345), 43.15 MiB

hyperfine \
  --prepare 'rm -rf dest.git && git init --bare dest.git' \
  -n 'old' 'git -C dest.git unpack-objects <big.pack' \
  -n 'new' 'new/git -C dest.git unpack-objects <big.pack' \
  -n 'new (small threshold)' 'new/git -c core.bigfilethreshold=64k -C dest.git unpack-objects <big.pack'

Benchmark 1: old
  Time (mean ± σ):     17.403 s ±  0.880 s    [User: 4.996 s, System: 11.803 s]
  Range (min … max):   15.911 s … 19.368 s    10 runs

Benchmark 2: new
  Time (mean ± σ):     17.788 s ±  0.199 s    [User: 5.054 s, System: 12.257 s]
  Range (min … max):   17.420 s … 18.195 s    10 runs

Benchmark 3: new (small threshold)
  Time (mean ± σ):     18.433 s ±  0.711 s    [User: 4.982 s, System: 12.338 s]
  Range (min … max):   17.518 s … 19.775 s    10 runs

Summary
  'old' ran
    1.02 ± 0.05 times faster than 'new'
    1.06 ± 0.07 times faster than 'new (small threshold)'

Thanks,
- Han Xin