Hi ZheNing,

On Thu, Aug 19, 2021 at 3:39 AM ZheNing Hu <adlternative@xxxxxxxxx> wrote:
>
> Hi, Christian and Hariom,
>
> I want to use this patch series as the temporary final version of GSOC project:
>
> https://github.com/adlternative/git/commits/cat-file-reuse-ref-filter-logic

I am still not very happy with the last patch in the series, but it
can be improved later.

> Due to the branch ref-filter-opt-code-logic or branch
> ref-filter-opt-perf patch series temporarily unable to reflect its
> optimization to git cat-file --batch. Therefore, using
> branch cat-file-reuse-ref-filter-logic is the most effective now.
>
> This is the final performance regression test result:
>
> Test                                        upstream/master   this tree
> ------------------------------------------------------------------------------------
> 1006.2: cat-file --batch-check              0.06(0.06+0.00)   0.08(0.07+0.00) +33.3%
> 1006.3: cat-file --batch-check with atoms   0.06(0.04+0.01)   0.06(0.06+0.00) +0.0%
> 1006.4: cat-file --batch                    0.49(0.47+0.02)   0.48(0.47+0.01) -2.0%
> 1006.5: cat-file --batch with atoms         0.48(0.44+0.03)   0.47(0.46+0.01) -2.1%
>
> git cat-file --batch has a performance improvement of about 2%.
> git cat-file --batch-check still has a performance gap of 33.3%.
>
> The performance degradation of git cat-file --batch-check is actually
> not very big.
>
> upstream/master (225bc32a98):
>
> $ hyperfine --warmup=10 "~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects"
> Benchmark #1: ~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects
>   Time (mean ± σ):     596.2 ms ±   5.7 ms    [User: 563.0 ms, System: 32.5 ms]
>   Range (min … max):   586.9 ms … 607.9 ms    10 runs
>
> cat-file-reuse-ref-filter-logic (709a0c5c12):
>
> $ hyperfine --warmup=10 "~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects"
> Benchmark #1: ~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects
>   Time (mean ± σ):     601.3 ms ±   5.8 ms    [User: 566.9 ms, System: 33.9 ms]
>   Range (min … max):   596.7 ms … 613.3 ms    10 runs
>
> The execution time of git cat-file --batch-check is only a few
> milliseconds away.

Yeah, it looks like less than 1% overhead. Great work!

> But look at the execution time changes of git cat-file --batch:
>
> upstream/master (225bc32a98):
>
> $ time ~/git/bin-wrappers/git cat-file --batch --batch-all-objects >/dev/null
> /home/adl/git/bin-wrappers/git cat-file --batch --batch-all-objects >
> 24.61s user 0.30s system 99% cpu 24.908 total
>
> cat-file-reuse-ref-filter-logic (709a0c5c12):
>
> $ time ~/git/bin-wrappers/git cat-file --batch --batch-all-objects >/dev/null
> cat-file --batch --batch-all-objects > /dev/null 25.10s user 0.30s system 99% cpu 25.417 total
>
> The execution time has been reduced by nearly 0.5 seconds.

It looks like it has increased by about 0.5s (from 24.908s to 25.417s
total), not been reduced.

> Intuition
> tells me that the performance improvement of git cat-file --batch will be
> more important.
>
> In fact, git cat-file origin code directly adds the obtained object data
> to the output buffer; But after using ref-filter logic, it needs to copy
> the object data to the intermediate data (atom_value), and finally
> to the output buffer.
> At present, we cannot easily eliminate intermediate
> data, because git for-each-ref --sort has a lot of dependence on it,
> but we can reduce the overhead of copying or allocating memory as
> much as possible.

Ok.

> I had an idea that I didn't implement before: partial data delayed evaluation.
> Or to be more specific, waiting until the data is about to be added to
> the output buffer, form specific output content, this may be a way to
> bypass the intermediate data.

Yeah, that might be a good idea.
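
To check that we mean the same thing by delayed evaluation, here is a
rough sketch of how I read the idea. It is not taken from your branch
or from ref-filter.c: `struct outbuf`, `struct lazy_atom` and every
function below are hypothetical names made up for illustration, and
`outbuf` is only a stand-in for strbuf. The eager path copies the
object data into the atom first; the delayed path only records where
the data lives and copies it once, when the output is formed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer; a stand-in for git's strbuf. */
struct outbuf {
	char *buf;
	size_t len, alloc;
};

static void outbuf_add(struct outbuf *out, const void *data, size_t len)
{
	if (out->len + len + 1 > out->alloc) {
		size_t want = (out->len + len + 1) * 2;
		char *p = realloc(out->buf, want);
		assert(p != NULL); /* sketch only: no real error handling */
		out->buf = p;
		out->alloc = want;
	}
	memcpy(out->buf + out->len, data, len);
	out->len += len;
	out->buf[out->len] = '\0';
}

/* Hypothetical atom value, far simpler than ref-filter's atom_value. */
struct lazy_atom {
	char *owned;          /* eager path: private copy of the data */
	const char *borrowed; /* delayed path: points into the object buffer */
	size_t len;
};

/* Eager: copy the object data into the atom now (extra malloc + memcpy). */
static void atom_fill_eager(struct lazy_atom *a, const char *obj, size_t len)
{
	a->owned = malloc(len ? len : 1);
	assert(a->owned != NULL);
	memcpy(a->owned, obj, len);
	a->borrowed = NULL;
	a->len = len;
}

/* Delayed: only remember where the data lives; copy nothing yet. */
static void atom_fill_lazy(struct lazy_atom *a, const char *obj, size_t len)
{
	a->owned = NULL;
	a->borrowed = obj;
	a->len = len;
}

/* Form the output only when it is about to be added to the buffer. */
static void atom_emit(const struct lazy_atom *a, struct outbuf *out)
{
	outbuf_add(out, a->owned ? a->owned : a->borrowed, a->len);
}

static void atom_release(struct lazy_atom *a)
{
	free(a->owned);
	a->owned = NULL;
	a->borrowed = NULL;
	a->len = 0;
}
```

In this sketch the delayed path is only safe while the object buffer
outlives the atom, and a code path that needs real values up front,
such as sorting, would presumably still have to use the eager one.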