Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?

ZheNing Hu <adlternative@xxxxxxxxx> · Sat, 21 Aug 2021 10:36:48 +0800

Christian Couder <christian.couder@xxxxxxxxx> wrote on Sat, Aug 21, 2021 at 12:13 AM:
>
> Hi ZheNing,
>
> On Thu, Aug 19, 2021 at 3:39 AM ZheNing Hu <adlternative@xxxxxxxxx> wrote:
> >
> > Hi, Christian and Hariom,
> >
> > I want to use this patch series as the temporary final version of GSOC project:
> >
> > https://github.com/adlternative/git/commits/cat-file-reuse-ref-filter-logic
>
> I am still not very happy with the last patch in the series, but it can
> be improved later.
>

Feel free to tell me what's not good about it; I will try my best to
improve it. :-)

> > The ref-filter-opt-code-logic and ref-filter-opt-perf patch series
> > cannot yet carry their optimizations over to git cat-file --batch.
> > Therefore, the cat-file-reuse-ref-filter-logic branch is the most
> > effective one for now.
> >
> > This is the final performance regression test result:
> > Test                                        upstream/master   this tree
> > ------------------------------------------------------------------------------------
> > 1006.2: cat-file --batch-check              0.06(0.06+0.00)   0.08(0.07+0.00) +33.3%
> > 1006.3: cat-file --batch-check with atoms   0.06(0.04+0.01)   0.06(0.06+0.00) +0.0%
> > 1006.4: cat-file --batch                    0.49(0.47+0.02)   0.48(0.47+0.01) -2.0%
> > 1006.5: cat-file --batch with atoms         0.48(0.44+0.03)   0.47(0.46+0.01) -2.1%
> >
> > git cat-file --batch has a performance improvement of about 2%.
> > git cat-file --batch-check still has a performance gap of 33.3%.
> >
> > The performance degradation of git cat-file --batch-check is actually
> > not very big.
> >
> > upstream/master (225bc32a98):
> >
> > $ hyperfine --warmup=10 "~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects"
> > Benchmark #1: ~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects
> >  Time (mean ± σ):     596.2 ms ±   5.7 ms    [User: 563.0 ms, System: 32.5 ms]
> >  Range (min … max):   586.9 ms … 607.9 ms    10 runs
> >
> > cat-file-reuse-ref-filter-logic (709a0c5c12):
> >
> > $ hyperfine --warmup=10 "~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects"
> > Benchmark #1: ~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects
> >  Time (mean ± σ):     601.3 ms ±   5.8 ms    [User: 566.9 ms, System: 33.9 ms]
> >  Range (min … max):   596.7 ms … 613.3 ms    10 runs
> >
> > The execution time of git cat-file --batch-check differs by only a
> > few milliseconds.
>
> Yeah, it looks like less than 1% overhead.
>
> Great work!
>
> > But look at the execution time changes of git cat-file --batch:
> >
> > upstream/master (225bc32a98):
> >
> > $ time ~/git/bin-wrappers/git cat-file --batch --batch-all-objects >/dev/null
> > /home/adl/git/bin-wrappers/git cat-file --batch --batch-all-objects >
> >  24.61s user 0.30s system 99% cpu 24.908 total
> >
> > cat-file-reuse-ref-filter-logic (709a0c5c12):
> >
> > $ time ~/git/bin-wrappers/git cat-file --batch --batch-all-objects >/dev/null
> > cat-file --batch --batch-all-objects > /dev/null  25.10s user 0.30s system 99% cpu 25.417 total
> >
> > The execution time has been reduced by nearly 0.5 seconds.
>
> It looks like it has increased by 0.5s, not been reduced.
>

Unfortunately, you are right: it is slower, not faster.
So the 2% improvement measured by t/perf may not be very credible.
I don't know.

Test                                        upstream/master   this tree
------------------------------------------------------------------------------------
1006.2: cat-file --batch-check              0.07(0.06+0.01)   0.08(0.07+0.01) +14.3%
1006.3: cat-file --batch-check with atoms   0.06(0.05+0.01)   0.07(0.05+0.01) +16.7%
1006.4: cat-file --batch                    0.49(0.46+0.03)   0.48(0.47+0.01) -2.0%
1006.5: cat-file --batch with atoms         0.48(0.45+0.03)   0.47(0.46+0.01) -2.1%

Should we look at the first number inside the parentheses rather than
the number in front of them? If I read the t/perf output correctly, each
entry is elapsed(user+sys). For 1006.4, for example, the user times are
0.46 and 0.47. From this perspective, the performance of
git cat-file --batch gets worse.
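
To compare the different measurements on the same footing, the relative
changes can be recomputed from the numbers quoted in this thread. This is
only a small sketch: the helper name pct_change is made up for
illustration, and it assumes the t/perf columns are elapsed(user+sys).

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (label, upstream/master, cat-file-reuse-ref-filter-logic), all in seconds.
# Values are copied from the tables and command output in this thread.
measurements = [
    ("t/perf 1006.2 --batch-check, elapsed", 0.07, 0.08),
    ("t/perf 1006.4 --batch, elapsed", 0.49, 0.48),
    ("t/perf 1006.4 --batch, user", 0.46, 0.47),
    ("hyperfine --batch-check, mean", 0.5962, 0.6013),
    ("time --batch, total", 24.908, 25.417),
]

for label, before, after in measurements:
    print(f"{label:40s} {pct_change(before, after):+6.1f}%")
```

With timings printed to two decimals, a single 0.01 s step on a 0.07 s
run already shows up as about 14%, so the longer hyperfine and time runs
are probably the more trustworthy figures here.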

> > Intuition
> > tells me that the performance improvement of git cat-file --batch will be
> > more important.
> >
> > In fact, the original git cat-file code adds the obtained object data
> > directly to the output buffer. After switching to the ref-filter logic,
> > it has to copy the object data into intermediate data (atom_value)
> > first, and only then into the output buffer. At present, we cannot
> > easily eliminate the intermediate data, because git for-each-ref --sort
> > depends on it heavily, but we can reduce the overhead of copying or
> > allocating memory as much as possible.
>
> Ok.
>
> > I had an idea that I didn't implement before: delayed evaluation of
> > part of the data. To be more specific, we would wait until the data is
> > about to be added to the output buffer and only then form the specific
> > output content. This may be a way to bypass the intermediate data.
>
> Yeah, that might be a good idea.

I will try to do it.
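
To make the idea more concrete, here is a rough sketch of what I mean by
delayed evaluation. It is written in Python rather than C purely for
illustration, and none of these names (LazyAtom, render_raw, render_size,
format_entry) exist in ref-filter; they are hypothetical.

```python
import io


class LazyAtom:
    """Hypothetical stand-in for atom_value: it keeps a view of the
    object data and formats it only when it is written out."""

    def __init__(self, data, render):
        self._data = memoryview(data)  # a view, not a copy of the data
        self._render = render          # called as render(data, out)

    def emit(self, out):
        # Evaluation happens here, at output time, not at parse time.
        self._render(self._data, out)


def render_raw(data, out):
    # Write the object data straight into the output buffer.
    out.write(data)


def render_size(data, out):
    out.write(str(len(data)).encode())


def format_entry(parts, out):
    """Write literal bytes as-is and let lazy atoms emit themselves."""
    for part in parts:
        if isinstance(part, bytes):
            out.write(part)
        else:
            part.emit(out)


content = b"blob contents\n"
out = io.BytesIO()
format_entry(
    [LazyAtom(content, render_size), b" ", LazyAtom(content, render_raw)],
    out,
)
```

In this sketch the object data reaches the output buffer without first
being copied into a per-atom string. Sorting would still need the values
to be materialized, so this could only bypass the intermediate data on
the output path.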

Thanks.
--
ZheNing Hu