Re: [PATCH] [GSOC] ref-filter: use single strbuf for all output

ZheNing Hu <adlternative@xxxxxxxxx> · Tue, 6 Apr 2021 17:49:47 +0800

Jeff King <peff@xxxxxxxx> wrote on Tue, Apr 6, 2021 at 6:17 AM:
>
> On Mon, Apr 05, 2021 at 02:01:19PM +0000, ZheNing Hu via GitGitGadget wrote:
>
> > When we use `git for-each-ref`, every ref calls
> > `show_ref_array_item()` and allocates its own final strbuf
> > and error strbuf. Instead, we can provide two shared strbufs,
> > final_buf and error_buf, that get reused for each output.
> >
> > Running it 100 times:
> >
> > $ git for-each-ref
> >
> > on git.git, the timing goes from:
> >
> > 3.19s user
> > 3.88s system
> > 35% cpu
> > 20.199 total
> >
> > to:
> >
> > 2.89s user
> > 4.00s system
> > 34% cpu
> > 19.741 total
>
> That's a bigger performance improvement than I'd expect from this. I'm
> having trouble reproducing it here (I get the same time before and
> after). I also notice that you don't seem to be CPU bound, and we spend
> most of our time on system CPU (so program startup stuff, not the loop
> you're optimizing).
>
> I think a more interesting test is timing a single invocation with a
> large number of refs. If you are planning to do a lot of work on the
> formatting code, it might be worth adding such a test into t/perf (both
> to show off results, but also to catch any regressions after we make
> things faster).
>

That makes sense. A test with a large number of refs would be more
convincing, just as the `cat-file` measurements used a large number of
objects.

But this is the first time I have used `t/perf/*`, and I ran into a small
problem. Whether I run a single script like `sh ./p0007-write-cache.sh`,
or just `make`, or `./run ${HOME}/git -- ./p0002-read-cache.sh`, the
tests fail.

> >     This patch borrows Jeff King's optimization from git
> >     cat-file (79ed0a5): use a single strbuf for all object output instead
> >     of allocating a small strbuf for every object.
>
> I do think this is a good direction (for all the reasons laid out in
> 79ed0a5), though it wasn't actually the part I was most worried about
> for ref-filter performance. The bigger issue in ref-filter is that each
> individual atom will generally have its own allocated string, and then
> we'll format all of the atoms into the final strbuf output. In most
> cases we could avoid those intermediate copies entirely.
>

Yes! In `ref-filter` we store the object info in `v->s`, append it to the
current `stack` buffer, and finally copy it into `final_buf`. All of that
string copying is expensive. Should the optimization start by removing the
stack buffer?
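
To make the copy chain concrete, here is a rough sketch in plain C. It is
not the real ref-filter code: `struct buf`, `buf_add()`, `format_with_copy()`
and `format_direct()` are hypothetical names, and `struct buf` is only a
stand-in for strbuf. The first formatter gives the atom its own allocated
string before appending it; the second appends the value straight into the
output.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; a stand-in for git's strbuf, not its API. */
struct buf {
	char *data;
	size_t len, alloc;
};

/* Append n bytes of s to b, growing the allocation when needed. */
static void buf_add(struct buf *b, const char *s, size_t n)
{
	if (b->len + n + 1 > b->alloc) {
		b->alloc = (b->len + n + 1) * 2;
		b->data = realloc(b->data, b->alloc);
		if (!b->data)
			abort();
	}
	memcpy(b->data + b->len, s, n);
	b->len += n;
	b->data[b->len] = '\0';
	assert(b->len < b->alloc); /* room for the NUL is always kept */
}

/* Copying path: the atom value gets its own allocation first. */
static void format_with_copy(struct buf *out, const char *refname)
{
	size_t n = strlen(refname);
	char *atom = malloc(n + 1); /* intermediate per-atom string */

	if (!atom)
		abort();
	memcpy(atom, refname, n + 1);
	buf_add(out, atom, n); /* second copy, into the output */
	free(atom);
}

/* Direct path: append the value to the output without a temporary. */
static void format_direct(struct buf *out, const char *refname)
{
	buf_add(out, refname, strlen(refname));
}
```

Both functions leave the same bytes in `out`; the direct one just skips one
malloc, one memcpy and one free per atom.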

> I do think this would be a useful optimization to have in addition,
> though. As for the patch itself, I looked over the review that Eric
> gave, and he already said everything I would have. :)
>

I think it is still worth optimizing: it reduces the malloc and free
overhead, although the gain is not very noticeable.
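
As a minimal illustration of where the malloc/free saving comes from, the
sketch below counts how often a buffer has to hit the allocator. Everything
in it is an assumption for illustration: `struct buf`, `buf_addstr()`,
`show_refs_fresh()` and `show_refs_reused()` are made-up names, not git's
strbuf API or the code in my patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; a stand-in for git's strbuf, not its API. */
struct buf {
	char *data;
	size_t len, alloc;
};

static int grow_calls; /* how many times we had to (re)allocate */

static void buf_addstr(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->alloc) {
		b->alloc = (b->len + n + 1) * 2;
		b->data = realloc(b->data, b->alloc);
		if (!b->data)
			abort();
		grow_calls++;
	}
	memcpy(b->data + b->len, s, n + 1); /* copies the NUL too */
	b->len += n;
}

/* Per-item buffers: allocate and free one buffer for every ref. */
static int show_refs_fresh(const char **refs, size_t nr)
{
	grow_calls = 0;
	for (size_t i = 0; i < nr; i++) {
		struct buf out = { NULL, 0, 0 };

		buf_addstr(&out, refs[i]);
		free(out.data);
	}
	return grow_calls;
}

/* Shared buffer: reset the length for each ref, keep the allocation. */
static int show_refs_reused(const char **refs, size_t nr)
{
	struct buf out = { NULL, 0, 0 };

	grow_calls = 0;
	for (size_t i = 0; i < nr; i++) {
		out.len = 0;
		buf_addstr(&out, refs[i]);
	}
	free(out.data);
	return grow_calls;
}
```

The per-item version allocates at least once per ref, while the shared
version only allocates when a line is longer than the capacity it already
has.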

Yes, there is a lot of bad code in my patch.

> -Peff

Thanks.
--
ZheNing Hu