Jeff King <peff@xxxxxxxx> 于2021年4月6日周二 上午6:17写道: > > On Mon, Apr 05, 2021 at 02:01:19PM +0000, ZheNing Hu via GitGitGadget wrote: > > > When we use `git for-each-ref`, every ref will call > > `show_ref_array_item()` and allocate its own final strbuf > > and error strbuf. Instead, we can provide two single strbuf: > > final_buf and error_buf that get reused for each output. > > > > When run it 100 times: > > > > $ git for-each-ref > > > > on git.git : > > > > 3.19s user > > 3.88s system > > 35% cpu > > 20.199 total > > > > to: > > > > 2.89s user > > 4.00s system > > 34% cpu > > 19.741 total > > That's a bigger performance improvement than I'd expect from this. I'm > having trouble reproducing it here (I get the same time before and > after). I also notice that you don't seem to be CPU bound, and we spend > most of our time on system CPU (so program startup stuff, not the loop > you're optimizing). > > I think a more interesting test is timing a single invocation with a > large number of refs. If you are planning to do a lot of work on the > formatting code, it might be worth adding such a test into t/perf (both > to show off results, but also to catch any regressions after we make > things faster). > It makes sense. A lot of refs can be convincing. Just like the number of objects measured in `cat-files` is large enough. But this is the first time I use `t/perf/*` and there is a little problem. It seem like whatever I run single script like `sh ./p0007-write-cache.sh` or just `make` or `./run ${HOME}/git -- ./p0002-read-cache.sh` , these tests will fail. > > This patch learned Jeff King's optimization measures in git > > cat-file(79ed0a5): using a single strbuf for all objects output Instead > > of allocating a large number of small strbuf for every object. > > I do think this is a good direction (for all the reasons laid out in > 79ed0a5), though it wasn't actually the part I was most worried about > for ref-filter performance. The bigger issue in ref-filter is that each > individual atom will generally have its own allocated string, and then > we'll format all of the atoms into the final strbuf output. In most > cases we could avoid those intermediate copies entirely. > Yes! In `ref-filter` we set object info in `v->s` and then append them to current `stack` buffer, and finally set in `final_buf`, the copy of the string is expensive. I don’t know if the optimization should start by removing the stack buffer? > I do think this would be a useful optimization to have in addition, > though. As for the patch itself, I looked over the review that Eric > gave, and he already said everything I would have. :) > I think it should be optimized, It will reduce the overhead of malloc and free, but it is not obvious enough. Yes, there is a lot of bad code in my patch. > -Peff Thanks. -- ZheNing Hu