Re: Git in Outreachy?

Jeff King <peff@xxxxxxxx> · Sat, 4 Sep 2021 08:50:02 -0400

On Sat, Sep 04, 2021 at 03:40:41PM +0800, ZheNing Hu wrote:

> This may be a place to promote my patches: See [1][2][3].
> It can provide some extra atoms for git cat-file --batch | --batch-check,
> like %(tree), %(author), %(tagger) etc. Although some performance
> optimizations have been made, it still has a small performance gap.
> 
> If the community still expects git cat-file --batch to reuse the logic
> of ref-filter, I expect it to get the attention of reviewers.
> 
> The solutions I can think of to further optimize performance are:
> 1. Delay the evaluation of some ref-filter intermediate data.
> 2. Make the ref-filter code reentrant so that it can be called from
> multiple threads to take advantage of multiple cores.

I don't think trying to thread it will help much. For expensive formats,
where we have to actually open and parse objects, in theory we could do
that in parallel. But most of our time there is spent in zlib getting
the object data, and that all needs to be done under a big lock.

For little formats (e.g., just printing "%(refname)"), we need to
serialize the output anyway. So our unit of work is so tiny, I suspect
that the threading overhead would be a net negative.

I was coincidentally looking at ref-filter last week, and it seemed to
me that a lot of the slowness is because of the over-use of malloc
(e.g., we allocate a substring for every atom_value, and then form them
into a separate buffer). If we could parse the original format into a
form that could be traversed without having to do further allocations,
just writing directly to a strbuf (or even a file handle), I think that
would be a big improvement.
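
To make the allocation point concrete, here is a rough sketch of the
shape I have in mind. It is not ref-filter code: parse_format(), emit(),
struct segment and struct record are all hypothetical names, and only two
atoms are handled. The format is split once into segments that point back
into the original string, and each record is then written straight to a
file handle with no per-atom malloc.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; none of this is git's ref-filter API. */
enum seg_type { SEG_LITERAL, SEG_ATOM };

struct segment {
	enum seg_type type;
	const char *start;	/* points into the original format string */
	size_t len;
};

/* Stand-in for whatever per-ref data the real code would have on hand. */
struct record {
	const char *refname;
	const char *objectname;
};

/*
 * Split "fmt" into literal and %(atom) segments without copying anything.
 * Returns the number of segments, or -1 if "out" is too small or an atom
 * is missing its closing parenthesis.
 */
static int parse_format(const char *fmt, struct segment *out, int max)
{
	const char *p = fmt;
	int n = 0;

	while (*p) {
		const char *open = strstr(p, "%(");
		const char *close;

		if (!open)
			open = p + strlen(p);
		if (open > p) {
			if (n == max)
				return -1;
			out[n].type = SEG_LITERAL;
			out[n].start = p;
			out[n].len = (size_t)(open - p);
			n++;
		}
		if (!*open)
			break;

		close = strchr(open + 2, ')');
		if (!close || n == max)
			return -1;
		out[n].type = SEG_ATOM;
		out[n].start = open + 2;
		out[n].len = (size_t)(close - (open + 2));
		n++;
		p = close + 1;
	}
	return n;
}

/*
 * Write one record using the pre-parsed segments. Nothing is allocated
 * here; unknown atoms expand to nothing in this sketch.
 */
static void emit(const struct segment *segs, int n,
		 const struct record *r, FILE *out)
{
	int i;

	for (i = 0; i < n; i++) {
		const struct segment *s = &segs[i];

		if (s->type == SEG_LITERAL)
			fwrite(s->start, 1, s->len, out);
		else if (s->len == 7 && !memcmp(s->start, "refname", 7))
			fputs(r->refname, out);
		else if (s->len == 10 && !memcmp(s->start, "objectname", 10))
			fputs(r->objectname, out);
	}
}
```

The idea would be to call parse_format() once per invocation and emit()
once per ref, so the per-ref cost is a walk over a small array plus the
writes themselves. A real version would resolve each atom name to an enum
at parse time rather than comparing strings on every emit.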

I just posted the results of some of my experiments to the list:

  https://lore.kernel.org/git/YTNpQ7Od1U%2F5i0R7@xxxxxxxxxxxxxxxxxxxxxxx/

I don't think that gives any kind of useful base to build on, but it
shows what's possible by skipping past various segments of the
ref-filter code.

-Peff