Re: [GSoC] Git Blog 13

ZheNing Hu <adlternative@xxxxxxxxx> · Mon, 16 Aug 2021 17:13:19 +0800

Christian Couder <christian.couder@xxxxxxxxx> 于2021年8月15日周日 下午6:28写道：
>
> > I think the key to continuing to optimize is still to reduce unnecessary copies.
>
> Do you have data or hints about this?
>

Here is the result of perf diff with git cat-file --batch-check
--batch-all-objects
between upstream/master and cat-file-reuse-ref-filter-logic:

# Event 'cycles:u'
#
# Baseline  Delta Abs  Shared Object     Symbol
# ........  .........  ................  ..................................
#
    18.08%     -2.11%  libz.so.1.2.11    [.] 0x0000000000008dde
               +2.09%  libc-2.33.so      [.] malloc_consolidate
    21.65%     -1.70%  git               [.] unpack_object_header_buffer
     5.62%     -1.57%  git               [.] use_pack
     0.59%     +1.49%  libc-2.33.so      [.] _int_free
               +1.40%  git               [.] get_object
    13.62%     -1.39%  libz.so.1.2.11    [.] inflate
     1.54%     -0.98%  libc-2.33.so      [.] __strchrnul_avx2
               +0.90%  git               [.] format_ref_array_item
               +0.82%  git               [.] populate_value
     0.77%     +0.74%  libc-2.33.so      [.] __strlen_avx2
     0.37%     +0.68%  libc-2.33.so      [.] cfree@GLIBC_2.2.5
               +0.56%  git               [.] append_literal.isra.0
     0.32%     +0.53%  libc-2.33.so      [.] malloc
     1.09%     -0.48%  git               [.] get_size_from_delta
     0.77%     -0.46%  git               [.] oid_array_for_each_unique
     2.48%     -0.45%  git               [.] get_delta_base
     0.20%     +0.41%  libc-2.33.so      [.] _IO_fwrite
               +0.41%  git               [.] batch_object_write.constprop.0
               +0.40%  git               [.] quote_formatting
               +0.39%  git               [.] get_ref_atom_value
     0.31%     +0.38%  git               [.] strbuf_add
               +0.36%  libc-2.33.so      [.] __libc_calloc
     0.41%     +0.35%  [unknown]         [k] 0xffffffff84c00158
     0.58%     -0.33%  libc-2.33.so      [.] unlink_chunk.constprop.0
     1.04%     -0.31%  libc-2.33.so      [.] __GI___qsort_r
               +0.30%  git               [.] 0x000000000001cd80
               +0.30%  git               [.] append_atom
     3.23%     +0.30%  libc-2.33.so      [.] __memmove_avx_unaligned_erms

malloc_consolidate(), _int_free(), malloc(), __libc_calloc(), they
take up a larger
proportion of time (through gprof, we can also see that the number of
calls is also
increasing), it shows that our memory allocation is worth optimizing.

__memmove_avx_unaligned_erms() take up a larger proportion of time,
it shows that we have some unnecessary copies.

Note that 4602b62 ([GSOC] ref-filter: reuse finnal buffer if no stack
need) is trying
to alleviate this problem, and the results show that it does have a 2%
optimization.

So today I added two new commits to verify my guess: We need to reduce copying.
See: https://github.com/adlternative/git/commits/cat-file-reuse-ref-filter-logic-temp-5

Test                                        upstream/master   this
tree
------------------------------------------------------------------------------------
1006.2: cat-file --batch-check              0.08(0.06+0.01)
0.08(0.06+0.01) +0.0%
1006.3: cat-file --batch-check with atoms   0.06(0.04+0.01)
0.07(0.06+0.01) +16.7%
1006.4: cat-file --batch                    0.49(0.47+0.02)
0.47(0.46+0.00) -4.1%
1006.5: cat-file --batch with atoms         0.47(0.45+0.01)
0.46(0.46+0.00) -2.1%

This is good news. The performance of our git cat-file --batch is
better than before!

> Have you looked at why there is still +16.7% in the cat-file
> --batch-check with atoms case? Maybe solving this would improve things
> in the other cases too.
>

I am continuing to think about it, I guess it is still a
malloc/free/memmove problem.
The execution time of git cat-file --batch-check is relatively short,
and the disadvantages
of using ref-filter logic will be more prominent.

> > ### Additional advice
> >
> > During the optimization process this week, I found that I might want
> > to use a `strbuf_move()` function, although I did not adopt it in my work
> > (because it did not allow my work to be greatly optimized), but I think it might
> > be useful in some situations: We don’t want to copy the data of strbuf,
> > but just want to move its buf pointer:
>
> Yeah, if we are sure that it improves performance, that might be worth it.
>
> About your patch series
> (https://lore.kernel.org/git/pull.1016.git.1628842990.gitgitgadget@xxxxxxxxx/),
> I wonder if it might be possible and better to have some performance
> improvements before the commit that uses the ref-filter code
> ("cat-file: reuse ref-filter logic"). If you are going this way, it
> might make sense to add a few performance tests, if some don't already
> exist, to show the effect of these performance improvements on the
> ref-filter code when it's used by other commands like for-each-ref.
>
> This could show that these performance improvements are worth doing
> even if we didn't want to reuse the ref-filter code in cat-file. And
> then perhaps these performance improvements could be merged as part of
> one or more small patch series, before the patch series that reuses
> the ref-filter code in cat-file.
>

Ah, this is estimated to take a lot of effort. I feel that it will
take a long time
for these optimization patches to be merged into the master.

I tried to add a performance test for for-each-ref, but the effect is very
insignificant, because its execution time is too short.

> Best,
> Christian.

Thanks.
--
ZheNing Hu