Christian Couder <christian.couder@xxxxxxxxx> 于2021年8月15日周日 下午6:28写道: > > > I think the key to continuing to optimize is still to reduce unnecessary copies. > > Do you have data or hints about this? > Here is the result of perf diff with git cat-file --batch-check --batch-all-objects between upstream/master and cat-file-reuse-ref-filter-logic: # Event 'cycles:u' # # Baseline Delta Abs Shared Object Symbol # ........ ......... ................ .................................. # 18.08% -2.11% libz.so.1.2.11 [.] 0x0000000000008dde +2.09% libc-2.33.so [.] malloc_consolidate 21.65% -1.70% git [.] unpack_object_header_buffer 5.62% -1.57% git [.] use_pack 0.59% +1.49% libc-2.33.so [.] _int_free +1.40% git [.] get_object 13.62% -1.39% libz.so.1.2.11 [.] inflate 1.54% -0.98% libc-2.33.so [.] __strchrnul_avx2 +0.90% git [.] format_ref_array_item +0.82% git [.] populate_value 0.77% +0.74% libc-2.33.so [.] __strlen_avx2 0.37% +0.68% libc-2.33.so [.] cfree@GLIBC_2.2.5 +0.56% git [.] append_literal.isra.0 0.32% +0.53% libc-2.33.so [.] malloc 1.09% -0.48% git [.] get_size_from_delta 0.77% -0.46% git [.] oid_array_for_each_unique 2.48% -0.45% git [.] get_delta_base 0.20% +0.41% libc-2.33.so [.] _IO_fwrite +0.41% git [.] batch_object_write.constprop.0 +0.40% git [.] quote_formatting +0.39% git [.] get_ref_atom_value 0.31% +0.38% git [.] strbuf_add +0.36% libc-2.33.so [.] __libc_calloc 0.41% +0.35% [unknown] [k] 0xffffffff84c00158 0.58% -0.33% libc-2.33.so [.] unlink_chunk.constprop.0 1.04% -0.31% libc-2.33.so [.] __GI___qsort_r +0.30% git [.] 0x000000000001cd80 +0.30% git [.] append_atom 3.23% +0.30% libc-2.33.so [.] __memmove_avx_unaligned_erms malloc_consolidate(), _int_free(), malloc(), __libc_calloc(), they take up a larger proportion of time (through gprof, we can also see that the number of calls is also increasing), it shows that our memory allocation is worth optimizing. __memmove_avx_unaligned_erms() take up a larger proportion of time, it shows that we have some unnecessary copies. Note that 4602b62 ([GSOC] ref-filter: reuse finnal buffer if no stack need) is trying to alleviate this problem, and the results show that it does have a 2% optimization. So today I added two new commits to verify my guess: We need to reduce copying. See: https://github.com/adlternative/git/commits/cat-file-reuse-ref-filter-logic-temp-5 Test upstream/master this tree ------------------------------------------------------------------------------------ 1006.2: cat-file --batch-check 0.08(0.06+0.01) 0.08(0.06+0.01) +0.0% 1006.3: cat-file --batch-check with atoms 0.06(0.04+0.01) 0.07(0.06+0.01) +16.7% 1006.4: cat-file --batch 0.49(0.47+0.02) 0.47(0.46+0.00) -4.1% 1006.5: cat-file --batch with atoms 0.47(0.45+0.01) 0.46(0.46+0.00) -2.1% This is good news. The performance of our git cat-file --batch is better than before! > Have you looked at why there is still +16.7% in the cat-file > --batch-check with atoms case? Maybe solving this would improve things > in the other cases too. > I am continuing to think about it, I guess it is still a malloc/free/memmove problem. The execution time of git cat-file --batch-check is relatively short, and the disadvantages of using ref-filter logic will be more prominent. > > ### Additional advice > > > > During the optimization process this week, I found that I might want > > to use a `strbuf_move()` function, although I did not adopt it in my work > > (because it did not allow my work to be greatly optimized), but I think it might > > be useful in some situations: We don’t want to copy the data of strbuf, > > but just want to move its buf pointer: > > Yeah, if we are sure that it improves performance, that might be worth it. > > About your patch series > (https://lore.kernel.org/git/pull.1016.git.1628842990.gitgitgadget@xxxxxxxxx/), > I wonder if it might be possible and better to have some performance > improvements before the commit that uses the ref-filter code > ("cat-file: reuse ref-filter logic"). If you are going this way, it > might make sense to add a few performance tests, if some don't already > exist, to show the effect of these performance improvements on the > ref-filter code when it's used by other commands like for-each-ref. > > This could show that these performance improvements are worth doing > even if we didn't want to reuse the ref-filter code in cat-file. And > then perhaps these performance improvements could be merged as part of > one or more small patch series, before the patch series that reuses > the ref-filter code in cat-file. > Ah, this is estimated to take a lot of effort. I feel that it will take a long time for these optimization patches to be merged into the master. I tried to add a performance test for for-each-ref, but the effect is very insignificant, because its execution time is too short. > Best, > Christian. Thanks. -- ZheNing Hu