On Thu, Jul 15, 2021 at 3:53 AM ZheNing Hu <adlternative@xxxxxxxxx> wrote:
>
> ZheNing Hu <adlternative@xxxxxxxxx> wrote on Thu, Jul 15, 2021 at 12:24 AM:
> >
> > Junio C Hamano <gitster@xxxxxxxxx> wrote on Tue, Jul 13, 2021 at 4:38 AM:
> > > I find it somewhat alarming if we are talking about "fast-path"
> > > workaround before understanding why we are seeing slowdown in the
> > > first place.
> >
> > There is no complete conclusion yet, but I tried testing these commits
> > with time and hyperfine (t/perf/* is not accurate enough):
> >
> > ------------------------------------------------------------------------------------------------------------------------
> > | subject                                                     | --batch-check (using hyperfine) | --batch (using time) |
> > ------------------------------------------------------------------------------------------------------------------------
> > | [GSOC] cat-file: use fast path when using default_format    | 700ms                           | 25.450s              |
> > | [GSOC] cat-file: re-implement --textconv, --filters options | 790ms                           | 29.933s              |
> > | [GSOC] cat-file: reuse err buf in batch_object_write()      | 770ms                           | 29.153s              |
> > | [GSOC] cat-file: reuse ref-filter logic                     | 780ms                           | 29.412s              |
> > | The third batch (upstream/master)                           | 640ms                           | 26.025s              |
> > ------------------------------------------------------------------------------------------------------------------------
> >
> > I think the cost is indeed from "[GSOC] cat-file: reuse ref-filter logic".
> > But what causes the loss of performance needs further analysis.
>
> Now I think:
> There are three main reasons why the performance of cat-file --batch
> deteriorates after the refactor.
>
> 1. Too many copies are made in ref-filter, and we cannot easily avoid
> them because ref-filter needs the copied data to implement the atoms
> %(if), %(else), %(end)... and the --sort option. The original
> cat-file --batch only needs to output the data to the final string,
> so it makes relatively few copies.

Is it possible to check early whether any of the atoms that need this
copied data is specified, and to avoid the copies if none of them is?

> 2. More complex data structures and a more complex parsing process are
> used in ref-filter. This is why it can provide more, and more useful,
> atoms. Therefore, I think the performance degradation that occurs here
> is normal.

Is there a way the more complex parsing could be avoided when it's not
needed by the atoms that are actually used?

> 3. As Ævar Arnfjörð Bjarmason mentioned, oid_object_info_extended() was
> called twice in get_object() before. oid_object_info_extended() is on
> the hot path, so we should try to avoid calling it. So in the last
> version of "[GSOC] cat-file: re-implement --textconv, --filters
> options", I made the unified processing of --textconv and --filters
> avoid calling oid_object_info_extended() twice.

Ok, thanks for the details and your work on this performance issue!

I wonder if your patch series could be split, so that the early parts
that add new atoms to ref-filter could be merged sooner?

Best,
Christian.
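
To make the early-check question under point 1 concrete, here is a
minimal sketch. It is not ref-filter's actual code: the function name
format_needs_copies(), the copy_atoms table and the has_sort parameter
are all hypothetical, and a real implementation would more likely
inspect the already-parsed atoms than scan the format string. The idea
is only to decide once, before the per-object loop, whether the format
can take a path that writes straight to the output.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical list of atom prefixes whose handling needs intermediate
 * copies. "%(if" also matches forms such as "%(if:equals=...)".
 * This is an illustration, not ref-filter's real atom table.
 */
static const char *const copy_atoms[] = {
	"%(if", "%(then)", "%(else)", "%(end)"
};

/*
 * Return true if the format (or a requested sort) needs the copied
 * data, false if the output could be written directly.
 * Intended to run once per invocation, not once per object.
 */
static bool format_needs_copies(const char *format, bool has_sort)
{
	size_t i;

	assert(format != NULL);

	/* Sorting has to hold every entry before printing any of them. */
	if (has_sort)
		return true;

	for (i = 0; i < sizeof(copy_atoms) / sizeof(copy_atoms[0]); i++)
		if (strstr(format, copy_atoms[i]))
			return true;

	return false;
}
```

A caller could compute this once after parsing the format and pick the
cheaper output path whenever it returns false, for example for a plain
"%(objectname) %(objecttype) %(objectsize)" format without --sort.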