Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic

ZheNing Hu <adlternative@xxxxxxxxx> · Thu, 15 Jul 2021 21:53:04 +0800

On Thu, Jul 15, 2021 at 5:45 PM Christian Couder <christian.couder@xxxxxxxxx> wrote:
>
> On Thu, Jul 15, 2021 at 3:53 AM ZheNing Hu <adlternative@xxxxxxxxx> wrote:
> >
> > On Thu, Jul 15, 2021 at 12:24 AM ZheNing Hu <adlternative@xxxxxxxxx> wrote:
> > >
> > > On Tue, Jul 13, 2021 at 4:38 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> > > > I find it somewhat alarming if we are talking about "fast-path"
> > > > workaround before understanding why we are seeing slowdown in the
> > > > first place.
> > >
> > > There is no complete conclusion yet, but I used time and hyperfine to test
> > > these commits (t/perf/* is not accurate enough):
> > >
> > > ------------------------------------------------------------------------------------------------------------------------
> > > | subject                                                     | --batch-check (using hyperfine) | --batch (using time) |
> > > ------------------------------------------------------------------------------------------------------------------------
> > > | [GSOC] cat-file: use fast path when using default_format    | 700ms                           | 25.450s              |
> > > | [GSOC] cat-file: re-implement --textconv, --filters options | 790ms                           | 29.933s              |
> > > | [GSOC] cat-file: reuse err buf in batch_object_write()      | 770ms                           | 29.153s              |
> > > | [GSOC] cat-file: reuse ref-filter logic                     | 780ms                           | 29.412s              |
> > > | The third batch (upstream/master)                           | 640ms                           | 26.025s              |
> > > ------------------------------------------------------------------------------------------------------------------------
> > >
> > > I think the cost does indeed come from "[GSOC] cat-file: reuse ref-filter logic".
> > > But what causes the performance loss needs further analysis.
> >
> > Now I think:
> > There are three main reasons why the performance of cat-file --batch
> > deteriorates after the refactor.
> >
> > 1. ref-filter makes too many copies, and we cannot easily avoid them
> > because ref-filter needs the copied data to implement the atoms %(if),
> > %(else), %(end)... and the --sort option. The original cat-file --batch
> > only needs to output the data to the final string, so it makes
> > relatively few copies.
>
> Is it possible to check early if any of the atoms that needs these
> copied data is specified, and if none of them is specified then to
> avoid the copies?
>

Well, the copy I'm talking about here is something like
"v->s = xstrdup(xxx)", but v->s is needed by --sort, so it is very
difficult to remove. At the moment I think the only solution is the
fast path mentioned by Ævar Arnfjörð Bjarmason.
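
To make the fast-path idea concrete, here is a minimal standalone sketch
(not the actual patch). It assumes the default format is
"%(objectname) %(objecttype) %(objectsize)"; struct obj_info,
can_use_fast_path() and write_default_format() are hypothetical names
used only for illustration. When neither a custom format nor --sort is in
play, the output is formatted directly into the destination buffer,
with no per-atom copy:

```c
#include <stdio.h>
#include <string.h>

/* Assumed default format of cat-file --batch-check. */
#define DEFAULT_FORMAT "%(objectname) %(objecttype) %(objectsize)"

/* Hypothetical stand-in for the object data the caller already holds. */
struct obj_info {
	const char *oid_hex;
	const char *type_name;
	unsigned long size;
};

/*
 * The cheap path is only safe when nothing needs the copied atom
 * values: no sorting, and no format other than the default one.
 */
static int can_use_fast_path(const char *format, int have_sort)
{
	return !have_sort && (!format || !strcmp(format, DEFAULT_FORMAT));
}

/* Format straight into the caller's buffer; nothing is duplicated. */
static int write_default_format(char *out, size_t outlen,
				const struct obj_info *oi)
{
	return snprintf(out, outlen, "%s %s %lu\n",
			oi->oid_hex, oi->type_name, oi->size);
}
```

Any other format, or any use of --sort, would still go through the
generic ref-filter path and pay for the copies.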

> > 2. More complex data structures and a more complex parsing process are used in ref-filter.
> > This is why it can provide more and more useful atoms. Therefore, I think the
> > performance degradation that occurs here is normal.
>
> Is there a way the more complex parsing could be avoided if it's not
> needed by the atoms that are actually used?

No. For example, previously we only supported "objectsize", and now we
also support "objectsize:short", so we have to pay for the extra parsing
here. (It's necessary.)
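
As a rough illustration of where the extra parsing cost comes from, here
is a minimal sketch of splitting an atom into its name and an optional
":argument" part. It is not the ref-filter parser; struct parsed_atom
and parse_atom() are hypothetical names, and the fixed 32-byte buffers
are an assumption made to keep the example short:

```c
#include <string.h>

/* Hypothetical result of parsing one atom such as "objectsize:short". */
struct parsed_atom {
	char name[32];
	char arg[32];	/* empty string when the atom has no ":arg" part */
};

/* Returns 0 on success, -1 if the atom is empty or does not fit. */
static int parse_atom(const char *atom, struct parsed_atom *out)
{
	const char *colon = strchr(atom, ':');
	size_t name_len = colon ? (size_t)(colon - atom) : strlen(atom);
	const char *arg = colon ? colon + 1 : "";

	if (!name_len || name_len >= sizeof(out->name) ||
	    strlen(arg) >= sizeof(out->arg))
		return -1;

	memcpy(out->name, atom, name_len);
	out->name[name_len] = '\0';
	strcpy(out->arg, arg);	/* length was checked above */
	return 0;
}
```

A plain "objectsize" only needs a name lookup, while every atom that can
take an argument has to go through a split like this before it can be
used.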

>
> > 3. As Ævar Arnfjörð Bjarmason mentioned, oid_object_info_extended() was
> > called twice in get_object() before. oid_object_info_extended() is on the
> > hot path, so we should try to avoid calling it. So in the last version of
> > "[GSOC] cat-file: re-implement --textconv, --filters options", I unified
> > the processing of --textconv and --filters to avoid calling
> > oid_object_info_extended() twice.
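
The idea in point 3 can be sketched as follows. This is a standalone toy,
not code from the series: expensive_lookup() is a hypothetical stand-in
for the real object-info query, and the "valid" flag is an assumed way of
remembering that the result is already available, so a second consumer
reuses it instead of querying again:

```c
#include <stddef.h>

/* Counts how often the expensive query actually runs. */
static int lookup_calls;

/* Hypothetical cached result of one object-info query. */
struct info {
	int type;
	unsigned long size;
	int valid;	/* set once the fields above have been filled in */
};

/* Stand-in for the hot-path query; the values are made up. */
static int expensive_lookup(const char *oid, struct info *out)
{
	(void)oid;
	lookup_calls++;
	out->type = 3;
	out->size = 42;
	out->valid = 1;
	return 0;
}

/* Run the query only if no earlier caller has filled the cache. */
static int get_info_once(const char *oid, struct info *cache)
{
	if (cache->valid)
		return 0;
	return expensive_lookup(oid, cache);
}
```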
>
> Ok, thanks for the details and your work on this performance issue!
>
> I wonder if your patch series could be split, so that the early parts
> that add new atoms to ref-filter could be merged sooner?
>

Should this part of the work be handed over to Junio?
The implementation of %(rest) and %(raw) may be worth merging;
those patches are really "zh/ref-filter-raw-data".
The other part could be called "cat-file-reuse-ref-filter-logic".

> Best,
> Christian.

Thanks.
--
ZheNing Hu