Re: [PATCH 1/2] ref-filter: hacky "streaming" mode

ZheNing Hu <adlternative@xxxxxxxxx> · Sun, 5 Sep 2021 16:20:02 +0800

On Sat, Sep 4, 2021 at 8:41 PM, Jeff King <peff@xxxxxxxx> wrote:
>
> The ref-filter code is very keen to collect all of the refs in an array
> before writing any output. This makes things slower than necessary when
> using the default sort order (which is already sorted by refname when we
> call for_each_ref()), and when no filtering options require it.
>
> This commit adds a mildly-ugly interface to detect this case and stream
> directly from filter_refs(). The caller just needs to call the
> "maybe_stream" function. Either way, they can call the usual sort/print
> functions; they'll just be noops if we did stream instead of collecting.
>
> Here are some timings on a fully-packed 500k-ref repo:
>
>   Benchmark #1: ./git.orig for-each-ref --format='%(objectname) %(refname)'
>     Time (mean ± σ):     340.2 ms ±   5.3 ms    [User: 300.5 ms, System: 39.6 ms]
>     Range (min … max):   332.9 ms … 347.0 ms    10 runs
>
>   Benchmark #2: ./git.stream for-each-ref --format='%(objectname) %(refname)'
>     Time (mean ± σ):     224.0 ms ±   3.4 ms    [User: 222.7 ms, System: 1.3 ms]
>     Range (min … max):   218.1 ms … 228.5 ms    13 runs
>
>   Summary
>     './git.stream for-each-ref --format='%(objectname) %(refname)'' ran
>       1.52 ± 0.03 times faster than './git.orig for-each-ref --format='%(objectname) %(refname)''
>
> So we definitely shave off some time, though we're still _much_ slower
> than a simple `wc -l <packed-refs` (which is around 10ms, though of
> course it isn't actually doing as much work).
>
> The point here is to reduce overall effort, though of course the time to
> first output is much improved in the streaming version, too (about 250ms
> versus 4ms).
>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
>  builtin/for-each-ref.c |  7 ++++++
>  ref-filter.c           | 50 ++++++++++++++++++++++++++++++++++--------
>  ref-filter.h           |  8 +++++++
>  3 files changed, 56 insertions(+), 9 deletions(-)
>
> diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> index 89cb6307d4..fe0b92443f 100644
> --- a/builtin/for-each-ref.c
> +++ b/builtin/for-each-ref.c
> @@ -70,6 +70,13 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>         if (verify_ref_format(&format))
>                 usage_with_options(for_each_ref_usage, opts);
>
> +       /*
> +        * try streaming, but only without maxcount; in theory the ref-filter
> +        * code could learn about maxcount, but for now it is our own thing
> +        */
> +       if (!maxcount)
> +               ref_filter_maybe_stream(&filter, sorting, icase, &format);
> +

Yes, I think maxcount would be easy to support in streaming mode.
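
To illustrate what I have in mind, here is a minimal sketch in toy C. It is
not git's code: struct toy_stream_state, toy_stream_one() and toy_for_each()
are hypothetical names, and the only idea borrowed from git is that a
callback can return non-zero to stop the iteration. The callback counts what
it has printed and asks the iterator to stop once the limit is reached.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical callback state; not git's struct ref_filter. */
struct toy_stream_state {
	unsigned int maxcount; /* 0 means "no limit" */
	unsigned int emitted;  /* how many names we have printed so far */
};

/* Print one name; return non-zero to ask the iterator to stop. */
int toy_stream_one(const char *refname, void *data)
{
	struct toy_stream_state *st = data;

	assert(st != NULL);
	printf("%s\n", refname);
	st->emitted++;
	if (st->maxcount && st->emitted >= st->maxcount)
		return 1; /* limit reached, no need to look at more names */
	return 0;
}

/* Toy iterator: calls fn for each name until fn returns non-zero. */
int toy_for_each(const char **names, size_t nr,
		 int (*fn)(const char *, void *), void *data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret = fn(names[i], data);
		if (ret)
			return ret;
	}
	return 0;
}
```

With maxcount set to 2 and three names, the iterator stops after the second
name without visiting the third.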

>         if (!sorting)
>                 sorting = ref_default_sorting();
>         ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
> diff --git a/ref-filter.c b/ref-filter.c
> index 93ce2a6ef2..17b78b1d30 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2283,15 +2283,19 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
>                         return 0;
>         }
>
> -       /*
> -        * We do not open the object yet; sort may only need refname
> -        * to do its job and the resulting list may yet to be pruned
> -        * by maxcount logic.
> -        */
> -       ref = ref_array_push(ref_cbdata->array, refname, oid);
> -       ref->commit = commit;
> -       ref->flag = flag;
> -       ref->kind = kind;
> +       if (ref_cbdata->filter->streaming_format) {
> +               pretty_print_ref(refname, oid, ref_cbdata->filter->streaming_format);

So in streaming mode we call pretty_print_ref() directly. OK.

> +       } else {
> +               /*
> +                * We do not open the object yet; sort may only need refname
> +                * to do its job and the resulting list may yet to be pruned
> +                * by maxcount logic.
> +                */
> +               ref = ref_array_push(ref_cbdata->array, refname, oid);
> +               ref->commit = commit;
> +               ref->flag = flag;
> +               ref->kind = kind;
> +       }
>
>         return 0;
>  }

So in streaming mode there is no need to push each ref onto the ref_array,
which avoids the malloc()/free() overhead. That makes sense.
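
As a rough illustration of that difference, here is a toy sketch. It is not
git's code: struct toy_item, toy_collect_then_print() and toy_stream() are
hypothetical names, and the allocation counts only describe this toy, not
the real ref-filter. The collecting variant allocates once for the array and
once per name before printing anything; the streaming variant stores nothing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a collected ref; not git's struct ref_array_item. */
struct toy_item {
	char *refname;
};

/* Collect every name first, then print. Returns the number of malloc() calls. */
size_t toy_collect_then_print(const char **names, size_t nr, FILE *out)
{
	struct toy_item *items;
	size_t i, allocs = 0;

	assert(out != NULL);
	if (!nr)
		return 0;
	items = malloc(nr * sizeof(*items));
	if (!items)
		abort();
	allocs++;
	for (i = 0; i < nr; i++) {
		size_t len = strlen(names[i]) + 1;

		items[i].refname = malloc(len);
		if (!items[i].refname)
			abort();
		memcpy(items[i].refname, names[i], len);
		allocs++;
	}
	/* Nothing is printed until every item has been collected. */
	for (i = 0; i < nr; i++) {
		fprintf(out, "%s\n", items[i].refname);
		free(items[i].refname);
	}
	free(items);
	return allocs;
}

/* Print each name as it is seen. Nothing is stored, so no malloc() calls. */
size_t toy_stream(const char **names, size_t nr, FILE *out)
{
	size_t i;

	assert(out != NULL);
	for (i = 0; i < nr; i++)
		fprintf(out, "%s\n", names[i]);
	return 0;
}
```

Both functions print the same lines in the same order; only the bookkeeping
differs, and the streaming one can emit its first line immediately.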

But there is an unfortunate catch: we never call ref_array_sort() here.
The output is only sorted because for_each_fullref_in() already hands us
the refs in the default order (%(refname)). This may be due to the file
system or to how the ref_iterator is implemented. It also limits the
optimization: it cannot be applied when we sort by any other atom.
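
A small toy example of why that is, assuming an input that arrives ordered
by name. None of this is git's code: struct toy_ref, the two comparison
functions and toy_is_sorted() are hypothetical. The same sequence is in
order under the name key but not under a date key, so sorting by date needs
the whole list collected before anything can be printed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy ref with two possible sort keys; not git's struct ref_array_item. */
struct toy_ref {
	const char *refname;
	long date;
};

/* qsort()-style comparison on the name key. */
int toy_cmp_by_name(const void *a, const void *b)
{
	const struct toy_ref *x = a, *y = b;

	return strcmp(x->refname, y->refname);
}

/* qsort()-style comparison on the date key. */
int toy_cmp_by_date(const void *a, const void *b)
{
	const struct toy_ref *x = a, *y = b;

	return (x->date > y->date) - (x->date < y->date);
}

/* Return 1 if refs[] is already in ascending order under cmp, else 0. */
int toy_is_sorted(const struct toy_ref *refs, size_t nr,
		  int (*cmp)(const void *, const void *))
{
	size_t i;

	assert(cmp != NULL);
	for (i = 1; i < nr; i++)
		if (cmp(&refs[i - 1], &refs[i]) > 0)
			return 0;
	return 1;
}
```

If toy_is_sorted() holds for the requested key, printing in arrival order is
enough; otherwise the caller has to collect everything and run qsort() with
the matching comparison function first.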

> @@ -2563,6 +2567,34 @@ void ref_array_sort(struct ref_sorting *sorting, struct ref_array *array)
>         QSORT_S(array->items, array->nr, compare_refs, sorting);
>  }
>
> +void ref_filter_maybe_stream(struct ref_filter *filter,
> +                            const struct ref_sorting *sort, int icase,
> +                            struct ref_format *format)
> +{
> +       /* streaming only works with default for_each_ref sort order */
> +       if (sort || icase)
> +               return;
> +

Yes, this can really only be optimized for the default sort order.

> +       /* these filters want to see all candidates up front */
> +       if (filter->reachable_from || filter->unreachable_from)
> +               return;
> +

Makes sense.

> +       /*
> +        * the %(symref) placeholder is broken with pretty_print_ref(),
> +        * which our streaming code uses. I suspect this is a sign of breakage
> +        * in other callers like verify_tag(), which should be fixed. But for
> +        * now just disable streaming.
> +        *
> +        * Note that this implies we've parsed the format already with
> +        * verify_ref_format().
> +        */
> +       if (need_symref)
> +               return;
> +

I haven't taken a closer look at why pretty_print_ref() does not support
%(symref); we can skip it for now.

> +       /* OK to stream */
> +       filter->streaming_format = format;
> +}
> +
>  static void append_literal(const char *cp, const char *ep, struct ref_formatting_state *state)
>  {
>         struct strbuf *s = &state->stack->output;
> diff --git a/ref-filter.h b/ref-filter.h
> index c15dee8d6b..ecea1837a2 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -69,6 +69,9 @@ struct ref_filter {
>                 lines;
>         int abbrev,
>                 verbose;
> +
> +       /* if non-NULL, streaming output during filter_refs() is enabled */
> +       struct ref_format *streaming_format;
>  };
>
>  struct ref_format {
> @@ -135,6 +138,11 @@ char *get_head_description(void);
>  /*  Set up translated strings in the output. */
>  void setup_ref_filter_porcelain_msg(void);
>
> +/* enable streaming during filter_refs() if options allow it */
> +void ref_filter_maybe_stream(struct ref_filter *filter,
> +                            const struct ref_sorting *sort, int icase,
> +                            struct ref_format *format);
> +
>  /*
>   * Print a single ref, outside of any ref-filter. Note that the
>   * name must be a fully qualified refname.
> --
> 2.33.0.618.g5b11852304
>

Unfortunately, this optimization may not help git cat-file --batch; see [1].
batch_object_write() constructs a ref_array_item directly and calls
format_ref_array_item() to grab the data. It does not use a ref_array, so it
does not have this malloc()/free() overhead in the first place.

[1]: https://lore.kernel.org/git/9c5fddf6885875ccd3ce3f047bb938c77d9bbca2.1628842990.git.gitgitgadget@xxxxxxxxx/

--
ZheNing Hu