Re: [PATCH 2/2] for-each-ref: add --count-matches option

Jeff King <peff@xxxxxxxx> · Tue, 27 Jun 2023 15:59:00 -0400

On Tue, Jun 27, 2023 at 11:22:02AM -0700, Junio C Hamano wrote:

>    - populate_value() -> get_object() -> oid_object_info_extended() ->
>      grab_common_values() asks for ATOM_OBJECTTYPE that should be a
>      single xstrdup(), but oid_object_info_extended() needs to parse
>      the object, but .info.contentp ought to be NULL and we should
>      not be calling parse_object_buffer().

This is the atom that I expect to give most of the cost. We should
definitely be computing that just with a pack lookup (and chasing deltas
through the pack, though commits usually aren't deltas). But without
that atom, I think we would not even mmap the idx or pack file at all.

The ref-filter code is also pretty keen to malloc() little strings. When
we're measuring 5ms total runtimes and there's just not that much actual
work to do, those little things start to matter more.

> So it might be worth looking into where the time is going (didn't
> Peff or somebody do that a few years ago, though?) when using the
> default format and optimize or special case the codepath,[...]

Running:

 perf record -g git for-each-ref --format='%(refname) %(objecttype)' >/dev/null
 perf script report flamegraph

in linux.git shows that we spend a lot of time faulting in .idx pages.

A lot of the time is attributed to ref_array_sort(), but I think that's
a red herring. It lazy-loads the atom values, so that's where most of
the work is happening (though I would have thought it would just do the
refname atom, not all of them; but I guess it doesn't really matter
since we'll eventually need them to print anyway). Commenting out the
sort call just shows the same time spent in format_ref_array_item().

The patches you're thinking of are probably:

  https://lore.kernel.org/git/YTNpQ7Od1U%2F5i0R7@xxxxxxxxxxxxxxxxxxxxxxx/

which are mostly about micro-optimizing out those mallocs. They do still
help a little in the '%(refname)' case, but not nearly as much as
dropping '%(objecttype)' does.

(Using --format=x in theory drops out some mallocs, but I wasn't able to
measure any actual speedup).

> [...]but the
> responses I have seen from two of you makes me suspect that the new
> option is not the best general approach.

Yeah, my feeling is that with the reduced format, eliminating the
pipe/wc overhead is a pretty small improvement.

-Peff