Re: [PATCH 2/2] for-each-ref: add --count-matches option

Jeff King <peff@xxxxxxxx> · Tue, 27 Jun 2023 03:30:07 -0400

On Mon, Jun 26, 2023 at 03:09:57PM +0000, Derrick Stolee via GitGitGadget wrote:

> +for pattern in "refs/heads/" "refs/tags/" "refs/remotes"
> +do
> +	test_perf "count $pattern: git for-each-ref | wc -l" "
> +		git for-each-ref $pattern | wc -l
> +	"
> +
> +	test_perf "count $pattern: git for-each-ref --count-match" "
> +		git for-each-ref --count-matches $pattern
> +	"
> +done

I don't think this is a very realistic perf test, because for-each-ref
does a bunch of work to generate its default format (object name,
object type, and refname, so it has to look up the type of every
object), only to have "wc" throw most of it away. Doing:

  git for-each-ref --format='%(refname)' | wc -l

is much better (obviously you have to remember to do that if you care
about optimizing your command, but that's true of --count-matches, too).

Running hyperfine with three variants shows that the command above is
competitive with --count-matches, though slightly slower (hyperfine
complains about short commands and outliers because these runtimes are
so tiny in the first place; I omitted those warnings from the output
below for readability):

  Benchmark 1: ./git-for-each-ref refs/remotes/ | wc -l
    Time (mean ± σ):       6.1 ms ±   0.2 ms    [User: 3.0 ms, System: 3.6 ms]
    Range (min … max):     5.6 ms …   7.1 ms    397 runs

  Benchmark 2: ./git-for-each-ref --format="%(refname)" refs/remotes/ | wc -l
    Time (mean ± σ):       3.3 ms ±   0.2 ms    [User: 2.2 ms, System: 1.5 ms]
    Range (min … max):     3.0 ms …   4.0 ms    774 runs

  Benchmark 3: ./git-for-each-ref --count-matches refs/remotes/
    Time (mean ± σ):       2.4 ms ±   0.1 ms    [User: 1.5 ms, System: 0.9 ms]
    Range (min … max):     2.2 ms …   3.4 ms    1018 runs

  Summary
    './git-for-each-ref --count-matches refs/remotes/' ran
      1.33 ± 0.10 times faster than './git-for-each-ref --format="%(refname)" refs/remotes/ | wc -l'
      2.48 ± 0.17 times faster than './git-for-each-ref refs/remotes/ | wc -l'
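
For anyone who wants to repeat the comparison, a sketch of an invocation
that should produce a report like the one above. It assumes hyperfine is
installed and that the current directory is a git build tree with the
--count-matches patch applied; the warmup count is an arbitrary choice
for illustration, not necessarily what was used for the numbers above:

```shell
#!/bin/sh
# Sketch only: needs hyperfine and a ./git-for-each-ref built with the
# --count-matches patch. "--warmup 5" is an assumed value.
pattern=refs/remotes/

if command -v hyperfine >/dev/null 2>&1 && test -x ./git-for-each-ref
then
	# Each quoted string is one benchmark; hyperfine runs it via a
	# shell, so the pipes to "wc -l" are part of what gets timed.
	hyperfine --warmup 5 \
		"./git-for-each-ref $pattern | wc -l" \
		"./git-for-each-ref --format='%(refname)' $pattern | wc -l" \
		"./git-for-each-ref --count-matches $pattern"
else
	echo "need hyperfine and a patched ./git-for-each-ref" >&2
fi
```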

I will note this is an unloaded multi-core system, which gives the piped
version a slight edge, since "wc" can run on another core in parallel
with for-each-ref. Total CPU is probably more interesting than
wall-clock time (adding user and system above gives roughly 3.7 ms for
the --format variant versus 2.4 ms for --count-matches), but all of
these are so short that I think the results should be taken with a
pretty big grain of salt (I had to switch from the "powersave" to
"performance" CPU governor just to get more consistent results).
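
If you want to look at total CPU directly, here is a rough sketch for
bash. It relies on the "time" keyword covering every process in the
pipeline; cpu_total is a made-up helper name, not something in git or
hyperfine, and a single run this short is still going to be noisy:

```shell
#!/usr/bin/env bash
# Sketch: print user + system CPU seconds for a command line, summed
# over all processes in the pipeline, instead of wall-clock time.
# cpu_total is a hypothetical helper for illustration.
cpu_total () {
	local TIMEFORMAT='%3U %3S' out
	# bash's "time" keyword reports on the shell's stderr, so capture
	# that; the timed command's own output and exit status are dropped.
	out=$( { time eval "$1" >/dev/null 2>&1 || true; } 2>&1 )
	# Add the user and system columns.
	echo "$out" | awk '{ printf "%.3f\n", $1 + $2 }'
}

cpu_total "git for-each-ref --format='%(refname)' refs/remotes/ | wc -l"
# --count-matches only exists with the patch under review applied.
cpu_total "git for-each-ref --count-matches refs/remotes/"
```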

-Peff