Re: [PATCH 0/4] some chainlint fixes and performance improvements

Jeff King <peff@xxxxxxxx> · Thu, 30 Mar 2023 18:08:23 -0400

On Tue, Mar 28, 2023 at 05:08:15PM -0400, Jeff King wrote:

> BTW, I noticed something really funky when timing t3070 for this series.
> 
>   $ time ./t3070-wildmatch.sh
>   [a bunch of output]
>   real	0m4.750s
>   user	0m3.665s
>   sys	0m0.955s
> 
>   $ time ./t3070-wildmatch.sh >/dev/null
>   real	0m18.664s
>   user	0m9.185s
>   sys	0m9.495s
> 
> Er, what? It gets way slower when redirected to /dev/null. I can't
> figure out why.

In case anyone is curious (and I know you were all on the edge of your
seats), I figured this out. The issue is that with the "powersave" CPU
governor in place, we never ratchet up the CPU frequency. That is
perhaps because no single process is pegging the CPU; we just have tons
of small processes that quickly exit (which seems like a blind spot in
the governor, but at least makes some sense).

When the output is going to the terminal, then the terminal is consuming
CPU, and the frequency scales up. So it's faster when we show the
output, even though we're doing more work, because the CPU clock is
faster. Switching to the "performance" governor makes the problem go
away.
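
For anyone who wants to check their own machine, here is a rough sketch.
It assumes a Linux system that exposes cpufreq through sysfs. The
show_governors name is made up for illustration, and its optional
argument exists only so it can be pointed at a fake directory tree.

```shell
# Summarize which cpufreq governor each CPU is using (Linux sysfs).
# The optional argument overrides the sysfs root; it is an illustrative
# hook, not something the kernel or any tool requires.
show_governors () {
	root=${1:-/sys/devices/system/cpu}
	cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null |
	sort | uniq -c
}

# To switch every CPU to "performance" for a benchmark run (needs root):
#   echo performance |
#     sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# or, where the cpupower tool is installed:
#   sudo cpupower frequency-set -g performance
```

Running show_governors with no arguments prints one line per distinct
governor, prefixed with the number of CPUs using it.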

This mattered for the series, of course, because I wanted to run t3070
under hyperfine, which behaves like the /dev/null case (unless you pass
--show-output, which mangles the screen, and is why the hyperfine output
I showed earlier was so terse). So with the performance governor in
place, here's the hyperfine output for the whole series (this is on the
5-patch v2):

  $ hyperfine -P parent 0 5 -s 'git checkout jk/chainlint-fixes~{parent}' \
      -n 't3070 on jk/chainlint-fixes~{parent}' ./t3070-wildmatch.sh

  Benchmark 1: t3070 on jk/chainlint-fixes~0
    Time (mean ± σ):      3.677 s ±  0.047 s    [User: 2.893 s, System: 0.677 s]
    Range (min … max):    3.606 s …  3.725 s    10 runs

  Benchmark 2: t3070 on jk/chainlint-fixes~1
    Time (mean ± σ):      3.720 s ±  0.013 s    [User: 2.941 s, System: 0.676 s]
    Range (min … max):    3.698 s …  3.738 s    10 runs

  Benchmark 3: t3070 on jk/chainlint-fixes~2
    Time (mean ± σ):      4.224 s ±  0.019 s    [User: 3.291 s, System: 0.850 s]
    Range (min … max):    4.191 s …  4.254 s    10 runs

  Benchmark 4: t3070 on jk/chainlint-fixes~3
    Time (mean ± σ):      4.227 s ±  0.018 s    [User: 3.293 s, System: 0.856 s]
    Range (min … max):    4.198 s …  4.252 s    10 runs

  Benchmark 5: t3070 on jk/chainlint-fixes~4
    Time (mean ± σ):      4.604 s ±  0.014 s    [User: 3.599 s, System: 0.887 s]
    Range (min … max):    4.583 s …  4.629 s    10 runs

  Benchmark 6: t3070 on jk/chainlint-fixes~5
    Time (mean ± σ):      4.603 s ±  0.010 s    [User: 3.578 s, System: 0.904 s]
    Range (min … max):    4.583 s …  4.617 s    10 runs

  Summary
    't3070 on jk/chainlint-fixes~0' ran
      1.01 ± 0.01 times faster than 't3070 on jk/chainlint-fixes~1'
      1.15 ± 0.02 times faster than 't3070 on jk/chainlint-fixes~2'
      1.15 ± 0.02 times faster than 't3070 on jk/chainlint-fixes~3'
      1.25 ± 0.02 times faster than 't3070 on jk/chainlint-fixes~5'
      1.25 ± 0.02 times faster than 't3070 on jk/chainlint-fixes~4'

Which is what we'd expect. We got about 1.25x faster, in two jumps at ~3
and ~1, which were the patches removing subshells (marking commits by
their parent number is rather confusing; I think it might be worth
making a small hyperfine wrapper that feeds the commit summary to "-n").
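
A minimal sketch of what such a wrapper might look like. The
bench_commits name and its argument order are invented here. It runs one
hyperfine invocation per commit, so unlike the -P form above it does not
produce a cross-commit Summary.

```shell
# bench_commits <tip> <count> <command>
# Run <command> under hyperfine on each of the last <count> commits
# reachable from <tip>, labelling each run with the commit's abbreviated
# hash and subject line instead of a parent number.
bench_commits () {
	tip=$1 count=$2 cmd=$3
	# Remember where we started so we can return there afterwards.
	orig=$(git symbolic-ref -q --short HEAD || git rev-parse HEAD)
	for rev in $(git rev-list -n "$count" "$tip")
	do
		label=$(git log -1 --format='%h %s' "$rev")
		hyperfine -s "git checkout -q $rev" -n "$label" "$cmd"
	done
	git checkout -q "$orig"
}
```

It would be called as something like:

  $ bench_commits jk/chainlint-fixes 6 ./t3070-wildmatch.sh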

So no effect on the series (good), but I didn't want to leave the
mystery unsolved on the list. :)

-Peff