On Tue, Mar 28, 2023 at 05:08:15PM -0400, Jeff King wrote: > BTW, I noticed something really funky when timing t3070 for this series. > > $ time ./t3070-wildmatch.sh > [a bunch of output] > real 0m4.750s > user 0m3.665s > sys 0m0.955s > > $ time ./t3070-wildmatch.sh >/dev/null > real 0m18.664s > user 0m9.185s > sys 0m9.495s > > Er, what? It gets way slower when redirected to /dev/null. I can't > figure out why. In case anyone is curious (and I know you were all on the edge of your seats), I figured this out. The issue is that with the "powersave" CPU governor in place, we never ratchet up the CPU frequency. Perhaps because no process is pegging the CPU, but we just have tons of small processes that quickly exit (which seems like a blind spot in the governor, but at least makes some sense). When the output is going to the terminal, then the terminal is consuming CPU, and the frequency scales up. So it's faster when we show the output, even though we're doing more work, because the CPU clock is faster. Switching to the "performance" governor makes the problem go away. I cared for this series, of course, because I wanted to run t3070 under hyperfine, which behaves like the /dev/null case (unless you pass --show-output, which mangles the screen, and is why the hyperfine output I showed earlier was so terse). So with the performance governor in place, here's the hyperfine output for the whole series (this is on the 5-patch v2): $ hyperfine -P parent 0 5 -s 'git checkout jk/chainlint-fixes~{parent}' \ -n 't3070 on jk/chainlint-fixes~{parent}' ./t3070-wildmatch.sh Benchmark 1: t3070 on jk/chainlint-fixes~0 Time (mean ± σ): 3.677 s ± 0.047 s [User: 2.893 s, System: 0.677 s] Range (min … max): 3.606 s … 3.725 s 10 runs Benchmark 2: t3070 on jk/chainlint-fixes~1 Time (mean ± σ): 3.720 s ± 0.013 s [User: 2.941 s, System: 0.676 s] Range (min … max): 3.698 s … 3.738 s 10 runs Benchmark 3: t3070 on jk/chainlint-fixes~2 Time (mean ± σ): 4.224 s ± 0.019 s [User: 3.291 s, System: 0.850 s] Range (min … max): 4.191 s … 4.254 s 10 runs Benchmark 4: t3070 on jk/chainlint-fixes~3 Time (mean ± σ): 4.227 s ± 0.018 s [User: 3.293 s, System: 0.856 s] Range (min … max): 4.198 s … 4.252 s 10 runs Benchmark 5: t3070 on jk/chainlint-fixes~4 Time (mean ± σ): 4.604 s ± 0.014 s [User: 3.599 s, System: 0.887 s] Range (min … max): 4.583 s … 4.629 s 10 runs Benchmark 6: t3070 on jk/chainlint-fixes~5 Time (mean ± σ): 4.603 s ± 0.010 s [User: 3.578 s, System: 0.904 s] Range (min … max): 4.583 s … 4.617 s 10 runs Summary 't3070 on jk/chainlint-fixes~0' ran 1.01 ± 0.01 times faster than 't3070 on jk/chainlint-fixes~1' 1.15 ± 0.02 times faster than 't3070 on jk/chainlint-fixes~2' 1.15 ± 0.02 times faster than 't3070 on jk/chainlint-fixes~3' 1.25 ± 0.02 times faster than 't3070 on jk/chainlint-fixes~5' 1.25 ± 0.02 times faster than 't3070 on jk/chainlint-fixes~4' Which is what we'd expect. We got about 1.25x faster, in two jumps at ~3 and ~1, which were the patches removing subshells (marking commits by their parent number is rather confusing; I think it might be worth making a small hyperfine wrapper that feeds the commit summary to "-n"). So no effect on the series (good), but I didn't want to leave the mystery unsolved on the list. :) -Peff