On Thu, Oct 07 2021, Jeff King wrote:

> And there I think the whole "take the best run" strategy is hampering
> us. These inaccuracies in our timings go unseen, because we don't do any
> statistical analysis of the results. Whereas a tool like hyperfine (for
> example) will run trials until the mean stabilizes, and then let you
> know when there were trials outside of a standard deviation.
>
> I know we're hesitant to introduce dependencies, but I do wonder if we
> could have much higher quality perf results if we accepted a dependency
> on a tool like that. I'd never want that for the regular test suite, but
> I'd my feelings for the perf suite are much looser. I suspect not many
> people run it at all, and its main utility is showing off improvements
> and looking for broad regressions. It's possible somebody would want to
> track down a performance change on a specific obscure platform, but in
> general I'd suspect they'd be much better off timing things manually in
> such a case.
>
> So there. That was probably more than you wanted to hear, and further
> than you want to go right now. In the near-term for the tests you're
> interested in, something like the "prepare" feature I outlined above
> would probably not be too hard to add, and would address your immediate
> problem.

I'd really like that; as you point out, the statistics in t/perf are
quite bad right now.

A tool like hyperfine is, for the purposes of the test suite, ultimately
just a generalized way to run templated code with labels. If anyone
cared, I don't see why we couldn't ship a hyperfine-fallback.pl or
whatever that accepted the same parameters and ran our current (and
worse) end-to-end statistics.

If that's something you're encouraged to work on and you're taking
requests :): It would be really nice if t/perf could, say, emit a
one-off Makefile and run the tests via that, rather than the one-off
nproc=1 ./run script we've got now.
With the same sort of templating a "hyperfine" invocation would need
(plus prepare/teardown phases), that would make it easy to run perf
tests in parallel across N cores, or even across N machines.
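To make the idea concrete, here is a minimal sketch of what such an
emitted one-off Makefile could look like. The test names and the
perf-run.mak filename are invented for illustration, and the recipes
just echo rather than invoking real perf scripts:

```shell
# Generate a throwaway Makefile with one target per perf script.
{
	printf 'TESTS = p0001-rev-list p0002-read-cache\n\n'
	printf 'all: $(TESTS)\n\n'
	printf '$(TESTS):\n'
	# A real version would run the perf script and save its
	# results; here each target just echoes its own name.
	printf '\t@echo running $@\n\n'
	printf '.PHONY: all $(TESTS)\n'
} >perf-run.mak

# Each target is an independent job, so -j gives parallelism for free:
make -f perf-run.mak -j2
```

With "make -j$(nproc)" (or even a distributed make) driving the
targets, the scheduling across cores or machines comes from make
itself rather than anything t/perf would have to implement.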