On Thu, Oct 07 2021, Jeff King wrote:

> And there I think the whole "take the best run" strategy is hampering
> us. These inaccuracies in our timings go unseen, because we don't do any
> statistical analysis of the results. Whereas a tool like hyperfine (for
> example) will run trials until the mean stabilizes, and then let you
> know when there were trials outside of a standard deviation.
>
> I know we're hesitant to introduce dependencies, but I do wonder if we
> could have much higher quality perf results if we accepted a dependency
> on a tool like that. I'd never want that for the regular test suite, but
> I'd my feelings for the perf suite are much looser. I suspect not many
> people run it at all, and its main utility is showing off improvements
> and looking for broad regressions. It's possible somebody would want to
> track down a performance change on a specific obscure platform, but in
> general I'd suspect they'd be much better off timing things manually in
> such a case.
>
> So there. That was probably more than you wanted to hear, and further
> than you want to go right now. In the near-term for the tests you're
> interested in, something like the "prepare" feature I outlined above
> would probably not be too hard to add, and would address your immediate
> problem.

I'd really like that; as you point out, the statistics in t/perf are
quite bad right now.

A tool like hyperfine is, for the purposes of the test suite, ultimately
just a generalized way to run templated code with labels. If anyone
cared, I don't see why we couldn't ship a hyperfine-fallback.pl or
whatever that accepted the same parameters and ran our current (and
worse) end-to-end statistics.

If that's something you're encouraged to work on and you're taking
requests :): It would be really nice if t/perf could, say, emit a
one-off Makefile and run the tests via that, rather than the one-off
nproc=1 ./run script we've got now.
With the same sort of templating a "hyperfine" invocation would need
(plus prepare/teardown phases), that would make it easy to run perf
tests in parallel across N cores, or even across N machines.
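To make the idea concrete, here is a minimal sketch of what such an
emitted one-off Makefile could look like. The test names and the
perf-run.mak filename are invented for illustration, and the recipes
just echo rather than invoking real perf scripts:

```shell
# Generate a throwaway Makefile with one target per perf script.
{
	printf 'TESTS = p0001-rev-list p0002-read-cache\n\n'
	printf 'all: $(TESTS)\n\n'
	printf '$(TESTS):\n'
	# A real version would run the perf script and save its
	# results; here each target just echoes its own name.
	printf '\t@echo running $@\n\n'
	printf '.PHONY: all $(TESTS)\n'
} >perf-run.mak

# Each target is an independent job, so -j gives parallelism for free:
make -f perf-run.mak -j2
```

With "make -j$(nproc)" (or even a distributed make) driving the
targets, the scheduling across cores or machines comes from make
itself rather than anything t/perf would have to implement.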