Re: [PATCH] t/perf/perf-lib.sh: remove test_times.* at the end test_perf_()

Jeff King <peff@xxxxxxxx> · Thu, 7 Oct 2021 22:55:18 -0400

On Thu, Oct 07, 2021 at 01:49:15PM -0400, Jeff Hostetler wrote:

> Yeah, I don't think I want to keep switching the value of _REPEAT_COUNT
> in the body of the test.  (It did feel a little "against the spirit" of
> the framework.)  I'm in the process of redoing the test to not need
> that.

Sounds good to me. :)

> There's a problem with the perf test assumptions here and I'm curious
> if there's a better way to use the perf-lib that I'm not thinking of.
> 
> When working with big repos (in this case 100K files), the actual
> checkout takes 33 seconds, but the repetitions are fast -- since they
> just print a warning and stop.  In the 1M file case that number is ~7
> minutes for the first instance.)  With the code in min_time.perl
> silently taking the min() of the runs, it looks like the checkout was
> really fast when it wasn't.  That fact gets hidden in the summary report
> printed at the end.

Right. So your option now is basically something like:

  test_perf 'checkout' '
	git reset --hard the_original_state &&
	git checkout
  '

I.e., reset the state _and_ perform the operation you want to time,
within a single test_perf(). (Here it's a literal "git reset", but
obviously it could be anything to adjust the state back to the
baseline). But that reset operation will get lumped into your timing.
That might not matter if it's comparatively cheap, but it might throw
off all your results.

What I'd propose instead is that we ought to have:

  test_perf 'checkout'
            --prepare '
	        git reset --hard the_original_state
	    ' '
	        git checkout
	    '

Having two multi-line snippets is a bit ugly (check out that awful
middle line), but I think this could be added without breaking existing
tests (they just wouldn't have a --prepare option).

If that syntax is too horrendous, we could have:

  # this saves the snippet in a variable internally, and runs
  # it before each trial of the next test_perf(), after which
  # it is discarded
  test_perf_prepare '
          git reset --hard the_original_state
  '

  test_perf 'checkout' '
          git checkout
  '

I think that would be pretty easy to implement, and would solve the most
common form of this problem. And there's plenty of prior art; just about
every decent benchmarking system has a "do this before each trial"
mechanism. Our t/perf suite (as you probably noticed) is rather more
ad-hoc and less mature.

There are cases it doesn't help, though. For instance, in one of the
scripts we measure the time to run "git repack -adb" to generate
bitmaps. But the first run has to do more work, because we can reuse
results for subsequent ones! It would help to "rm -f
objects/pack/*.bitmap", but even that's not entirely fair, as it will be
repacking from a single pack, versus whatever state we started with.

And there I think the whole "take the best run" strategy is hampering
us. These inaccuracies in our timings go unseen, because we don't do any
statistical analysis of the results. Whereas a tool like hyperfine (for
example) will run trials until the mean stabilizes, and then let you
know when there were trials outside of a standard deviation.

I know we're hesitant to introduce dependencies, but I do wonder if we
could have much higher quality perf results if we accepted a dependency
on a tool like that. I'd never want that for the regular test suite, but
I'd my feelings for the perf suite are much looser. I suspect not many
people run it at all, and its main utility is showing off improvements
and looking for broad regressions. It's possible somebody would want to
track down a performance change on a specific obscure platform, but in
general I'd suspect they'd be much better off timing things manually in
such a case.

So there. That was probably more than you wanted to hear, and further
than you want to go right now. In the near-term for the tests you're
interested in, something like the "prepare" feature I outlined above
would probably not be too hard to add, and would address your immediate
problem.

-Peff