Re: [PATCH 0/4] test-tool: split up "read-cache" tool

Emily Shaffer <emilyshaffer@xxxxxxxxxx> · Wed, 30 Jun 2021 15:02:41 -0700

On Mon, Jun 07, 2021 at 01:58:23PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> When the sparse index topic was being discussed I suggested that the
> t/helper/read-cache.c tool was getting to the point of doing too many
> things and should be split up.
> 
> Since that series has landed on master here's that suggestion again in
> the form of patches on top of master. The 4/4 patch is a "while I was
> at it" addition of an extra perf test for index refreshing.
> 
> 1. https://lore.kernel.org/git/20210317132814.30175-6-avarab@xxxxxxxxx/
> 

Since "What's cooking" mentioned this series was starved for review, we
took a look at it today. Most of my comments are pretty general, so I'll
reply primarily to the cover letter instead.

The result after this series is that we have three forms of 'test-tool
read-cache':
 - one for unit testing, with no iteration (test-tool read-cache)
 - one for not-so-perfy iteration (test-tool read-cache-again)
 - one for perfy iteration and refresh_index benching (test-tool
   read-cache-perf)

Before I read patch 4, I said, "'again' and 'perf' have a lot of code in
common, but I guess we are trying to reduce the amount of overhead for
tight-loop performance testing, so OK." But patch 4 adds an alternative
body for the inside of the loop, which *looks* not very performant
(although is probably optimized well) by checking an unchanging bool on
every iteration of the loop.

I tend to think it would be easier to read
and understand to have two forms of 'test-tool read-cache' - one which
iterates and one which does not. Maybe the one that iterates should be
called -perf, maybe it should be called something else, whatever. And
perhaps it makes sense for the iterating one to look like so (heavily
pseudocoded, hope you can follow along with the rough sketch):

  enum iteration_mode;

  parse_options();
  if (should_do_again) {
    iteration_mode = AGAIN;
  else if (should_do_perf) {
    iteration_mode = PERF;
  else if (should_do_refresh) {
    iteration_mode = REFRESH;
  }

  while (passes--)
    switch (iteration_mode) {
    case AGAIN:
      read index;
      refresh index;
      index_name_pos;
      error reporting;
      write_file;
      discard_index;
      break;
    case PERF:
      read index;
      discard index;
      break;
    case REFRESH:
      refresh_index;
      break;
    }

This would put all our "loop lots of times for performance benchmarking"
into one place. We know that the switch statement is very performant,
especially if we manage to const-ify iteration_mode.  The cases make it
very clear that the body of the loop is being swapped out depending on
the arguments, and that entirely different behavior is happening in each
scenario.

There's also an orthogonal bit of cleanup here by moving to
parse_options(), which I am excited about in general :) but which I
think wasn't done very cleanly in this series. In patch 1, the commit
message makes no mention of the fairly significant refactor happening to
move 'test-tool read-cache' to parse_options(), and I think the mix of
cut&paste with refactor makes the patch a little muddy. What about a
series like so:

 1/3: teach test-tool read-cache to use parse_options()
 2/3: add test-tool read-cache-(perf|iterating|whatever)
 3/3: teach test-tool read-cache-perf "--refresh"

Thanks, and hopefully this is a welcome necromancy and not an annoying
one ;)

 - Emily