Re: [PATCH v5 1/3] KVM: selftests: implement random number generation for guest code

David Matlack <dmatlack@xxxxxxxxxx> · Fri, 9 Sep 2022 10:08:32 -0700

On Fri, Sep 09, 2022 at 12:42:58PM +0000, Colton Lewis wrote:
> Implement random number generation for guest code to randomize parts
> of the test, making it less predictable and a more accurate reflection
> of reality.
> 
> Create a -r argument to specify a random seed. If no argument is
> provided, the seed defaults to the current Unix timestamp. The random
> seed is set with perf_test_set_random_seed() and must be set before
> guest_code runs to apply.
> 
> The random number generator chosen is the Park-Miller Linear
> Congruential Generator, a fancy name for a basic and well-understood
> random number generator entirely sufficient for this purpose. Each
> vCPU calculates its own seed by adding its index to the seed provided.

Great commit message!
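For reference, here is a minimal sketch of a Park-Miller step and the per-vCPU seed derivation described above. It assumes the original multiplier 16807 and modulus 2^31 - 1. The helper names pm_next() and pm_vcpu_seed() are hypothetical and are not taken from the patch, whose test_util.c hunk is not quoted here.

```c
#include <stdint.h>

/*
 * Park-Miller "minimal standard" LCG: x' = (a * x) mod (2^31 - 1).
 * Useful states are 1 .. 2^31 - 2. A state of 0 is a fixed point:
 * the generator would return 0 forever.
 */
#define PM_MODULUS	2147483647u	/* 2^31 - 1, a Mersenne prime */
#define PM_MULTIPLIER	16807u		/* original multiplier; 48271 is the later recommendation */

/* Hypothetical helper: advance *state and return the new value. */
static inline uint32_t pm_next(uint32_t *state)
{
	/* A 32-bit state times a 15-bit multiplier always fits in 64 bits. */
	*state = (uint32_t)(((uint64_t)*state * PM_MULTIPLIER) % PM_MODULUS);
	return *state;
}

/* Hypothetical helper: each vCPU adds its index to the base seed. */
static inline uint32_t pm_vcpu_seed(uint32_t base_seed, uint32_t vcpu_idx)
{
	return base_seed + vcpu_idx;
}
```

Because the generator is deterministic, the same base seed always reproduces the same per-vCPU sequence of values.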

> 
> Signed-off-by: Colton Lewis <coltonlewis@xxxxxxxxxx>
> ---
>  tools/testing/selftests/kvm/dirty_log_perf_test.c    | 12 ++++++++++--
>  tools/testing/selftests/kvm/include/perf_test_util.h |  2 ++
>  tools/testing/selftests/kvm/include/test_util.h      |  2 ++
>  tools/testing/selftests/kvm/lib/perf_test_util.c     | 11 ++++++++++-
>  tools/testing/selftests/kvm/lib/test_util.c          |  9 +++++++++
>  5 files changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> index d60a34cdfaee..2f91acd94130 100644
> --- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> @@ -126,6 +126,7 @@ struct test_params {
>  	bool partition_vcpu_memory_access;
>  	enum vm_mem_backing_src_type backing_src;
>  	int slots;
> +	uint32_t random_seed;
>  };
>  
>  static void toggle_dirty_logging(struct kvm_vm *vm, int slots, bool enable)
> @@ -220,6 +221,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>  				 p->slots, p->backing_src,
>  				 p->partition_vcpu_memory_access);
>  
> +	pr_info("Random seed: %u\n", p->random_seed);
> +	perf_test_set_random_seed(vm, p->random_seed);
>  	perf_test_set_wr_fract(vm, p->wr_fract);
>  
>  	guest_num_pages = (nr_vcpus * guest_percpu_mem_size) >> vm_get_page_shift(vm);
> @@ -337,7 +340,7 @@ static void help(char *name)
>  {
>  	puts("");
>  	printf("usage: %s [-h] [-i iterations] [-p offset] [-g] "
> -	       "[-m mode] [-n] [-b vcpu bytes] [-v vcpus] [-o] [-s mem type]"
> +	       "[-m mode] [-n] [-b vcpu bytes] [-v vcpus] [-o] [-r random seed ] [-s mem type]"
>  	       "[-x memslots]\n", name);
>  	puts("");
>  	printf(" -i: specify iteration counts (default: %"PRIu64")\n",
> @@ -362,6 +365,7 @@ static void help(char *name)
>  	printf(" -v: specify the number of vCPUs to run.\n");
>  	printf(" -o: Overlap guest memory accesses instead of partitioning\n"
>  	       "     them into a separate region of memory for each vCPU.\n");
> +	printf(" -r: specify the starting random seed.\n");
>  	backing_src_help("-s");
>  	printf(" -x: Split the memory region into this number of memslots.\n"
>  	       "     (default: 1)\n");
> @@ -378,6 +382,7 @@ int main(int argc, char *argv[])
>  		.partition_vcpu_memory_access = true,
>  		.backing_src = DEFAULT_VM_MEM_SRC,
>  		.slots = 1,
> +		.random_seed = time(NULL),

It's a bad code smell that the random seed gets default-initialized to
time(NULL) twice (here and in perf_test_create_vm()).

I also still think it would be better if the default random seed were
consistent across runs. Most use-cases of dirty_log_perf_test are A/B
testing, so consistency is key: for example, running dirty_log_perf_test
at every commit to find regressions, or running dirty_log_perf_test to
study the performance effects of some change. In other words, I think
most use-cases will want a consistent seed across runs, so the default
behavior should match that. Otherwise I foresee myself (and automated
tools) having to pass -r to every test run to get consistent,
comparable behavior.

What do you think about killing two birds with one stone here by making
the default random_seed 0? That requires no initialization and ensures
consistent random behavior across runs.
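Here is a minimal sketch of the suggestion. It uses a trimmed-down, hypothetical struct test_params_sketch in place of the real struct test_params, and a hypothetical default_params() helper. C zero-initializes any member that a designated initializer leaves out, so a default seed of 0 needs no code at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct test_params. */
struct test_params_sketch {
	bool partition_vcpu_memory_access;
	int slots;
	uint32_t random_seed;
};

/* Hypothetical helper mirroring the defaults set in main(). */
static struct test_params_sketch default_params(void)
{
	struct test_params_sketch p = {
		.partition_vcpu_memory_access = true,
		.slots = 1,
		/*
		 * .random_seed is not named, so it is zero-initialized:
		 * every run starts from the same seed with no extra code.
		 */
	};

	return p;
}
```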

Optionally, I would even recommend dropping the -r parameter until
someone wants to run dirty_log_perf_test with different seeds. That
would simplify the code even more. I have a feeling there won't be much
interest in different seeds since, at the end of the day, the result
will always be the same rough distribution of accesses. Adding support
for different types of access patterns will be more interesting than
supporting different seeds.