On Wed, Aug 10, 2022 at 05:58:28PM +0000, Colton Lewis wrote:
> Linear access through all pages does not seem to replicate performance

State what the patch does first, then the background/motivation.

> problems with realistic dirty logging workloads. Make the test more
> sophisticated through random access. Each vcpu has its own sequence of
> random numbers that are refilled after every iteration. Having the
> main thread fill the table for every vcpu is less efficient than
> having each vcpu generate its own numbers, but this ensures threading
> nondeterminism won't destroy reproducibility with a given random seed.

Make it clear what this patch does specifically. e.g. "Make the test
more sophisticated through random access" is a bit misleading since all
this patch does is create a table of random numbers.

>
> Signed-off-by: Colton Lewis <coltonlewis@xxxxxxxxxx>
> ---
>  .../selftests/kvm/dirty_log_perf_test.c       | 13 ++++-
>  .../selftests/kvm/include/perf_test_util.h    |  4 ++
>  .../selftests/kvm/lib/perf_test_util.c        | 47 +++++++++++++++++++
>  3 files changed, 63 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> index f99e39a672d3..80a1cbe7fbb0 100644
> --- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
> @@ -132,6 +132,7 @@ struct test_params {
>  	bool partition_vcpu_memory_access;
>  	enum vm_mem_backing_src_type backing_src;
>  	int slots;
> +	uint32_t random_seed;
>  };
>
>  static void toggle_dirty_logging(struct kvm_vm *vm, int slots, bool enable)
> @@ -243,6 +244,10 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>  	/* Start the iterations */
>  	iteration = 0;
>  	host_quit = false;
> +	srandom(p->random_seed);
> +	pr_info("Random seed: %d\n", p->random_seed);
> +	alloc_random_table(nr_vcpus, guest_percpu_mem_size >> vm->page_shift);
> +	fill_random_table(nr_vcpus, guest_percpu_mem_size >>
> +			  vm->page_shift);

Drive the allocation and filling of the random table in
perf_test_util.c as part of VM setup, and also move random_seed to
perf_test_args. This will reduce the amount of code needed in the test
to use perf_test_util with random accesses. dirty_log_perf_test is the
only test using random accesses right now, but I could see us wanting
to use it in demand_paging_test and access_tracking_perf_test in the
near future.

You can still have the test refresh the random table every iteration by
exporting e.g. perf_test_refresh_random_table() for use by tests.

>
>  	clock_gettime(CLOCK_MONOTONIC, &start);
>  	for (i = 0; i < nr_vcpus; i++)
> @@ -270,6 +275,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>  			ts_diff.tv_sec, ts_diff.tv_nsec);
>
>  	while (iteration < p->iterations) {
> +		fill_random_table(nr_vcpus, guest_percpu_mem_size >> vm->page_shift);

I wonder if it would be better to use the same random access pattern
across iterations. One of the reasons to have multiple iterations is to
see how the guest performance changes as the memory moves through
different phases of dirty tracking. e.g. KVM might be splitting huge
pages during the first iteration but not the second. If the access
pattern is also changing across iterations, that could make it harder
to identify performance changes due to KVM.

>  		/*
>  		 * Incrementing the iteration number will start the vCPUs
>  		 * dirtying memory again.
> @@ -380,6 +386,7 @@ static void help(char *name)
>  	printf(" -v: specify the number of vCPUs to run.\n");
>  	printf(" -o: Overlap guest memory accesses instead of partitioning\n"
>  	       "     them into a separate region of memory for each vCPU.\n");
> +	printf(" -r: specify the starting random seed.\n");
>  	backing_src_help("-s");
>  	printf(" -x: Split the memory region into this number of memslots.\n"
>  	       "     (default: 1)\n");
> @@ -396,6 +403,7 @@ int main(int argc, char *argv[])
>  		.partition_vcpu_memory_access = true,
>  		.backing_src = DEFAULT_VM_MEM_SRC,
>  		.slots = 1,
> +		.random_seed = time(NULL),

Perhaps the default seed should be a hard-coded value so that users
running the test with default arguments get deterministic results
across runs.

>  	};
>  	int opt;
>
> @@ -406,7 +414,7 @@ int main(int argc, char *argv[])
>
>  	guest_modes_append_default();
>
> -	while ((opt = getopt(argc, argv, "eghi:p:m:nb:f:v:os:x:")) != -1) {
> +	while ((opt = getopt(argc, argv, "eghi:p:m:nb:f:v:or:s:x:")) != -1) {
>  		switch (opt) {
>  		case 'e':
>  			/* 'e' is for evil. */
> @@ -442,6 +450,9 @@ int main(int argc, char *argv[])
>  		case 'o':
>  			p.partition_vcpu_memory_access = false;
>  			break;
> +		case 'r':
> +			p.random_seed = atoi(optarg);
> +			break;
>  		case 's':
>  			p.backing_src = parse_backing_src_type(optarg);
>  			break;
> diff --git a/tools/testing/selftests/kvm/include/perf_test_util.h b/tools/testing/selftests/kvm/include/perf_test_util.h
> index eaa88df0555a..597875d0c3db 100644
> --- a/tools/testing/selftests/kvm/include/perf_test_util.h
> +++ b/tools/testing/selftests/kvm/include/perf_test_util.h
> @@ -44,6 +44,10 @@ struct perf_test_args {
>  };
>
>  extern struct perf_test_args perf_test_args;
> +extern uint32_t **random_table;

Adding random_table to perf_test_util.h is unnecessary in this commit
(it's only used in perf_test_util.c).
> +
> +void alloc_random_table(uint32_t nr_vcpus, uint32_t nr_randoms);
> +void fill_random_table(uint32_t nr_vcpus, uint32_t nr_randoms);

Use perf_test_ prefixes for symbols visible outside of
perf_test_util.c, e.g.

  perf_test_random_table
  perf_test_alloc_random_table()
  perf_test_fill_random_table()

>
>  struct kvm_vm *perf_test_create_vm(enum vm_guest_mode mode, int nr_vcpus,
>  				   uint64_t vcpu_memory_bytes, int slots,
> diff --git a/tools/testing/selftests/kvm/lib/perf_test_util.c b/tools/testing/selftests/kvm/lib/perf_test_util.c
> index 9618b37c66f7..b04e8d2c0f37 100644
> --- a/tools/testing/selftests/kvm/lib/perf_test_util.c
> +++ b/tools/testing/selftests/kvm/lib/perf_test_util.c
> @@ -9,6 +9,10 @@
>  #include "processor.h"
>
>  struct perf_test_args perf_test_args;
> +/* This pointer points to guest memory and must be converted with
> + * addr_gva2hva to be accessed from the host.
> + */
> +uint32_t **random_table;

Use vm_vaddr_t for variables that contain guest virtual addresses
(except within guest_code(), of course).

>
>  /*
>   * Guest virtual memory offset of the testing memory slot.
> @@ -70,6 +74,49 @@ void perf_test_guest_code(uint32_t vcpu_idx)
>  	}
>  }
>
> +void alloc_random_table(uint32_t nr_vcpus, uint32_t nr_randoms)
> +{
> +	struct perf_test_args *pta = &perf_test_args;
> +	uint32_t **host_random_table;
> +
> +	random_table = (uint32_t **)vm_vaddr_alloc(
> +			pta->vm,
> +			nr_vcpus * sizeof(uint32_t *),
> +			(vm_vaddr_t)0);

I notice vm_vaddr_alloc_pages() and vcpu_alloc_cpuid() use
KVM_UTIL_MIN_VADDR for the min. Should we use that here too?
If so, this is a good opportunity to rename vm_vaddr_alloc() to
__vm_vaddr_alloc() and introduce:

  vm_vaddr_t vm_vaddr_alloc(struct kvm_vm *vm, size_t sz)
  {
          return __vm_vaddr_alloc(vm, sz, KVM_UTIL_MIN_VADDR);
  }

> +	host_random_table = addr_gva2hva(pta->vm, (vm_vaddr_t)random_table);
> +	pr_debug("Random start addr: %p %p.\n", random_table, host_random_table);
> +
> +	for (uint32_t i = 0; i < nr_vcpus; i++) {
> +		host_random_table[i] = (uint32_t *)vm_vaddr_alloc(

The per-vCPU random table should go in perf_test_vcpu_args along with
all the other per-vCPU information that is set up by the test and
consumed by the guest code. This will reduce some of the complexity
here because you won't need to allocate the top-level array of
pointers.

> +			pta->vm,
> +			nr_randoms * sizeof(uint32_t),
> +			(vm_vaddr_t)0);
> +		pr_debug("Random row addr: %p %p.\n",
> +			 host_random_table[i],
> +			 addr_gva2hva(pta->vm, (vm_vaddr_t)host_random_table[i]));

Logging the host virtual addresses of the random table would probably
not be valuable, but logging the guest virtual addresses would be more
useful, since guest virtual address space management is pretty ad-hoc.
> +	}
> +}
> +
> +void fill_random_table(uint32_t nr_vcpus, uint32_t nr_randoms)
> +{
> +	struct perf_test_args *pta = &perf_test_args;
> +	uint32_t **host_random_table = addr_gva2hva(pta->vm, (vm_vaddr_t)random_table);
> +	uint32_t *host_row;
> +
> +	pr_debug("Random start addr: %p %p.\n", random_table, host_random_table);
> +
> +	for (uint32_t i = 0; i < nr_vcpus; i++) {
> +		host_row = addr_gva2hva(pta->vm, (vm_vaddr_t)host_random_table[i]);
> +		pr_debug("Random row addr: %p %p.\n", host_random_table[i], host_row);
> +
> +		for (uint32_t j = 0; j < nr_randoms; j++)
> +			host_row[j] = random();
> +
> +		pr_debug("New randoms row %d: %d, %d, %d...\n",
> +			 i, host_row[0], host_row[1], host_row[2]);
> +	}
> +}
> +
>  void perf_test_setup_vcpus(struct kvm_vm *vm, int nr_vcpus,
>  			   struct kvm_vcpu *vcpus[],
>  			   uint64_t vcpu_memory_bytes,
> --
> 2.37.1.559.g78731f0fdb-goog
>