On Thu, Mar 16, 2023 at 3:29 PM Colton Lewis <coltonlewis@xxxxxxxxxx> wrote:
>
> Print summary stats of the memory latency distribution in nanoseconds
> for dirty_log_perf_test. For every iteration, this prints the minimum,
> the maximum, and the 50th, 90th, and 99th percentiles.
>

Can you also write how this is useful and why these 5 specific
percentiles are valuable for testers? Generally, a five-number summary
is the 0th, 25th, 50th, 75th, and 100th percentiles. It might also be
too much data to display with each iteration.

Nit: "minimum, 50th, 90th, 99th and maximum", since this is the order
in which you are printing them.

> @@ -428,6 +432,7 @@ int main(int argc, char *argv[])
> 		.slots = 1,
> 		.random_seed = 1,
> 		.write_percent = 100,
> +		.samples_per_vcpu = 1000,

Why is 1000 the right default choice? Maybe the default should be 0,
and anyone who wants the feature can enable it via the command line
option.

> @@ -438,7 +443,7 @@ int main(int argc, char *argv[])
>
> 	guest_modes_append_default();
>
> -	while ((opt = getopt(argc, argv, "ab:c:eghi:m:nop:r:s:v:x:w:")) != -1) {
> +	while ((opt = getopt(argc, argv, "ab:c:eghi:m:nop:r:s:S:v:x:w:")) != -1) {

1. Please add a help section entry describing the new command line
   option.
2. Instead of having both 's' and 'S', it may be better to use a
   different lowercase letter, like 'l' for latency: passing this
   option prints memory latency, and its argument is the number of
   samples the user prefers per vCPU. (See the sketch at the end of
   this mail.)

> @@ -480,6 +485,9 @@ int main(int argc, char *argv[])
> 		case 's':
> 			p.backing_src = parse_backing_src_type(optarg);
> 			break;
> +		case 'S':
> +			p.samples_per_vcpu = atoi_positive("Number of samples/vcpu", optarg);
> +			break;

As written, this forces users to always see the latency stats even
when they do not want them. I think the patch should be modified so
that the feature is easy to disable (see the sketch at the end of
this mail). I might be wrong here, and it may actually be a useful
metric to see with each run; if that is the case, the commit message
should explain why it is good for each run.

> +void memstress_print_percentiles(struct kvm_vm *vm, int nr_vcpus)
> +{
> +	uint64_t n_samples = nr_vcpus * memstress_args.samples_per_vcpu;
> +	uint64_t *host_latency_samples = addr_gva2hva(vm, memstress_args.latency_samples);
> +
> +	qsort(host_latency_samples, n_samples, sizeof(uint64_t), &memstress_qcmp);
> +
> +	pr_info("Latency distribution (ns) = min:%6.0lf, 50th:%6.0lf, 90th:%6.0lf, 99th:%6.0lf, max:%6.0lf\n",
> +		cycles_to_ns(vcpus[0], (double)host_latency_samples[0]),

I am not very familiar with how the TSC is set up and used here. Will
all vCPUs have the same TSC value? Can this change if a vCPU gets
scheduled to a different pCPU on the host?

> +		cycles_to_ns(vcpus[0], (double)host_latency_samples[n_samples / 2]),
> +		cycles_to_ns(vcpus[0], (double)host_latency_samples[n_samples * 9 / 10]),
> +		cycles_to_ns(vcpus[0], (double)host_latency_samples[n_samples * 99 / 100]),
> +		cycles_to_ns(vcpus[0], (double)host_latency_samples[n_samples - 1]));
> +}

The rest of dirty_log_perf_test prints times as seconds followed by
nanoseconds. I recommend keeping this output consistent with that,
i.e. 9 digits for the nanoseconds field instead of 6 as in the other
formats.
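Something like this helper could do the conversion (untested sketch;
the helper name is just a suggestion):

	/*
	 * Split a nanosecond value into the "%ld.%.9lds" format used
	 * by the other dirty_log_perf_test messages.
	 */
	static void print_latency_ns(const char *label, double ns)
	{
		long sec = (long)(ns / 1000000000.0);
		long nsec = (long)ns % 1000000000L;

		pr_info("%s:%ld.%.9lds", label, sec, nsec);
	}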
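And to tie together the earlier points about the default value, the
option letter, and making the feature optional, here is a rough,
untested sketch (the option letter, help text wording, and gating
location are all just suggestions):

	.samples_per_vcpu = 0,	/* latency sampling disabled by default */

	while ((opt = getopt(argc, argv, "ab:c:eghi:l:m:nop:r:s:v:x:w:")) != -1) {
	...
	case 'l':
		p.samples_per_vcpu = atoi_positive("Number of samples/vcpu", optarg);
		break;

with a matching entry in help():

	printf(" -l: Print the memory access latency distribution, sampling\n"
	       "     the given number of accesses per vCPU (default: disabled).\n");

and an early return so the stats only show up when requested:

	void memstress_print_percentiles(struct kvm_vm *vm, int nr_vcpus)
	{
		if (!memstress_args.samples_per_vcpu)
			return;
	...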