On Thu, Mar 16, 2023 at 3:29 PM Colton Lewis <coltonlewis@xxxxxxxxxx> wrote:
>
> Print summary stats of the memory latency distribution in nanoseconds
> for dirty_log_perf_test. For every iteration, this prints the minimum,
> the maximum, and the 50th, 90th, and 99th percentiles.
>

Can you also write how this is useful and why these 5 specific
percentiles are valuable for testers? Generally, a five-number summary
is the 0th, 25th, 50th, 75th, and 100th percentiles. It might also be
too much data to display with each iteration.

Nit: "minimum, 50th, 90th, 99th and maximum", since this is the order
in which you are printing them.

> @@ -428,6 +432,7 @@ int main(int argc, char *argv[])
> 		.slots = 1,
> 		.random_seed = 1,
> 		.write_percent = 100,
> +		.samples_per_vcpu = 1000,

Why is 1000 the right default choice? Maybe the default should be 0,
and anyone who wants the feature can enable it via the command line
option.

> @@ -438,7 +443,7 @@ int main(int argc, char *argv[])
>
> 	guest_modes_append_default();
>
> -	while ((opt = getopt(argc, argv, "ab:c:eghi:m:nop:r:s:v:x:w:")) != -1) {
> +	while ((opt = getopt(argc, argv, "ab:c:eghi:m:nop:r:s:S:v:x:w:")) != -1) {

1. Please add a help section entry describing the new command line
   option.
2. Instead of having both 's' and 'S', it may be better to use a
   different lowercase letter, like 'l' for latency: passing this
   option prints memory latency, and its argument is the number of
   samples the user prefers per vCPU. (See the sketch at the end of
   this mail.)

> @@ -480,6 +485,9 @@ int main(int argc, char *argv[])
> 		case 's':
> 			p.backing_src = parse_backing_src_type(optarg);
> 			break;
> +		case 'S':
> +			p.samples_per_vcpu = atoi_positive("Number of samples/vcpu", optarg);
> +			break;

As written, this forces users to always see the latency stats even
when they do not want them. I think the patch should be modified so
that the feature is easy to disable (see the sketch at the end of
this mail). I might be wrong here, and it may actually be a useful
metric to see with each run; if that is the case, the commit message
should explain why it is good for each run.

> +void memstress_print_percentiles(struct kvm_vm *vm, int nr_vcpus)
> +{
> +	uint64_t n_samples = nr_vcpus * memstress_args.samples_per_vcpu;
> +	uint64_t *host_latency_samples = addr_gva2hva(vm, memstress_args.latency_samples);
> +
> +	qsort(host_latency_samples, n_samples, sizeof(uint64_t), &memstress_qcmp);
> +
> +	pr_info("Latency distribution (ns) = min:%6.0lf, 50th:%6.0lf, 90th:%6.0lf, 99th:%6.0lf, max:%6.0lf\n",
> +		cycles_to_ns(vcpus[0], (double)host_latency_samples[0]),

I am not very familiar with how the TSC is set up and used here. Will
all vCPUs have the same TSC value? Can this change if a vCPU gets
scheduled to a different pCPU on the host?

> +		cycles_to_ns(vcpus[0], (double)host_latency_samples[n_samples / 2]),
> +		cycles_to_ns(vcpus[0], (double)host_latency_samples[n_samples * 9 / 10]),
> +		cycles_to_ns(vcpus[0], (double)host_latency_samples[n_samples * 99 / 100]),
> +		cycles_to_ns(vcpus[0], (double)host_latency_samples[n_samples - 1]));
> +}

The rest of dirty_log_perf_test prints times as seconds followed by
nanoseconds. I recommend keeping this output consistent with that,
i.e. 9 digits for the nanoseconds field instead of 6 as in the other
formats.
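Something like this helper could do the conversion (untested sketch;
the helper name is just a suggestion):

	/*
	 * Split a nanosecond value into the "%ld.%.9lds" format used
	 * by the other dirty_log_perf_test messages.
	 */
	static void print_latency_ns(const char *label, double ns)
	{
		long sec = (long)(ns / 1000000000.0);
		long nsec = (long)ns % 1000000000L;

		pr_info("%s:%ld.%.9lds", label, sec, nsec);
	}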
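And to tie together the earlier points about the default value, the
option letter, and making the feature optional, here is a rough,
untested sketch (the option letter, help text wording, and gating
location are all just suggestions):

	.samples_per_vcpu = 0,	/* latency sampling disabled by default */

	while ((opt = getopt(argc, argv, "ab:c:eghi:l:m:nop:r:s:v:x:w:")) != -1) {
	...
	case 'l':
		p.samples_per_vcpu = atoi_positive("Number of samples/vcpu", optarg);
		break;

with a matching entry in help():

	printf(" -l: Print the memory access latency distribution, sampling\n"
	       "     the given number of accesses per vCPU (default: disabled).\n");

and an early return so the stats only show up when requested:

	void memstress_print_percentiles(struct kvm_vm *vm, int nr_vcpus)
	{
		if (!memstress_args.samples_per_vcpu)
			return;
	...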