Re: [PATCH v2 2/2] KVM: selftests: Print summary stats of memory latency distribution

Colton Lewis <coltonlewis@xxxxxxxxxx> · Tue, 21 Mar 2023 19:10:10 +0000




Vipin Sharma <vipinsh@xxxxxxxxxx> writes:

On Thu, Mar 16, 2023 at 3:29 PM Colton Lewis <coltonlewis@xxxxxxxxxx>  
wrote:

Print summary stats of the memory latency distribution in nanoseconds
for dirty_log_perf_test. For every iteration, this prints the minimum,
the maximum, and the 50th, 90th, and 99th percentiles.


Can you also write how this is useful and why these 5 specific
percentiles are valuable for testers? Generally, 5 number summaries
are 0, 25, 50, 75, 100 %ile.
It might also be too much data to display with each iteration.

Generally, when measuring latency you care more about the right half of
the distribution. The left half of the distribution is assumed to be
bunched close together at some low value, and I do see in my
measurements there is not much difference between the minimum and the
50th percentile. The right half of the distribution is where things
might be going wrong. 50th, 90th, or 99th are common measures to take in
other contexts, specifically networking.

One example:
https://cloud.google.com/spanner/docs/latency-metrics#overview

Nit: minimum, 50th, 90th, 99th and maximum since this is the order you
are printing.

Ok.

@@ -428,6 +432,7 @@ int main(int argc, char *argv[])
                 .slots = 1,
                 .random_seed = 1,
                 .write_percent = 100,
+               .samples_per_vcpu = 1000,

Why is 1000 the right default choice? Maybe the default should be 0
and if anyone wants to use it then they can use the command line
option to set it?

Good point.

@@ -438,7 +443,7 @@ int main(int argc, char *argv[])

         guest_modes_append_default();

-       while ((opt = getopt(argc,  
argv, "ab:c:eghi:m:nop:r:s:v:x:w:")) != -1) {
+       while ((opt = getopt(argc,  
argv, "ab:c:eghi:m:nop:r:s:S:v:x:w:")) != -1) {

1. Please add the help section to tell about the new command line option.

Thanks for the reminder.

2. Instead of having s and S, it may be better to use a different
lower case letter, like "l" for latency. Giving this option will print
memory latency and users need to provide the number of samples they
prefer per vCPU.

Probably a good idea.

@@ -480,6 +485,9 @@ int main(int argc, char *argv[])
                 case 's':
                         p.backing_src = parse_backing_src_type(optarg);
                         break;
+               case 'S':
+                       p.samples_per_vcpu = atoi_positive("Number of  
samples/vcpu", optarg);
+                       break;

This will force users to always see latency stat when they do not want
to see it. I think this patch should be modified in a way to easily
disable this feature.
I might be wrong here and it is actually a useful metric to see with
each run. If this is true then maybe the commit should mention why it
is good for each run.

It gives an extra line of information per run. This would help people
distinguish between cases where the average is high because most
accesses are high or because a a small percentage of accesses are
*really* high.

But in general I agree it should be silent unless requested.

+void memstress_print_percentiles(struct kvm_vm *vm, int nr_vcpus)
+{
+       uint64_t n_samples = nr_vcpus * memstress_args.samples_per_vcpu;
+       uint64_t *host_latency_samples = addr_gva2hva(vm,  
memstress_args.latency_samples);
+
+       qsort(host_latency_samples, n_samples, sizeof(uint64_t),  
&memstress_qcmp);
+
+       pr_info("Latency distribution (ns) = min:%6.0lf, 50th:%6.0lf,  
90th:%6.0lf, 99th:%6.0lf, max:%6.0lf\n",
+               cycles_to_ns(vcpus[0], (double)host_latency_samples[0]),

I am not much aware of how tsc is set up and used. Will all vCPUs have
the same tsc value? Can this change if vCPU gets scheduled to
different pCPU on the host?

All vCPUs *in one VM* should have the same frequency. The alternative is
probably possible but so weird I can't imagine a reason for doing it.

+               cycles_to_ns(vcpus[0],  
(double)host_latency_samples[n_samples / 2]),
+               cycles_to_ns(vcpus[0],  
(double)host_latency_samples[n_samples * 9 / 10]),
+               cycles_to_ns(vcpus[0],  
(double)host_latency_samples[n_samples * 99 / 100]),
+               cycles_to_ns(vcpus[0],  
(double)host_latency_samples[n_samples - 1]));
+}

All of the dirty_log_perf_tests print data in seconds followed by
nanoseconds. I will recommend keeping it the same.
Also, 9 digits for nanoseconds instead of 6 as in other formats.

Consistency here is a good thing. Thanks for pointing out.