[kvm-unit-tests] unstable hyperv_clock test results

Dima Stepanov <dimastep@xxxxxxxxxxxxxx> · Sat, 9 Jun 2018 13:23:36 +0300

Hi,

While testing our changes with kvm-unit-tests, we noticed that the
x86/hyperv_clock.c test sometimes FAILs the drift test while the overall
result is still PASSED.
The reason is that main() never calls report_summary(). Instead it returns
based on nerr, which is never incremented, so the exit status is always 0.
We made the following change:

diff --git a/x86/hyperv_clock.c b/x86/hyperv_clock.c
index b72e357..3ae6413 100644
--- a/x86/hyperv_clock.c
+++ b/x86/hyperv_clock.c
@@ -144,7 +144,6 @@ static void perf_test(int ncpus)
 
 int main(int ac, char **av)
 {
-       int nerr = 0;
        int ncpus;
        struct hv_reference_tsc_page shadow;
        uint64_t tsc1, t1, tsc2, t2;
@@ -191,5 +190,5 @@ int main(int ac, char **av)
        wrmsr(HV_X64_MSR_REFERENCE_TSC, 0LL);
        report("MSR value after disabling", rdmsr(HV_X64_MSR_REFERENCE_TSC) == 0);
 
-       return nerr > 0 ? 1 : 0;
+       return report_summary();
 }

With this change, the overall result becomes unstable. Sometimes it passes:
  $ ./x86-run ./x86/hyperv_clock.flat
  ...
  SUMMARY: 3 tests
And sometimes it fails:
  $ ./x86-run ./x86/hyperv_clock.flat
  ...
  suspecting drift on CPU 0? delta = 310, acceptable [0, 143)
  FAIL: TSC reference precision test
  ...
  SUMMARY: 3 tests, 1 unexpected failures
We did some additional testing on the latest stable kernel, Linux 4.17:
  commit 29dcea88779c856c7dc92040a0c01233263101d4
  Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
  Date:   Sun Jun 3 14:15:21 2018 -0700

      Linux 4.17
The drift test behaves very strangely. If we run it in a loop and print only:
  1. In case of success: "delta on CPU"
  2. In case of fail: "suspecting drift on CPU" and "FAIL"
then the results are as follows:
  $ while true; do sudo ./x86-run ./x86/hyperv_clock.flat; done | grep -e "FAIL" -e "delta on CPU" -e "suspecting drift on CPU"
  delta on CPU 0 was 3...169
  delta on CPU 0 was 3...164
  delta on CPU 0 was 3...174
  ...
  delta on CPU 0 was 3...133
  delta on CPU 0 was 3...967
  ...
  suspecting drift on CPU 0? delta = 271, acceptable [0, 143)
  FAIL: TSC reference precision test
  ...
  delta on CPU 0 was 3...1047
  ...
  delta on CPU 0 was 3...132
  delta on CPU 0 was 3...230
  suspecting drift on CPU 0? delta = 231, acceptable [0, 146)
  FAIL: TSC reference precision test
  delta on CPU 0 was 3...979
  delta on CPU 0 was 3...132
  ...
and so on.

So, as we see it, the drift test can easily pass if a big delta was observed
during the first calibration phase and stored as the max delta. As a result,
a run whose max delta is 1047 can pass, while a run with a drift of 231 can
fail. This behaviour looks incorrect, and it seems the test should be updated
or excluded. But maybe I'm missing something. Could you comment on this?

Thanks a lot, Dima.