Re: [kvm-unit-tests PATCH v3 10/13] x86/pmu: Update testcases to cover Intel Arch PMU Version 1

Like Xu <like.xu.linux@xxxxxxxxx> · Fri, 21 Oct 2022 15:32:35 +0800

Hi Sandipan,

On 19/9/2022 3:09 pm, Like Xu wrote:
On 8/9/2022 4:23 pm, Sandipan Das wrote:
On 9/6/2022 7:05 PM, Like Xu wrote:
On 6/9/2022 4:16 pm, Sandipan Das wrote:
Hi Like,

On 8/19/2022 4:39 PM, Like Xu wrote:
From: Like Xu <likexu@xxxxxxxxxxx>

For most unit tests, the basic framework and use cases which test
any PMU counter do not require any changes, except for two things:

- No access to registers introduced only in PMU version 2 and above;
- Expanded tolerance for testing counter overflows
    due to the loss of uniform control of the gloabl_ctrl register

Adding some pmu_version() return value checks can seamlessly support
Intel Arch PMU Version 1, while opening the door for AMD PMUs tests.

Signed-off-by: Like Xu <likexu@xxxxxxxxxxx>
---
   x86/pmu.c | 64 +++++++++++++++++++++++++++++++++++++------------------
   1 file changed, 43 insertions(+), 21 deletions(-)

[...]
@@ -327,13 +335,21 @@ static void check_counter_overflow(void)
               cnt.config &= ~EVNTSEL_INT;
           idx = event_to_global_idx(&cnt);
           __measure(&cnt, cnt.count);
-        report(cnt.count == 1, "cntr-%d", i);
+
+        report(check_irq() == (i % 2), "irq-%d", i);
+        if (pmu_version() > 1)
+            report(cnt.count == 1, "cntr-%d", i);
+        else
+            report(cnt.count < 4, "cntr-%d", i);
+
[...]

Sorry I missed this in the previous response. With an upper bound of
4, I see this test failing some times for at least one of the six
counters (with NMI watchdog disabled on the host) on a Milan (Zen 3)
system. Increasing it further does reduce the probability but I still
see failures. Do you see the same behaviour on systems with Zen 3 and
older processors?

A hundred runs on my machine did not report a failure.


Was this on a Zen 4 system?

But I'm not surprised by this, because some AMD platforms do
have hw PMU errata which requires bios or ucode fixes.

Please help find the right upper bound for all your available AMD boxes.


Even after updating the microcode, the tests failed just as often in an
overnight loop. However, upon closer inspection, the reason for failure
was different. The variance is well within the bounds now but sometimes,
is_the_count_reproducible() is true. Since this selects the original
verification criteria (cnt.count == 1), the tests fail.

What makes me most nervous is that AMD's core hardware events run
repeatedly against the same workload, and their count results are erratic.


With that in mind, should we consider having the following change?

diff --git a/x86/pmu.c b/x86/pmu.c
index bb16b3c..39979b8 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -352,7 +352,7 @@ static void check_counter_overflow(void)
                 .ctr = gp_counter_base,
                 .config = EVNTSEL_OS | EVNTSEL_USR | (*gp_events)[1].unit_sel 
/* instructions */,
         };
-       bool precise_event = is_the_count_reproducible(&cnt);
+       bool precise_event = is_intel() ? is_the_count_reproducible(&cnt) : 
false;

         __measure(&cnt, 0);
         count = cnt.count;

With this, the tests always pass. I will run another overnight loop and
report back if I see any errors.

You may check is_the_count_reproducible() in the test case:
[1]https://lore.kernel.org/kvm/20220905123946.95223-7-likexu@xxxxxxxxxxx/

On Zen 4 systems, this is always false and the overflow tests always
pass irrespective of whether PerfMonV2 is enabled for the guest or not.

- Sandipan

I could change it to:

         if (is_intel())
             report(cnt.count == 1, "cntr-%d", i);
         else
             report(cnt.count < 4, "cntr-%d", i);

On AMD (zen3/zen4) machines this seems to be the only way to ensure that the 
test cases don't fail:

		if (is_intel())
			report(cnt.count == 1, "cntr-%d", i);
		else
			report(cnt.count == 0xffffffffffff || cnt.count < 7, "cntr-%d", i);

but it means some hardware counter defects, can you further confirm that this 
hardware behaviour
is in line with your expectations ?


but this does not explain the difference, that is for the same workload:

if a retired hw event like "PMCx0C0 [Retired Instructions] (ExRetInstr)" is 
configured,
then it's expected to count "the number of instructions retired", the value is 
only relevant
for workload and it should remain the same over multiple measurements,

but there are two hardware counters, one AMD and one Intel, both are reset to an 
identical value
(like "cnt.count = 1 - count"), and when they overflow, the Intel counter can 
stay exactly at 1,
while the AMD counter cannot.

I know there are ulterior hardware micro-arch implementation differences here,
but what AMD is doing violates the semantics of "retired".

Is this behavior normal by design ?
I'm not sure what I'm missing, this behavior is reinforced in zen4 as you said.