Hi Like, On 10/21/2022 1:02 PM, Like Xu wrote: > Hi Sandipan, > > On 19/9/2022 3:09 pm, Like Xu wrote: >> On 8/9/2022 4:23 pm, Sandipan Das wrote: >>> On 9/6/2022 7:05 PM, Like Xu wrote: >>>> On 6/9/2022 4:16 pm, Sandipan Das wrote: >>>>> Hi Like, >>>>> >>>>> On 8/19/2022 4:39 PM, Like Xu wrote: >>>>>> From: Like Xu <likexu@xxxxxxxxxxx> >>>>>> >>>>>> For most unit tests, the basic framework and use cases which test >>>>>> any PMU counter do not require any changes, except for two things: >>>>>> >>>>>> - No access to registers introduced only in PMU version 2 and above; >>>>>> - Expanded tolerance for testing counter overflows >>>>>> due to the loss of uniform control of the gloabl_ctrl register >>>>>> >>>>>> Adding some pmu_version() return value checks can seamlessly support >>>>>> Intel Arch PMU Version 1, while opening the door for AMD PMUs tests. >>>>>> >>>>>> Signed-off-by: Like Xu <likexu@xxxxxxxxxxx> >>>>>> --- >>>>>> x86/pmu.c | 64 +++++++++++++++++++++++++++++++++++++------------------ >>>>>> 1 file changed, 43 insertions(+), 21 deletions(-) >>>>>> >>>>>> [...] >>>>>> @@ -327,13 +335,21 @@ static void check_counter_overflow(void) >>>>>> cnt.config &= ~EVNTSEL_INT; >>>>>> idx = event_to_global_idx(&cnt); >>>>>> __measure(&cnt, cnt.count); >>>>>> - report(cnt.count == 1, "cntr-%d", i); >>>>>> + >>>>>> + report(check_irq() == (i % 2), "irq-%d", i); >>>>>> + if (pmu_version() > 1) >>>>>> + report(cnt.count == 1, "cntr-%d", i); >>>>>> + else >>>>>> + report(cnt.count < 4, "cntr-%d", i); >>>>>> + >>>>>> [...] >>>>> >>>>> Sorry I missed this in the previous response. With an upper bound of >>>>> 4, I see this test failing some times for at least one of the six >>>>> counters (with NMI watchdog disabled on the host) on a Milan (Zen 3) >>>>> system. Increasing it further does reduce the probability but I still >>>>> see failures. Do you see the same behaviour on systems with Zen 3 and >>>>> older processors? >>>> >>>> A hundred runs on my machine did not report a failure. >>>> >>> >>> Was this on a Zen 4 system? >>> >>>> But I'm not surprised by this, because some AMD platforms do >>>> have hw PMU errata which requires bios or ucode fixes. >>>> >>>> Please help find the right upper bound for all your available AMD boxes. >>>> >>> >>> Even after updating the microcode, the tests failed just as often in an >>> overnight loop. However, upon closer inspection, the reason for failure >>> was different. The variance is well within the bounds now but sometimes, >>> is_the_count_reproducible() is true. Since this selects the original >>> verification criteria (cnt.count == 1), the tests fail. >>> >>>> What makes me most nervous is that AMD's core hardware events run >>>> repeatedly against the same workload, and their count results are erratic. >>>> >>> >>> With that in mind, should we consider having the following change? >>> >>> diff --git a/x86/pmu.c b/x86/pmu.c >>> index bb16b3c..39979b8 100644 >>> --- a/x86/pmu.c >>> +++ b/x86/pmu.c >>> @@ -352,7 +352,7 @@ static void check_counter_overflow(void) >>> .ctr = gp_counter_base, >>> .config = EVNTSEL_OS | EVNTSEL_USR | (*gp_events)[1].unit_sel /* instructions */, >>> }; >>> - bool precise_event = is_the_count_reproducible(&cnt); >>> + bool precise_event = is_intel() ? is_the_count_reproducible(&cnt) : false; >>> >>> __measure(&cnt, 0); >>> count = cnt.count; >>> >>> With this, the tests always pass. I will run another overnight loop and >>> report back if I see any errors. >>> >>>> You may check is_the_count_reproducible() in the test case: >>>> [1]https://lore.kernel.org/kvm/20220905123946.95223-7-likexu@xxxxxxxxxxx/>>>> >>> On Zen 4 systems, this is always false and the overflow tests always >>> pass irrespective of whether PerfMonV2 is enabled for the guest or not. >>> >>> - Sandipan >> >> I could change it to: >> >> if (is_intel()) >> report(cnt.count == 1, "cntr-%d", i); >> else >> report(cnt.count < 4, "cntr-%d", i); > > On AMD (zen3/zen4) machines this seems to be the only way to ensure that the test cases don't fail: > > if (is_intel()) > report(cnt.count == 1, "cntr-%d", i); > else > report(cnt.count == 0xffffffffffff || cnt.count < 7, "cntr-%d", i); > > but it means some hardware counter defects, can you further confirm that this hardware behaviour > is in line with your expectations ? > I am yet to investigate as to why there would a variance in count but with this updated test condition, I can confirm that the tests pass on all my systems. - Sandipan >> >> but this does not explain the difference, that is for the same workload: >> >> if a retired hw event like "PMCx0C0 [Retired Instructions] (ExRetInstr)" is configured, >> then it's expected to count "the number of instructions retired", the value is only relevant >> for workload and it should remain the same over multiple measurements, >> >> but there are two hardware counters, one AMD and one Intel, both are reset to an identical value >> (like "cnt.count = 1 - count"), and when they overflow, the Intel counter can stay exactly at 1, >> while the AMD counter cannot. >> >> I know there are ulterior hardware micro-arch implementation differences here, >> but what AMD is doing violates the semantics of "retired". >> >> Is this behavior normal by design ? >> I'm not sure what I'm missing, this behavior is reinforced in zen4 as you said. >>