Re: [kvm-unit-tests PATCH v3 10/13] x86/pmu: Update testcases to cover Intel Arch PMU Version 1

Sandipan Das <sandipan.das@xxxxxxx> · Tue, 25 Oct 2022 14:04:55 +0530

Hi Like,

On 10/21/2022 1:02 PM, Like Xu wrote:
> Hi Sandipan,
> 
> On 19/9/2022 3:09 pm, Like Xu wrote:
>> On 8/9/2022 4:23 pm, Sandipan Das wrote:
>>> On 9/6/2022 7:05 PM, Like Xu wrote:
>>>> On 6/9/2022 4:16 pm, Sandipan Das wrote:
>>>>> Hi Like,
>>>>>
>>>>> On 8/19/2022 4:39 PM, Like Xu wrote:
>>>>>> From: Like Xu <likexu@xxxxxxxxxxx>
>>>>>>
>>>>>> For most unit tests, the basic framework and use cases which test
>>>>>> any PMU counter do not require any changes, except for two things:
>>>>>>
>>>>>> - No access to registers introduced only in PMU version 2 and above;
>>>>>> - Expanded tolerance for testing counter overflows
>>>>>>     due to the loss of uniform control of the gloabl_ctrl register
>>>>>>
>>>>>> Adding some pmu_version() return value checks can seamlessly support
>>>>>> Intel Arch PMU Version 1, while opening the door for AMD PMUs tests.
>>>>>>
>>>>>> Signed-off-by: Like Xu <likexu@xxxxxxxxxxx>
>>>>>> ---
>>>>>>    x86/pmu.c | 64 +++++++++++++++++++++++++++++++++++++------------------
>>>>>>    1 file changed, 43 insertions(+), 21 deletions(-)
>>>>>>
>>>>>> [...]
>>>>>> @@ -327,13 +335,21 @@ static void check_counter_overflow(void)
>>>>>>                cnt.config &= ~EVNTSEL_INT;
>>>>>>            idx = event_to_global_idx(&cnt);
>>>>>>            __measure(&cnt, cnt.count);
>>>>>> -        report(cnt.count == 1, "cntr-%d", i);
>>>>>> +
>>>>>> +        report(check_irq() == (i % 2), "irq-%d", i);
>>>>>> +        if (pmu_version() > 1)
>>>>>> +            report(cnt.count == 1, "cntr-%d", i);
>>>>>> +        else
>>>>>> +            report(cnt.count < 4, "cntr-%d", i);
>>>>>> +
>>>>>> [...]
>>>>>
>>>>> Sorry I missed this in the previous response. With an upper bound of
>>>>> 4, I see this test failing some times for at least one of the six
>>>>> counters (with NMI watchdog disabled on the host) on a Milan (Zen 3)
>>>>> system. Increasing it further does reduce the probability but I still
>>>>> see failures. Do you see the same behaviour on systems with Zen 3 and
>>>>> older processors?
>>>>
>>>> A hundred runs on my machine did not report a failure.
>>>>
>>>
>>> Was this on a Zen 4 system?
>>>
>>>> But I'm not surprised by this, because some AMD platforms do
>>>> have hw PMU errata which requires bios or ucode fixes.
>>>>
>>>> Please help find the right upper bound for all your available AMD boxes.
>>>>
>>>
>>> Even after updating the microcode, the tests failed just as often in an
>>> overnight loop. However, upon closer inspection, the reason for failure
>>> was different. The variance is well within the bounds now but sometimes,
>>> is_the_count_reproducible() is true. Since this selects the original
>>> verification criteria (cnt.count == 1), the tests fail.
>>>
>>>> What makes me most nervous is that AMD's core hardware events run
>>>> repeatedly against the same workload, and their count results are erratic.
>>>>
>>>
>>> With that in mind, should we consider having the following change?
>>>
>>> diff --git a/x86/pmu.c b/x86/pmu.c
>>> index bb16b3c..39979b8 100644
>>> --- a/x86/pmu.c
>>> +++ b/x86/pmu.c
>>> @@ -352,7 +352,7 @@ static void check_counter_overflow(void)
>>>                  .ctr = gp_counter_base,
>>>                  .config = EVNTSEL_OS | EVNTSEL_USR | (*gp_events)[1].unit_sel /* instructions */,
>>>          };
>>> -       bool precise_event = is_the_count_reproducible(&cnt);
>>> +       bool precise_event = is_intel() ? is_the_count_reproducible(&cnt) : false;
>>>
>>>          __measure(&cnt, 0);
>>>          count = cnt.count;
>>>
>>> With this, the tests always pass. I will run another overnight loop and
>>> report back if I see any errors.
>>>
>>>> You may check is_the_count_reproducible() in the test case:
>>>> [1]https://lore.kernel.org/kvm/20220905123946.95223-7-likexu@xxxxxxxxxxx/>>>>
>>> On Zen 4 systems, this is always false and the overflow tests always
>>> pass irrespective of whether PerfMonV2 is enabled for the guest or not.
>>>
>>> - Sandipan
>>
>> I could change it to:
>>
>>          if (is_intel())
>>              report(cnt.count == 1, "cntr-%d", i);
>>          else
>>              report(cnt.count < 4, "cntr-%d", i);
> 
> On AMD (zen3/zen4) machines this seems to be the only way to ensure that the test cases don't fail:
> 
>         if (is_intel())
>             report(cnt.count == 1, "cntr-%d", i);
>         else
>             report(cnt.count == 0xffffffffffff || cnt.count < 7, "cntr-%d", i);
> 
> but it means some hardware counter defects, can you further confirm that this hardware behaviour
> is in line with your expectations ?
> 

I am yet to investigate as to why there would a variance in count but with this updated
test condition, I can confirm that the tests pass on all my systems.

- Sandipan

>>
>> but this does not explain the difference, that is for the same workload:
>>
>> if a retired hw event like "PMCx0C0 [Retired Instructions] (ExRetInstr)" is configured,
>> then it's expected to count "the number of instructions retired", the value is only relevant
>> for workload and it should remain the same over multiple measurements,
>>
>> but there are two hardware counters, one AMD and one Intel, both are reset to an identical value
>> (like "cnt.count = 1 - count"), and when they overflow, the Intel counter can stay exactly at 1,
>> while the AMD counter cannot.
>>
>> I know there are ulterior hardware micro-arch implementation differences here,
>> but what AMD is doing violates the semantics of "retired".
>>
>> Is this behavior normal by design ?
>> I'm not sure what I'm missing, this behavior is reinforced in zen4 as you said.
>>