Re: [kvm-unit-tests Patch v2 4/5] x86: pmu: Support validation for Intel PMU fixed counter 3

Jim Mattson <jmattson@xxxxxxxxxx> · Tue, 31 Oct 2023 20:24:51 -0700

On Tue, Oct 31, 2023 at 8:16 PM Mi, Dapeng <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
>
>
> On 11/1/2023 10:47 AM, Jim Mattson wrote:
> > On Tue, Oct 31, 2023 at 7:33 PM Mi, Dapeng <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
> >>
> >> On 11/1/2023 2:47 AM, Jim Mattson wrote:
> >>> On Tue, Oct 31, 2023 at 2:22 AM Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
> >>>> Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
> >>>> (fixed counter 3) to counter/sample topdown.slots event, but current
> >>>> code still doesn't cover this new fixed counter.
> >>>>
> >>>> So this patch adds code to validate this new fixed counter can count
> >>>> slots event correctly.
> >>> I'm not convinced that this actually validates anything.
> >>>
> >>> Suppose, for example, that KVM used fixed counter 1 when the guest
> >>> asked for fixed counter 3. Wouldn't this test still pass?
> >>
> >> Per my understanding, as long as the KVM returns a valid count in the
> >> reasonable count range, we can think KVM works correctly. We don't need
> >> to entangle on how KVM really uses the HW, it could be impossible and
> >> unnecessary.
> > Now, I see how the Pentium FDIV bug escaped notice. Hey, the numbers
> > are in a reasonable range. What's everyone upset about?
> >
> >> Yeah, currently the predefined valid count range may be some kind of
> >> loose since I want to cover as much as hardwares and avoid to cause
> >> regression. Especially after introducing the random jump and clflush
> >> instructions, the cycles and slots become much more hard to predict.
> >> Maybe we can have a comparable restricted count range in the initial
> >> change, and we can loosen the restriction then if we encounter a failure
> >> on some specific hardware. do you think it's better? Thanks.
> > I think the test is essentially useless, and should probably just be
> > deleted, so that it doesn't give a false sense of confidence.
>
> IMO, I can't say the tests are totally useless. Yes,  passing the tests
> doesn't mean the KVM vPMU must work correctly, but we can say there is
> something probably wrong if it fails to pass these tests. Considering
> the hardware differences, it's impossible to set an exact value for
> these events in advance and it seems there is no better method to verify
> the PMC count as well. I still prefer to keep these tests until we have
> a better method to verify the accuracy of the PMC count.

If it's impossible to set an exact value for these events in advance,
how does Intel validate the hardware PMU?