On Tue, Oct 31, 2023 at 8:16 PM Mi, Dapeng <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
>
>
> On 11/1/2023 10:47 AM, Jim Mattson wrote:
> > On Tue, Oct 31, 2023 at 7:33 PM Mi, Dapeng <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
> >>
> >> On 11/1/2023 2:47 AM, Jim Mattson wrote:
> >>> On Tue, Oct 31, 2023 at 2:22 AM Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
> >>>> Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
> >>>> (fixed counter 3) to counter/sample topdown.slots event, but current
> >>>> code still doesn't cover this new fixed counter.
> >>>>
> >>>> So this patch adds code to validate this new fixed counter can count
> >>>> slots event correctly.
> >>> I'm not convinced that this actually validates anything.
> >>>
> >>> Suppose, for example, that KVM used fixed counter 1 when the guest
> >>> asked for fixed counter 3. Wouldn't this test still pass?
> >>
> >> Per my understanding, as long as the KVM returns a valid count in the
> >> reasonable count range, we can think KVM works correctly. We don't need
> >> to entangle on how KVM really uses the HW, it could be impossible and
> >> unnecessary.
> > Now, I see how the Pentium FDIV bug escaped notice. Hey, the numbers
> > are in a reasonable range. What's everyone upset about?
> >
> >> Yeah, currently the predefined valid count range may be some kind of
> >> loose since I want to cover as much as hardwares and avoid to cause
> >> regression. Especially after introducing the random jump and clflush
> >> instructions, the cycles and slots become much more hard to predict.
> >> Maybe we can have a comparable restricted count range in the initial
> >> change, and we can loosen the restriction then if we encounter a failure
> >> on some specific hardware. do you think it's better? Thanks.
> > I think the test is essentially useless, and should probably just be
> > deleted, so that it doesn't give a false sense of confidence.
>
> IMO, I can't say the tests are totally useless. Yes, passing the tests
> doesn't mean the KVM vPMU must work correctly, but we can say there is
> something probably wrong if it fails to pass these tests. Considering
> the hardware differences, it's impossible to set an exact value for
> these events in advance and it seems there is no better method to verify
> the PMC count as well. I still prefer to keep these tests until we have
> a better method to verify the accuracy of the PMC count.

If it's impossible to set an exact value for these events in advance,
how does Intel validate the hardware PMU?