On 2023-10-31 11:57 p.m., Mi, Dapeng wrote:
>
> On 11/1/2023 11:24 AM, Jim Mattson wrote:
>> On Tue, Oct 31, 2023 at 8:16 PM Mi, Dapeng
>> <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
>>>
>>> On 11/1/2023 10:47 AM, Jim Mattson wrote:
>>>> On Tue, Oct 31, 2023 at 7:33 PM Mi, Dapeng
>>>> <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
>>>>> On 11/1/2023 2:47 AM, Jim Mattson wrote:
>>>>>> On Tue, Oct 31, 2023 at 2:22 AM Dapeng Mi
>>>>>> <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
>>>>>>> Intel CPUs, like Sapphire Rapids, introduce a new fixed counter
>>>>>>> (fixed counter 3) to count/sample the topdown.slots event, but the
>>>>>>> current code still doesn't cover this new fixed counter.
>>>>>>>
>>>>>>> So this patch adds code to validate that this new fixed counter
>>>>>>> can count the slots event correctly.
>>>>>> I'm not convinced that this actually validates anything.
>>>>>>
>>>>>> Suppose, for example, that KVM used fixed counter 1 when the guest
>>>>>> asked for fixed counter 3. Wouldn't this test still pass?
>>>>> Per my understanding, as long as KVM returns a valid count in the
>>>>> reasonable count range, we can consider that KVM works correctly.
>>>>> We don't need to get entangled in how KVM really uses the HW; that
>>>>> could be impossible and unnecessary.
>>>> Now I see how the Pentium FDIV bug escaped notice. Hey, the numbers
>>>> are in a reasonable range. What's everyone upset about?
>>>>
>>>>> Yeah, the currently predefined valid count range may be somewhat
>>>>> loose, since I want to cover as much hardware as possible and avoid
>>>>> causing regressions. Especially after introducing the random jump
>>>>> and clflush instructions, the cycles and slots become much harder
>>>>> to predict. Maybe we can use a comparatively restricted count range
>>>>> in the initial change, and then loosen the restriction if we
>>>>> encounter a failure on some specific hardware. Do you think that
>>>>> would be better? Thanks.
>>>> I think the test is essentially useless and should probably just be
>>>> deleted, so that it doesn't give a false sense of confidence.
>>> IMO, I can't say the tests are totally useless. Yes, passing the
>>> tests doesn't mean the KVM vPMU must work correctly, but we can say
>>> there is probably something wrong if it fails to pass these tests.
>>> Considering the hardware differences, it's impossible to set an exact
>>> value for these events in advance, and there seems to be no better
>>> method to verify the PMC count either. I still prefer to keep these
>>> tests until we have a better method to verify the accuracy of the PMC
>>> count.
>> If it's impossible to set an exact value for these events in advance,
>> how does Intel validate the hardware PMU?
>
> I don't have much idea of how the HW team validates the PMU
> functionality. But per the information I've gotten, they could have
> some very tiny benchmarks with a fixed pattern and run them in a
> certain scenario, so they can expect a very accurate count value. But
> this is different from our case: a real program is executed on a real
> system (probably shared with other programs), and the event count is
> impacted by too many hardware/software factors, such as cache
> contention, so it's hard to predict a single accurate count in advance.
>

Yes, there are many factors that could impact the value of the
microbenchmarks. I don't think there is a universal benchmark for all
generations and all configurations.

Thanks,
Kan

> Anyway, it's only my guess about the ways of hardware validation;
> adding Kan to get more information.
>
> Hi Kan,
>
> Do you have more information about how the HW team validates the PMC
> count accuracy? Thanks.
>
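
For context, the kind of range-based check being debated might look
roughly like the minimal sketch below. It assumes kvm-unit-tests-style
wrmsr()/rdmsr() helpers; the MSR indices and bit positions follow the
Intel SDM, but the workload, the bounds, and the function name are
illustrative assumptions, not the actual patch:

#include <stdint.h>

/* Architectural PMU MSRs (Intel SDM); on Sapphire Rapids, fixed
 * counter 3 counts the topdown slots event. */
#define MSR_CORE_PERF_FIXED_CTR0     0x309
#define MSR_CORE_PERF_FIXED_CTR_CTRL 0x38d
#define MSR_CORE_PERF_GLOBAL_CTRL    0x38f

/* Assumed helpers, as provided by the kvm-unit-tests environment. */
extern void wrmsr(uint32_t index, uint64_t value);
extern uint64_t rdmsr(uint32_t index);

/* Illustrative bounds only: slots scale with issue width times
 * unhalted core cycles, so the test can merely assert that the count
 * falls inside a plausible window. */
#define LOOP_ITERATIONS   1000000ull
#define SLOTS_LOWER_BOUND (1ull * LOOP_ITERATIONS)
#define SLOTS_UPPER_BOUND (100ull * LOOP_ITERATIONS)

static int check_fixed_counter3_slots(void)
{
	uint64_t slots;

	/* Clear fixed counter 3 before counting. */
	wrmsr(MSR_CORE_PERF_FIXED_CTR0 + 3, 0);
	/* Enable fixed counter 3 for OS and USR (bits 12-13 = 0b11). */
	wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, 0x3ull << (4 * 3));
	/* Set the global enable bit for fixed counter 3 (bit 35). */
	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 1ull << (32 + 3));

	/* Workload: a trivial loop; the exact number of slots it
	 * produces depends on the microarchitecture, hence the window. */
	for (volatile uint64_t i = 0; i < LOOP_ITERATIONS; i++)
		;

	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
	slots = rdmsr(MSR_CORE_PERF_FIXED_CTR0 + 3);

	return slots >= SLOTS_LOWER_BOUND && slots <= SLOTS_UPPER_BOUND;
}

Jim's objection applies directly to a check of this shape: because the
window is loose, a pass only shows that some counter incremented by a
plausible amount; it cannot distinguish whether KVM actually backed the
guest's fixed counter 3 or silently substituted a different counter.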