On Mon, May 29, 2023 at 7:51 AM Like Xu <like.xu.linux@xxxxxxxxx> wrote:
>
> On 25/5/2023 5:32 am, Jim Mattson wrote:
> > On Wed, May 24, 2023 at 2:29 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >>
> >> On Wed, May 24, 2023, Jim Mattson wrote:
> >>> On Wed, May 24, 2023 at 1:41 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >>>>
> >>>> On Wed, Apr 26, 2023, Sandipan Das wrote:
> >>>>> Hi Sean, Like,
> >>>>>
> >>>>> On 4/19/2023 7:11 PM, Like Xu wrote:
> >>>>>>> Heh, it's very much explicable, it's just not desirable, and you and I would argue
> >>>>>>> that it's also incorrect.
> >>>>>>
> >>>>>> This is completely inaccurate from the end guest pmu user's perspective.
> >>>>>>
> >>>>>> I have a toy that looks like virtio-pmu, through which guest users can get hypervisor performance data.
> >>>>>> But the side effect of letting the guest see the VMRUN instruction by default is unacceptable, isn't it ?
> >>>>>>
> >>>>>>>
> >>>>>>> AMD folks, are there plans to document this as an erratum?  I agree with Like that
> >>>>>>> counting VMRUN as a taken branch in guest context is a CPU bug, even if the behavior
> >>>>>>> is known/expected.
> >>>>>>
> >>>>>
> >>>>> This behaviour is architectural and an erratum will not be issued. However, for clarity, a future
> >>>>> release of the APM will include additional details like the following:
> >>>>>
> >>>>> 1) From the perspective of performance monitoring counters, VMRUNs are considered as far control
> >>>>> transfers and VMEXITs as exceptions.
> >>>>>
> >>>>> 2) When the performance monitoring counters are set up to count events only in certain modes
> >>>>> through the "OsUserMode" and "HostGuestOnly" bits, instructions and events that change the
> >>>>> mode are counted in the target mode. For example, a SYSCALL from CPL 3 to CPL 0 with a
> >>>>> counter set to count retired instructions with USR=1 and OS=0 will not cause an increment of
> >>>>> the counter.
> >>>>> However, the SYSRET back from CPL 0 to CPL 3 will cause an increment of the
> >>>>> counter and the total count will end up correct. Similarly, when counting PMCx0C6 (retired
> >>>>> far control transfers, including exceptions and interrupts) with Guest=1 and Host=0, a VMRUN
> >>>>> instruction will cause an increment of the counter. However, the subsequent VMEXIT that occurs,
> >>>>> since the target is in the host, will not cause an increment of the counter and so the total
> >>>>> count will end up correct.
> >>>>
> >>>> The count from the guest's perspective does not "end up correct". Unlike SYSCALL,
> >>>> where _userspace_ deliberately and synchronously executes a branch instruction,
> >>>> VMEXIT and VMRUN are supposed to be transparent to the guest and can be completely
> >>>> asynchronous with respect to guest code execution, e.g. if the host is spamming
> >>>> IRQs, the guest will see a potentially large number of bogus (from its perspective)
> >>>> branches retired.
> >>>
> >>> The reverse problem occurs when a PMC is configured to count "CPUID
> >>> instructions retired." Since KVM intercepts CPUID and emulates it, the
> >>> PMC will always read 0, even if the guest executes a tight loop of
> >>> CPUID instructions.
>
> Unlikely. KVM will count any emulated instructions based on kvm_pmu_incr_counter().
> Did I miss some conditions ?

That code only increments PMCs configured to count "instructions
retired" and "branch instructions retired." It does not increment PMCs
configured to count "CPUID instructions retired."

> >>>
> >>> The PMU is not virtualizable on AMD CPUs without significant
> >>> hypervisor corrections. I have to wonder if it's really worth the
> >>> effort.
>
> I used to think so, until I saw the AMD64_EVENTSEL_GUESTONLY bit.
> Hardware architects are expected to put more effort into this area.
>
> >>
> >> Per our offlist chat, my understanding is that there are caveats with vPMUs that
> >> it's simply not feasible for a hypervisor to handle. I.e. virtualizing any x86
> >> PMU with 100% accuracy isn't happening anytime soon.
>
> Indeed, and any more detailed complaints ?

Reference cycles unhalted fails to increment outside of guest mode.
SMIs received counts *physical* rather than virtual SMIs.
Interrupts taken counts *physical* rather than virtual interrupts taken.

> >>
> >> The way forward is likely to evaluate each caveat on a case-by-case basis to
> >> determine whether or not the cost of the fixup in KVM is worth the benefit to
> >> the guest. E.g. emulating "CPUID instructions retired" seems like it would be
> >> fairly straightforward. AFAICT, fixing up the VMRUN stuff is quite difficult though.
> >
> > Yeah. The problem with fixing up "CPUID instructions retired" is
> > tracking what the event encoding is for every F/M/S out there. It's
> > not worth it.
>
> I don't think it's feasible to emulate 100% accuracy on Intel. For guest pmu
> users, it is motivated by wanting to know how effective they are running on
> the current pCPU, and any vPMU emulation behavior that helps this
> understanding would be valuable.

But at least Intel has a list of architected events, which are mostly
amenable to virtualization.
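
For reference, the "counted in the target mode" rule quoted earlier in the thread can be illustrated with a toy model. This is only a sketch under stated assumptions: the class, function, and mode names are hypothetical, it models nothing but the attribution rule itself, and it is not KVM code or anything taken from the APM.

```python
from dataclasses import dataclass


@dataclass
class ModeFilteredCounter:
    """Toy counter (hypothetical) that counts mode-changing events in the target mode."""
    count_in: frozenset  # modes in which the counter is enabled, e.g. {"guest"}
    value: int = 0

    def transition(self, target_mode: str) -> None:
        # Assumption taken from the quoted wording: a mode-changing event is
        # attributed to the mode it lands in, not the mode it started from.
        if target_mode in self.count_in:
            self.value += 1


def world_switches(counter: ModeFilteredCounter, n: int) -> int:
    """Simulate n VMRUN/VMEXIT round trips in which the guest itself
    executes no far control transfers at all."""
    for _ in range(n):
        counter.transition("guest")  # VMRUN: host -> guest
        counter.transition("host")   # VMEXIT: guest -> host
    return counter.value


guest_only = ModeFilteredCounter(count_in=frozenset({"guest"}))
host_only = ModeFilteredCounter(count_in=frozenset({"host"}))

print(world_switches(guest_only, 1000))  # VMRUNs land in the guest
print(world_switches(host_only, 1000))   # VMEXITs land in the host
```

In this model the guest-only counter ends up equal to the number of world switches even though the simulated guest did nothing, which is the effect objected to above: the total across host and guest is consistent, but the guest-visible count includes events the guest never initiated.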