On Wed, Aug 09, 2023 at 08:25:07AM +0000, Oliver Upton wrote:
> Hi Huang,
>
> On Wed, Aug 09, 2023 at 09:39:53AM +0800, Huang Shijie wrote:
> > 2.) Root cause.
> >     There are only 7 counters on my arm64 platform:
> >        (one cycle counter) + (6 normal counters)
> >
> >     In 1.3 above, we will use 10 event counters.
> >     Since we only have 7 counters, the perf core will trigger
> >     event multiplexing in hrtimer:
> >          merge_sched_in() --> perf_mux_hrtimer_restart() -->
> >            perf_rotate_context().
> >
> >     In perf_rotate_context(), it does not restore some PMU registers
> >     as context_switch() does. In context_switch():
> >         kvm_sched_in()  --> kvm_vcpu_pmu_restore_guest()
> >         kvm_sched_out() --> kvm_vcpu_pmu_restore_host()
> >
> >     So we get the wrong result.
>
> This is a rather vague description of the problem. AFAICT, the
> issue here is on VHE systems we wind up getting the EL0 count
> enable/disable bits backwards when entering the guest, which is
> corroborated by the data you have below.

Yep; IIUC the issue here is that when we take an IRQ from a guest and
reprogram the PMU in the IRQ handler, the IRQ handler will program the
PMU with appropriate host/guest/user/etc filters for a *host* context,
and then we'll return back into the guest without reconfiguring the
event filtering for a *guest* context.

That can happen for perf_rotate_context(), or when we install an event
into a running context, as that'll happen via an IPI.

> > +void arch_perf_rotate_pmu_set(void)
> > +{
> > +	if (is_guest())
> > +		kvm_vcpu_pmu_restore_guest(NULL);
> > +	else
> > +		kvm_vcpu_pmu_restore_host(NULL);
> > +}
> > +
>
> This sort of hook is rather nasty, and I'd strongly prefer a solution
> that's confined to KVM. I don't think the !is_guest() branch is
> necessary at all. Regardless of how the pmu context is changed, we need
> to go through vcpu_put() before getting back out to userspace.
>
> We can check for a running vCPU (ick) from kvm_set_pmu_events() and either
> do the EL0 bit flip there or make a request on the vCPU to call
> kvm_vcpu_pmu_restore_guest() immediately before reentering the guest.
> I'm slightly leaning towards the latter, unless anyone has a better idea
> here.

The latter sounds reasonable to me.

I suspect we need to take special care here to make sure we leave *all*
events in a good state when re-entering the guest or if we get to
kvm_sched_out() after *removing* an event via an IPI -- it'd be easy to
mess either case up and leave some events in a bad state.

Thanks,
Mark.
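P.S.

To make the request-based variant concrete, a rough sketch of what I'd
expect it to look like. The request name, its KVM_ARCH_REQ() number,
and the helper below are illustrative rather than final; the helper
would be called from kvm_set_pmu_events() whenever a perf IRQ/IPI
reprograms the PMU while a vCPU is loaded on this CPU:

	/* New vCPU request, alongside the existing KVM_ARCH_REQ() users. */
	#define KVM_REQ_RESYNC_PMU_EL0	KVM_ARCH_REQ(7)

	/*
	 * Called from kvm_set_pmu_events() when the PMU has been
	 * reprogrammed from IRQ/IPI context. If a vCPU is loaded on
	 * this CPU, ask it to restore the guest EL0 filtering before
	 * it reenters the guest.
	 */
	void kvm_vcpu_pmu_resync_el0(void)
	{
		struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

		if (!vcpu)
			return;

		kvm_make_request(KVM_REQ_RESYNC_PMU_EL0, vcpu);
	}

	/* ... and in check_vcpu_requests(), before reentering the guest: */
	if (kvm_check_request(KVM_REQ_RESYNC_PMU_EL0, vcpu))
		kvm_vcpu_pmu_restore_guest(vcpu);

That keeps everything confined to KVM. It'd still be worth double
checking that a request raised from IPI context *after*
check_vcpu_requests() has run causes the entry to be aborted via the
pending-request test on the entry path, per my comment above about
leaving all events in a good state.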