Hi Huang,

On Wed, Aug 09, 2023 at 09:39:53AM +0800, Huang Shijie wrote:
> 2.) Root cause.
>     There are only 7 counters on my arm64 platform:
>         (one cycle counter) + (6 normal counters)
>
>     In 1.3 above, we will use 10 event counters.
>     Since we only have 7 counters, the perf core will trigger
>     event multiplexing in hrtimer:
>         merge_sched_in() --> perf_mux_hrtimer_restart() -->
>         perf_rotate_context().
>
>     In perf_rotate_context(), it does not restore some PMU registers
>     as context_switch() does. In context_switch():
>         kvm_sched_in() --> kvm_vcpu_pmu_restore_guest()
>         kvm_sched_out() --> kvm_vcpu_pmu_restore_host()
>
>     So we get the wrong result.

This is a rather vague description of the problem. AFAICT, the issue
here is that on VHE systems we wind up getting the EL0 count
enable/disable bits backwards when entering the guest, which is
corroborated by the data you have below.

> +void arch_perf_rotate_pmu_set(void)
> +{
> +	if (is_guest())
> +		kvm_vcpu_pmu_restore_guest(NULL);
> +	else
> +		kvm_vcpu_pmu_restore_host(NULL);
> +}
> +

This sort of hook is rather nasty, and I'd strongly prefer a solution
that's confined to KVM.

I don't think the !is_guest() branch is necessary at all. Regardless of
how the PMU context is changed, we need to go through vcpu_put() before
getting back out to userspace, and that already restores the host's PMU
state.

We can check for a running vCPU (ick) from kvm_set_pmu_events() and
either do the EL0 bit flip there or make a request on the vCPU to call
kvm_vcpu_pmu_restore_guest() immediately before reentering the guest.
I'm slightly leaning towards the latter, unless anyone has a better
idea here.
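For the sake of discussion, here is a rough, untested sketch of the
request-based flavor. KVM_REQ_RESYNC_PMU_EL0 is just a strawman name,
the request number is whatever happens to be free, and the surrounding
code is written from memory:

/* arch/arm64/include/asm/kvm_host.h */
#define KVM_REQ_RESYNC_PMU_EL0	KVM_ARCH_REQ(7)	/* next free request */

/* arch/arm64/kvm/pmu.c */
void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr)
{
	struct kvm_pmu_events *pmu = kvm_get_pmu_events();
	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

	if (!kvm_arm_support_pmu_v3() || !pmu || !kvm_pmu_switch_needed(attr))
		return;

	if (!attr->exclude_host)
		pmu->events_host |= set;
	if (!attr->exclude_guest)
		pmu->events_guest |= set;

	/*
	 * perf may rotate events from the hrtimer while a vCPU is loaded,
	 * so have the vCPU recompute the EL0 enable bits before the next
	 * guest entry.
	 */
	if (vcpu)
		kvm_make_request(KVM_REQ_RESYNC_PMU_EL0, vcpu);
}

/* arch/arm64/kvm/arm.c, in check_vcpu_requests() */
	if (kvm_check_request(KVM_REQ_RESYNC_PMU_EL0, vcpu))
		kvm_vcpu_pmu_restore_guest(vcpu);

That keeps the fix entirely within KVM, and it only costs anything when
perf actually rotates events underneath a loaded vCPU.

--
Thanks,
Oliver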