On Mon, 14 Aug 2023 03:58:32 +0100, Shijie Huang <shijie@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote: > > Hi Marc, > > 在 2023/8/12 2:05, Marc Zyngier 写道: > > Huang Shijie reports that, when profiling a guest from the host > > with a number of events that exceeds the number of available > > counters, the reported counts are wildly inaccurate. Without > > the counter oversubscription, the reported counts are correct. > > > > Their investigation indicates that upon counter rotation (which > > takes place on the back of a timer interrupt), we fail to > > re-apply the guest EL0 enabling, leading to the counting of host > > events instead of guest events. > > > > In order to solve this, add yet another hook between the host PMU > > driver and KVM, re-applying the guest EL0 configuration if the > > right conditions apply (the host is VHE, we are in interrupt > > context, and we interrupted a running vcpu). This triggers a new > > vcpu request which will apply the correct configuration on guest > > reentry. > > > > With this, we have the correct counts, even when the counters are > > oversubscribed. > > > > Reported-by: Huang Shijie <shijie@xxxxxxxxxxxxxxxxxxxxxx> > > Suggested-by: Oliver Upton <oliver.upton@xxxxxxxxx> > > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx> > > Cc: Mark Rutland <mark.rutland@xxxxxxx> > > Cc: Will Deacon <will@xxxxxxxxxx> > > Link: https://lore.kernel.org/r/20230809013953.7692-1-shijie@xxxxxxxxxxxxxxxxxxxxxx > > --- > > arch/arm64/include/asm/kvm_host.h | 1 + > > arch/arm64/kvm/arm.c | 3 +++ > > arch/arm64/kvm/pmu.c | 18 ++++++++++++++++++ > > drivers/perf/arm_pmuv3.c | 2 ++ > > include/kvm/arm_pmu.h | 2 ++ > > 5 files changed, 26 insertions(+) > > > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > > index d3dd05bbfe23..553040e0e375 100644 > > --- a/arch/arm64/include/asm/kvm_host.h > > +++ b/arch/arm64/include/asm/kvm_host.h > > @@ -49,6 +49,7 @@ > > #define KVM_REQ_RELOAD_GICv4 KVM_ARCH_REQ(4) > > #define KVM_REQ_RELOAD_PMU KVM_ARCH_REQ(5) > > #define KVM_REQ_SUSPEND KVM_ARCH_REQ(6) > > +#define KVM_REQ_RESYNC_PMU_EL0 KVM_ARCH_REQ(7) > > #define KVM_DIRTY_LOG_MANUAL_CAPS > > (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \ > > KVM_DIRTY_LOG_INITIALLY_SET) > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > > index 72dc53a75d1c..978b0411082f 100644 > > --- a/arch/arm64/kvm/arm.c > > +++ b/arch/arm64/kvm/arm.c > > @@ -803,6 +803,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu) > > kvm_pmu_handle_pmcr(vcpu, > > __vcpu_sys_reg(vcpu, PMCR_EL0)); > > + if (kvm_check_request(KVM_REQ_RESYNC_PMU_EL0, vcpu)) > > + kvm_vcpu_pmu_restore_guest(vcpu); > > + > > if (kvm_check_request(KVM_REQ_SUSPEND, vcpu)) > > return kvm_vcpu_suspend(vcpu); > > diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c > > index 121f1a14c829..0eea225fd09a 100644 > > --- a/arch/arm64/kvm/pmu.c > > +++ b/arch/arm64/kvm/pmu.c > > @@ -236,3 +236,21 @@ bool kvm_set_pmuserenr(u64 val) > > ctxt_sys_reg(hctxt, PMUSERENR_EL0) = val; > > return true; > > } > > + > > +/* > > + * If we interrupted the guest to update the host PMU context, make > > + * sure we re-apply the guest EL0 state. > > + */ > > +void kvm_vcpu_pmu_resync_el0(void) > > +{ > > + struct kvm_vcpu *vcpu; > > + > > + if (!has_vhe() || !in_interrupt()) > > + return; > > + > > + vcpu = kvm_get_running_vcpu(); > > + if (!vcpu) > > + return; > > + > > + kvm_make_request(KVM_REQ_RESYNC_PMU_EL0, vcpu); > > +} > > diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c > > index 08b3a1bf0ef6..6a3d8176f54a 100644 > > --- a/drivers/perf/arm_pmuv3.c > > +++ b/drivers/perf/arm_pmuv3.c > > @@ -772,6 +772,8 @@ static void armv8pmu_start(struct arm_pmu *cpu_pmu) > > /* Enable all counters */ > > armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E); > > + > > + kvm_vcpu_pmu_resync_el0(); > > } > > I read the perf code again, and find it maybe not good to do it in > armv8pmu_start. > > Assume we install a new perf event to a CPU "x" from CPU 0, a VM > guest is running on CPU "x": > > perf_event_open() --> perf_install_in_context() --> > > call this function in IPI interrupt: ___perf_install_in_context(). > > armv8pmu_start() will be called in ___perf_install_in_context() in IPI. > > so kvm_vcpu_pmu_resync_el0() will _make_ a request by meeting the > conditions: > > 1.) in interrupt context. > > 2.) a guest is running on this CPU. > > > But in actually, the request should not send out. Why shouldn't it be applied? This isn't supposed to be always necessary, but it needs to be applied whenever there is a possibility for counters to be updated behind our back, something that is a pretty event anyway. > Please correct me if I am wrong. > > IMHO, the best solution is add a hook in the perf/core code, and make > the request there. I disagree. I'm still completely opposed to anything that will add such a hook in the core perf code, specially as a weak symbol. The interactions must be strictly between the PMUv3 driver and KVM, because they are the only parties involved here. I will *not* take such a patch. M. -- Without deviation from the norm, progress is not possible.