Hi Alexandre,

Thanks for looking into this.

On Mon, 28 Jun 2021 17:19:25 +0100,
Alexandre Chartre <alexandre.chartre@xxxxxxxxxx> wrote:
>
> In a KVM guest on ARM, performance counters interrupts have an

nit: arm64. 32bit ARM never had any working KVM PMU emulation.

> unnecessary overhead which slows down execution when using the "perf
> record" command and limits the "perf record" sampling period.
>
> The problem is that when a guest VM disables counters by clearing the
> PMCR_EL0.E bit (bit 0), KVM will disable all counters defined in
> PMCR_EL0 even if they are not enabled in PMCNTENSET_EL0.
>
> KVM disables a counter by calling into the perf framework, in particular
> by calling perf_event_create_kernel_counter() which is a time consuming
> operation. So, for example, with a Neoverse N1 CPU core which has 6 event
> counters and one cycle counter, KVM will always disable all 7 counters
> even if only one is enabled.
>
> This typically happens when using the "perf record" command in a guest
> VM: perf will disable all event counters with PMCNTENTSET_EL0 and only
> uses the cycle counter. And when using the "perf record" -F option with
> a high profiling frequency, the overhead of KVM disabling all counters
> instead of one on every counter interrupt becomes very noticeable.
>
> The problem is fixed by having KVM disable only counters which are
> enabled in PMCNTENSET_EL0. If a counter is not enabled in PMCNTENSET_EL0
> then KVM will not enable it when setting PMCR_EL0.E and it will remain
> disable as long as it is not enabled in PMCNTENSET_EL0. So there is

nit: disabled

> effectively no need to disable a counter when clearing PMCR_EL0.E if it
> is not enabled PMCNTENSET_EL0.
>
> Fixes: 76993739cd6f ("arm64: KVM: Add helper to handle PMCR register bits")

This isn't a fix (the current behaviour is correct per the
architecture), "only" a performance improvement. We reserve "Fixes:"
for things that are actually broken.

> Signed-off-by: Alexandre Chartre <alexandre.chartre@xxxxxxxxxx>
> ---
>  arch/arm64/kvm/pmu-emul.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index fd167d4f4215..bab4b735a0cf 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -571,7 +571,8 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
>  		kvm_pmu_enable_counter_mask(vcpu,
>  		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
>  	} else {
> -		kvm_pmu_disable_counter_mask(vcpu, mask);
> +		kvm_pmu_disable_counter_mask(vcpu,
> +		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);

This seems to perpetuate a flawed pattern. Why do we need to work out
the *valid* PMCNTENSET_EL0 bits? They should be correct by construction,
and the way the shadow sysreg gets populated already enforces this:

<quote>
static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
			   const struct sys_reg_desc *r)
{
	[...]

	mask = kvm_pmu_valid_counter_mask(vcpu);
	if (p->is_write) {
		val = p->regval & mask;
		if (r->Op2 & 0x1) {
			/* accessing PMCNTENSET_EL0 */
			__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val;
			kvm_pmu_enable_counter_mask(vcpu, val);
			kvm_vcpu_pmu_restore_guest(vcpu);
</quote>

So the sysreg is the only thing we should consider, and I think we
should drop the useless masking (rough sketch after my sign-off).
There is at least one other instance of this in the PMU code
(kvm_pmu_overflow_status()), and apart from kvm_pmu_vcpu_reset(), only
the sysreg accessors should care about the masking to sanitise accesses.

What do you think?

	M.
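
For the record, here is a rough, untested sketch of what I mean by
dropping the masking (the body of kvm_pmu_handle_pmcr() is paraphrased,
only the enable/disable calls matter):

	void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
	{
		[...]

		if (val & ARMV8_PMU_PMCR_E) {
			/*
			 * The shadow PMCNTENSET_EL0 was already sanitised
			 * against the valid counter mask when the guest
			 * wrote it, so feed it to the helper as-is.
			 */
			kvm_pmu_enable_counter_mask(vcpu,
			       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
		} else {
			/* Same here: no need to recompute the valid mask. */
			kvm_pmu_disable_counter_mask(vcpu,
			       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
		}

		[...]
	}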
-- 
Without deviation from the norm, progress is not possible.