On Tue, Apr 04, 2023 at 03:25:27PM +0100, Mark Rutland wrote:
> On Wed, Mar 29, 2023 at 01:03:18PM +0100, Marc Zyngier wrote:
> > On Wed, 29 Mar 2023 01:21:36 +0100,
> > Reiji Watanabe <reijiw@xxxxxxxxxx> wrote:
> > >
> > > Currently, with VHE, KVM sets the ER, CR, SW and EN bits of
> > > PMUSERENR_EL0 to 1 on vcpu_load(). So, if those bits are cleared
> > > after vcpu_load() (as the perf subsystem does when PMU counters are
> > > programmed for the guest), PMU access from the guest EL0 might be
> > > trapped to the guest EL1 directly, regardless of the current
> > > PMUSERENR_EL0 value of the vCPU.
> >
> > + RobH.
> >
> > Is that what is done when the event is created and armv8pmu_start() is
> > called? This is... crap. The EL0 access thing breaks everything, and
> > nobody tested it with KVM, obviously.
> >
> > I would be tempted to start mitigating it with the following:
> >
> > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> > index dde06c0f97f3..8063525bf3dd 100644
> > --- a/arch/arm64/kernel/perf_event.c
> > +++ b/arch/arm64/kernel/perf_event.c
> > @@ -806,17 +806,19 @@ static void armv8pmu_disable_event(struct perf_event *event)
> >  
> >  static void armv8pmu_start(struct arm_pmu *cpu_pmu)
> >  {
> > -	struct perf_event_context *ctx;
> > -	int nr_user = 0;
> > +	if (sysctl_perf_user_access) {
> > +		struct perf_event_context *ctx;
> > +		int nr_user = 0;
> >  
> > -	ctx = perf_cpu_task_ctx();
> > -	if (ctx)
> > -		nr_user = ctx->nr_user;
> > +		ctx = perf_cpu_task_ctx();
> > +		if (ctx)
> > +			nr_user = ctx->nr_user;
> >  
> > -	if (sysctl_perf_user_access && nr_user)
> > -		armv8pmu_enable_user_access(cpu_pmu);
> > -	else
> > -		armv8pmu_disable_user_access();
> > +		if (nr_user)
> > +			armv8pmu_enable_user_access(cpu_pmu);
> > +		else
> > +			armv8pmu_disable_user_access();
> > +	}
> >  
> >  	/* Enable all counters */
> >  	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
> >
> > but that's obviously not enough, as we want it to work with EL0 access
> > enabled on the host as well.
> >
> > What we're missing is something that tells the PMU code "we're in a
> > context where host userspace isn't present", and this would be completely
> > skipped, relying on KVM to restore the appropriate state on
> > vcpu_put(). But then the IPI stuff that controls EL0 can always come
> > in and wreck things. Gahhh...
>
> AFAICT the perf code only writes to PMUSERENR_EL0 in contexts where IRQs (and
> hence preemption) are disabled, so as long as we have a shadow of the host
> PMUSERENR value somewhere, I think we can update the perf code with something
> like the below?
>
> ... unless the KVM code is interruptible before saving the host value, or after
> restoring it?

Thank you for the suggestion. I will update my patch accordingly.

Thank you,
Reiji

>
> Thanks,
> Mark.
>
> ---->8----
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index dde06c0f97f3e..bdab3d5cbb5e3 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -741,11 +741,26 @@ static inline u32 armv8pmu_getreset_flags(void)
>  	return value;
>  }
>  
> -static void armv8pmu_disable_user_access(void)
> +static void update_pmuserenr(u64 val)
>  {
> +	lockdep_assert_irqs_disabled();
> +
> +	if (IS_ENABLED(CONFIG_KVM)) {
> +		struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> +		if (vcpu) {
> +			vcpu->arch.pmuserenr_host = val;
> +			return;
> +		}
> +	}
> +
> -	write_sysreg(0, pmuserenr_el0);
> +	write_sysreg(val, pmuserenr_el0);
>  }
>  
> +static void armv8pmu_disable_user_access(void)
> +{
> +	update_pmuserenr(0);
> +}
> +
>  static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
>  {
>  	int i;
> @@ -759,8 +774,7 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
>  		armv8pmu_write_evcntr(i, 0);
>  	}
>  
> -	write_sysreg(0, pmuserenr_el0);
> -	write_sysreg(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR, pmuserenr_el0);
> +	update_pmuserenr(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR);
>  }
>  
>  static void armv8pmu_enable_event(struct perf_event *event)
>
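For context, the perf-side change above is only half of the scheme: it assumes
KVM snapshots the host's PMUSERENR_EL0 into vcpu->arch.pmuserenr_host on
vcpu_load() and writes that shadow back on vcpu_put(), both with IRQs disabled
so a racing perf update lands in the shadow rather than the live register. A
minimal sketch of that KVM-side counterpart follows; the hook names and their
placement are illustrative assumptions, not the actual KVM/arm64 code -- only
the pmuserenr_host field appears in the diff above.

	/*
	 * Hypothetical KVM-side counterpart to update_pmuserenr() above.
	 * The function names are assumptions for illustration.
	 */
	static void kvm_vcpu_load_pmuserenr(struct kvm_vcpu *vcpu)
	{
		/*
		 * Runs with IRQs disabled and with kvm_get_running_vcpu()
		 * already returning this vCPU, so a concurrent perf update
		 * is redirected into the shadow, not the live register.
		 */
		vcpu->arch.pmuserenr_host = read_sysreg(pmuserenr_el0);

		/* Grant the guest's EL0 the accesses KVM currently sets. */
		write_sysreg(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR |
			     ARMV8_PMU_USERENR_SW | ARMV8_PMU_USERENR_EN,
			     pmuserenr_el0);
	}

	static void kvm_vcpu_put_pmuserenr(struct kvm_vcpu *vcpu)
	{
		/*
		 * Restore whatever the host perf code last asked for:
		 * either the value saved above, or one stashed by
		 * update_pmuserenr() while the vCPU was loaded.
		 */
		write_sysreg(vcpu->arch.pmuserenr_host, pmuserenr_el0);
	}

This is where Mark's closing question bites: if either hook ran with IRQs
enabled, the perf IPI could hit the live register between the save and the
shadow becoming authoritative (or after the restore), so both sides depend on
the IRQs-disabled contract that lockdep_assert_irqs_disabled() documents.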