On Sun, Sep 19, 2021 at 02:36:46PM +0100, Marc Zyngier wrote:
> Hi Russell,
>
> Thanks for the heads-up.
>
> On Sat, 18 Sep 2021 17:17:45 +0100,
> "Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > This morning, I upgraded my VM host from Debian Buster to Debian
> > Bullseye. This host (Armada 8040) runs libvirt. I placed all the VMs
> > into managedsave, which basically means their state is saved out by
> > QEMU, ready to be resumed once the upgrade is complete.
> >
> > Initially, I was running 5.11 on the host, which didn't have POSIX
> > ACLs enabled on tmpfs. Sadly, upgrading to Bullseye now requires
> > this to be enabled, so I upgraded the kernel to 5.13 to avoid this
> > problem - without POSIX ACLs on tmpfs, qemu refuses to even start.
> >
> > Under Bullseye with 5.13, I could start new VMs, but I could not
> > resume the saved VMs - it would spit out:
> >
> > 2021-09-18T11:14:14.456227Z qemu-system-aarch64: Invalid value 236 expecting positive value <= 162
> > 2021-09-18T11:14:14.456269Z qemu-system-aarch64: Failed to load cpu:cpreg_vmstate_array_len
> > 2021-09-18T11:14:14.456279Z qemu-system-aarch64: error while loading state for instance 0x0 of device 'cpu'
> > 2021-09-18T11:14:14.456887Z qemu-system-aarch64: load of migration failed: Invalid argument
> >
> > Essentially, what this is complaining about is that the register
> > list returned by the KVM_GET_REG_LIST ioctl has shrunk: the saved
> > VM has more registers that need to be set (236 of them, after
> > QEMU's filtering) than are actually available under 5.13 (162,
> > after QEMU's filtering).
> >
> > After much debugging, and writing a program to create a VM and CPU
> > and retrieve the register list etc, I have finally tracked down
> > exactly what is going on...
> >
> > Under 5.13, KVM believes there is no PMU, so it doesn't advertise
> > the PMU registers, despite the hardware having a PMU and the kernel
> > having support for it.
> >
> > kvm_arm_support_pmu_v3() determines whether the PMU_V3 feature is
> > available or not.
> >
> > Under 5.11, this was determined via:
> >
> >   perf_num_counters() > 0
> >
> > which in turn was derived from (essentially):
> >
> >   __oprofile_cpu_pmu && __oprofile_cpu_pmu->num_events > 0
> >
> > __oprofile_cpu_pmu can be set at any time after the PMU has been
> > initialised, and due to initialisation ordering, that appears to
> > happen much later in the kernel boot.
> >
> > However, under 5.13, oprofile has been removed, and this test is no
> > longer present. Instead, KVM attempts to determine the availability
> > of PMUv3 when it initialises, and fails to do so because the PMU
> > has not yet been initialised. If there is no PMU at the point KVM
> > initialises, then KVM will never advertise a PMU.
> >
> > This change of behaviour is what breaks KVM on Armada 8040, causing
> > a userspace regression.
> >
> > What makes this more confusing is that the PMU errors have gone:
> >
> > 5.13:
> > [ 0.170514] PCI: CLS 0 bytes, default 64
> > [ 0.171085] kvm [1]: IPA Size Limit: 44 bits
> > [ 0.172532] kvm [1]: vgic interrupt IRQ9
> > [ 0.172650] kvm: pmu event creation failed -2
> > [ 0.172688] kvm [1]: Hyp mode initialized successfully
> > ...
> > [ 1.479833] hw perfevents: enabled with armv8_cortex_a72 PMU driver, 7 counters available
> >
> > [...]
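> >
> > A minimal sketch of such a probe (illustrative only - not the
> > exact program, and with most error handling trimmed):
> >
> >   #include <fcntl.h>
> >   #include <stdio.h>
> >   #include <stdlib.h>
> >   #include <sys/ioctl.h>
> >   #include <linux/kvm.h>
> >
> >   int main(void)
> >   {
> >           int kvm = open("/dev/kvm", O_RDWR);
> >           int vm = ioctl(kvm, KVM_CREATE_VM, 0);
> >           int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
> >           struct kvm_vcpu_init init;
> >           struct kvm_reg_list probe = { .n = 0 }, *list;
> >
> >           /* Initialise the vCPU as the host's preferred target */
> >           ioctl(vm, KVM_ARM_PREFERRED_TARGET, &init);
> >           ioctl(vcpu, KVM_ARM_VCPU_INIT, &init);
> >
> >           /*
> >            * With n == 0 this fails with E2BIG, but fills in the
> >            * number of registers KVM is willing to expose.
> >            */
> >           ioctl(vcpu, KVM_GET_REG_LIST, &probe);
> >
> >           list = calloc(1, sizeof(*list) + probe.n * sizeof(__u64));
> >           list->n = probe.n;
> >           if (ioctl(vcpu, KVM_GET_REG_LIST, list) < 0) {
> >                   perror("KVM_GET_REG_LIST");
> >                   return 1;
> >           }
> >
> >           printf("%llu registers\n", (unsigned long long)list->n);
> >           return 0;
> >   }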
>
> Urgh. That's a bummer.
>
> The PMU driver only comes up once it has found its interrupt
> controller, which on the Armada 8040 is not the GIC, but some weird
> thing on the side that doesn't actually serve any real purpose. On HW
> where the PMU is directly wired into the GIC, it all works fine,
> though by luck rather than by design.
>
> Anyway, rant over. This is a bug that needs addressing so that KVM
> can initialise correctly irrespective of the probing order. This
> probably means that the static key controlling KVM's behaviour wrt
> the PMU must be controlled by the PMU infrastructure itself, rather
> than KVM trying to probe for it.
>
> Can you please give the following hack a go (on top of 5.15-rc1)?
> I've briefly tested it on my McBin, and it did the trick. I've also
> tested it on the M1 (which really doesn't have an architectural PMU)
> to verify that it was correctly failing.
>
> Thanks,
>
> M.
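>
> (For context, the static key in question is consumed by KVM roughly
> like this - existing code, quoted from memory rather than part of
> the patch below:
>
>   DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
>
>   static __always_inline bool kvm_arm_support_pmu_v3(void)
>   {
>           return static_branch_likely(&kvm_arm_pmu_available);
>   }
>
> so whoever flips the key decides whether guests are ever offered a
> PMU.)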
>
> From 9c26e3e6bbcbc3a583b3974e7a9017029d31fe29 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <maz@xxxxxxxxxx>
> Date: Sun, 19 Sep 2021 14:09:49 +0100
> Subject: [PATCH] KVM: arm64: Fix PMU probe ordering
>
> Russell reported that since 5.13, KVM's probing of the PMU has
> started to fail on his HW. As it turns out, there is an implicit
> ordering dependency between the architectural PMU probing code and
> KVM's own probing. If, due to probe ordering reasons, KVM probes
> before the PMU driver, it will fail to detect the PMU and prevent it
> from being advertised to guests as well as the VMM.
>
> Obviously, this is one probing too many, and we should be able to
> deal with any ordering.
>
> Add a callback from the PMU code into KVM to advertise the
> registration of a host CPU PMU, allowing for any probing order.
>
> Fixes: 5421db1be3b1 ("KVM: arm64: Divorce the perf code from oprofile helpers")
> Reported-by: "Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx>
> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> Link: https://lore.kernel.org/r/YUYRKVflRtUytzy5@xxxxxxxxxxxxxxxxxxxxx
> Cc: stable@xxxxxxxxxxxxxxx
> ---
>  arch/arm64/kvm/perf.c        |  3 ---
>  arch/arm64/kvm/pmu-emul.c    | 12 +++++++++++-
>  drivers/perf/arm_pmu.c       |  2 ++
>  include/kvm/arm_pmu.h        |  3 ---
>  include/linux/perf/arm_pmu.h |  6 ++++++
>  5 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kvm/perf.c b/arch/arm64/kvm/perf.c
> index f9bb3b14130e..c84fe24b2ea1 100644
> --- a/arch/arm64/kvm/perf.c
> +++ b/arch/arm64/kvm/perf.c
> @@ -50,9 +50,6 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
>  
>  int kvm_perf_init(void)
>  {
> -        if (kvm_pmu_probe_pmuver() != ID_AA64DFR0_PMUVER_IMP_DEF && !is_protected_kvm_enabled())
> -                static_branch_enable(&kvm_arm_pmu_available);
> -
>          return perf_register_guest_info_callbacks(&kvm_guest_cbs);
>  }
>  
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index f5065f23b413..588100c52f34 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -740,7 +740,17 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
>          kvm_pmu_create_perf_event(vcpu, select_idx);
>  }
>  
> -int kvm_pmu_probe_pmuver(void)
> +void kvm_host_pmu_init(struct arm_pmu *pmu)
> +{
> +        if (pmu->pmuver != 0 &&
> +            pmu->pmuver != ID_AA64DFR0_PMUVER_IMP_DEF &&
> +            !is_protected_kvm_enabled()) {
> +                static_branch_enable(&kvm_arm_pmu_available);
> +                kvm_info("PMU detected and enabled\n");
> +        }
> +}
> +
> +static int kvm_pmu_probe_pmuver(void)
>  {
>          struct perf_event_attr attr = { };
>          struct perf_event *event;
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 3cbc3baf087f..295cc7952d0e 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -952,6 +952,8 @@ int armpmu_register(struct arm_pmu *pmu)
>                  pmu->name, pmu->num_events,
>                  has_nmi ? ", using NMIs" : "");
>  
> +        kvm_host_pmu_init(pmu);

Just a nit, but I think this will get called for each PMU we probe on a
big.LITTLE system, which is probably harmless but possibly not what you
want?
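
If it ever does matter, the hook could bail out early once the key has
been flipped - a hypothetical, untested variant, not what the patch
above does:

        void kvm_host_pmu_init(struct arm_pmu *pmu)
        {
                /* Let only the first suitable PMU flip the key */
                if (static_branch_likely(&kvm_arm_pmu_available))
                        return;

                if (pmu->pmuver != 0 &&
                    pmu->pmuver != ID_AA64DFR0_PMUVER_IMP_DEF &&
                    !is_protected_kvm_enabled()) {
                        static_branch_enable(&kvm_arm_pmu_available);
                        kvm_info("PMU detected and enabled\n");
                }
        }

Will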