Hi Like,

On 11/20/22 22:42, Like Xu wrote:
> On 19/11/2022 8:28 pm, Dongli Zhang wrote:
>> This patchset is to fix two svm pmu virtualization bugs.
>>
>> 1. The 1st bug is that "-cpu,-pmu" cannot disable svm pmu virtualization.
>>
>> To use "-cpu EPYC" or "-cpu host,-pmu" cannot disable the pmu
>> virtualization. There is still below at the VM linux side ...
>
> Many QEMU vendor forks already have similar fixes, and
> thanks for bringing this issue back to the mainline.

Would you mind pointing me to any earlier patchset that tried to resolve this
issue in mainline?

>
>>
>> [ 0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
>>
>> ... although we expect something like below.
>>
>> [ 0.596381] Performance Events: PMU not available due to virtualization,
>> using software events only.
>> [ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
>>
>> The patch 1-2 is to disable the pmu virtualization via KVM_PMU_CAP_DISABLE
>> if the per-vcpu "pmu" property is disabled.
>>
>> I considered 'KVM_X86_SET_MSR_FILTER' initially.
>> Since both KVM_X86_SET_MSR_FILTER and KVM_PMU_CAP_DISABLE are VM ioctl. I
>> finally used the latter because it is easier to use.
>>
>>
>> 2. The 2nd bug is that un-reclaimed perf events (after QEMU system_reset)
>> at the KVM side may inject random unwanted/unknown NMIs to the VM.
>>
>> The svm pmu registers are not reset during QEMU system_reset.
>>
>> (1). The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
>> is running "perf top". The pmu registers are not disabled gracefully.
>>
>> (2). Although the x86_cpu_reset() resets many registers to zero, the
>> kvm_put_msrs() does not put AMD pmu registers to KVM side. As a result,
>> some pmu events are still enabled at the KVM side.
>>
>> (3). The KVM pmc_speculative_in_use() always returns true so that the events
>> will not be reclaimed. The kvm_pmc->perf_event is still active.
>
> I'm not sure if you're saying KVM doing something wrong, I don't think so
> because KVM doesn't sense the system_reset defined by QEMU or other user space,
> AMD's vPMC will continue to be enabled (if it was enabled before), generating pmi
> injection into the guest, and the newly started guest doesn't realize the
> counter is still enabled and blowing up the error log.

I was not saying that KVM is buggy. I was trying to explain how the issue
affects both the KVM side and the VM side.

>
>>
>> (4). After the reboot, the VM kernel reports below error:
>>
>> [ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected,
>> complain to your hardware vendor.
>> [ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR
>> c0010200 is 530076)
>>
>> (5). In a worse case, the active kvm_pmc->perf_event is still able to
>> inject unknown NMIs randomly to the VM kernel.
>>
>> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
>>
>> The patch 3 is to fix the issue by resetting AMD pmu registers as well as
>> Intel registers.
>
> This fix idea looks good, it does require syncing the new changed device state
> of QEMU to KVM.

Thank you very much!

Dongli Zhang

>
>>
>>
>> This patchset does not cover PerfMonV2, until the below patchset
>> is merged into the KVM side.
>>
>> [PATCH v3 0/8] KVM: x86: Add AMD Guest PerfMonV2 PMU support
>> https://lore.kernel.org/all/20221111102645.82001-1-likexu@xxxxxxxxxxx/
>>
>>
>> Dongli Zhang (3):
>>   kvm: introduce a helper before creating the 1st vcpu
>>   i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled
>>   target/i386/kvm: get and put AMD pmu registers
>>
>>  accel/kvm/kvm-all.c    |   7 ++-
>>  include/sysemu/kvm.h   |   2 +
>>  target/arm/kvm64.c     |   4 ++
>>  target/i386/cpu.h      |   5 +++
>>  target/i386/kvm/kvm.c  | 104 +++++++++++++++++++++++++++++++++++++++++++-
>>  target/mips/kvm.c      |   4 ++
>>  target/ppc/kvm.c       |   4 ++
>>  target/riscv/kvm.c     |   4 ++
>>  target/s390x/kvm/kvm.c |   4 ++
>>  9 files changed, 134 insertions(+), 4 deletions(-)
>>
>> Thank you very much!
>>
>> Dongli Zhang