On Tue, Sep 05, 2023, Jari Ruusu wrote:
> This problem is old regression. This type of setup worked fine on older
> linux-4.x hosts but fails on linux-5.10.x hosts. I remember seeing this fail
> as early as year 2021. I just haven't had time to look at it earlier.
>
> Relevant qemu parameters:
> -machine pc-1.0
> -cpu Skylake-Server-IBRS,+md-clear,+pcid,+invpcid,+ssbd,+clflushopt
> -enable-kvm
> If I change CPU model to "Nehalem" then it boots OK.
>
> KVM stuff is built-in to host kernel and my kernel boot parameters include:
> kvm-intel.ept=0 l1tf=off kvm.ignore_msrs=1
> so any invalid RDMSR reads should not fail because of ignore_msrs=1 VETO,
> but at least MSR_IA32_PERF_CAPABILITIES RDMSR read does indeed fail.

No, as documented in Documentation/admin-guide/kernel-parameters.txt,
ignore_msrs only applies to _unhandled_ MSRs, i.e. MSRs that KVM knows nothing
about.

	kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.

The reason this introduces a failure in your setup is that KVM didn't have any
handling for MSR_IA32_PERF_CAPABILITIES prior to commit 27461da31089 ("KVM:
x86/pmu: Support full width counting").

> Full C-language source file can be viewed here:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/kernel/cpu/perf_event_intel.c?h=linux-3.10.y#n2023
>
> My understanding of this failure is that it is combination of many factors,
> including:
>
> 1) Qemu version is old
> 2) Qemu guest CPUID flags may be "Frankenstein"

It's a bit Frankenstein, but architecturally it's completely valid.

> 3) old linux-3.10.108 x86_64 kernel may be doing something questionable

The guest kernel is the real culprit. It is assuming that an MSR exists based
on the PMU version instead of checking the CPUID feature flag that enumerates
the existence of the MSR. The bug was fixed almost a decade ago, but that fix
obviously didn't make it to the 3.10 kernel.

  commit c9b08884c9c98929ec2d8abafd78e89062d01ee7
  Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
  Date:   Mon Feb 3 14:29:03 2014 +0100

    perf/x86: Correctly use FEATURE_PDCM

    The current code simply assumes Intel Arch PerfMon v2+ to have
    the IA32_PERF_CAPABILITIES MSR; the SDM specifies that we should check
    CPUID[1].ECX[15] (aka, FEATURE_PDCM) instead.

    This was found by KVM which implements v2+ but didn't provide the
    capabilities MSR. Change the code to DTRT; KVM will also implement the
    MSR and return 0.

> 4) newer host linux KVM is not always honoring RDMSR ignore_msrs=1 VETO
>
> My reading linux-5.10.194 kernel source identified following questionable
> handling ignore_msrs=1 VETO. This same problem appears to be present in
> recently released linux-6.5 too, but so far I have not tested this
> with linux-6.5.x host kernels yet.

While this is arguably a regression, this isn't going to be addressed in KVM.
ignore_msrs is off by default, and is explicitly documented as applying only
to unhandled MSRs. The documentation could certainly do a better job of
explaining the potential pitfalls and long-term consequences of enabling
ignore_msrs, but hack-a-fixing this one MSR to fudge around a guest bug isn't
going to happen, and a broad "ignore all RDMSR/WRMSR faults" knob would likely
break other guests, e.g. would make it impossible to probe for MSR existence,
and so such a knob would be unusable.

As for working around this in your setup, assuming you don't actually need a
virtual PMU in the guest, the simplest workaround would be to turn off vPMU
support in KVM, i.e. boot with kvm.enable_pmu=0. That _should_ cause QEMU to
not advertise a PMU to the guest. Alternatively, if supported by QEMU, you
could try enumerating a version 1 vPMU to the guest.
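To make the guest-side difference concrete, here is a rough sketch of the two
checks described in the commit message quoted above. This is not the kernel's
code. The function names and the macro are made up for illustration, and the
inputs are plain integers rather than live CPUID results, so it only models
the decision of whether to issue the RDMSR.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * CPUID[1].ECX[15] (FEATURE_PDCM) enumerates IA32_PERF_CAPABILITIES.
 * The macro name is hypothetical, not the kernel's.
 */
#define CPUID_1_ECX_PDCM (UINT32_C(1) << 15)

/*
 * Old behaviour (illustrative): assume the MSR exists whenever the
 * architectural PerfMon version is 2 or higher.
 */
static bool has_perf_caps_by_version(unsigned int pmu_version)
{
	return pmu_version > 1;
}

/*
 * Fixed behaviour (illustrative): only trust the CPUID feature bit
 * that enumerates the MSR.
 */
static bool has_perf_caps_by_pdcm(uint32_t cpuid_1_ecx)
{
	return (cpuid_1_ecx & CPUID_1_ECX_PDCM) != 0;
}
```

For a guest that sees a version 2 PMU with the PDCM bit clear,
has_perf_caps_by_version(2) returns true while has_perf_caps_by_pdcm(0)
returns false. Only the second check keeps the guest from reading an MSR that
was never enumerated to it.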