Re: [PATCH] kvm ignores ignore_msrs=1 VETO for some MSRs

On Tue, Sep 05, 2023, Jari Ruusu wrote:
> This problem is old regression. This type of setup worked fine on older
> linux-4.x hosts but fails on linux-5.10.x hosts. I remember seeing this fail
> as early as year 2021. I just haven't had time to look at it earlier.
> 
> Relevant qemu parameters:
>   -machine pc-1.0
>   -cpu Skylake-Server-IBRS,+md-clear,+pcid,+invpcid,+ssbd,+clflushopt
>   -enable-kvm
> If I change CPU model to "Nehalem" then it boots OK.
> 
> KVM stuff is built-in to host kernel and my kernel boot parameters include:
>   kvm-intel.ept=0 l1tf=off kvm.ignore_msrs=1
> so any invalid RDMSR reads should not fail because of ignore_msrs=1 VETO,
> but at least MSR_IA32_PERF_CAPABILITIES RDMSR read does indeed fail.

No, as documented in Documentation/admin-guide/kernel-parameters.txt, ignore_msrs
only applies to _unhandled_ MSRs, i.e. MSRs that KVM knows nothing about.

  kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.

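The reason your setup regressed is that KVM didn't have any handling for
MSR_IA32_PERF_CAPABILITIES prior to commit 27461da31089 ("KVM: x86/pmu:
Support full width counting").  Before that commit the MSR was unhandled, so
ignore_msrs covered it; now that KVM handles the MSR, ignore_msrs no longer
applies to it.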
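
To make the handled vs. unhandled distinction concrete, here is a small
stand-alone model in C.  It is only a sketch, not KVM code: struct vm_model,
model_rdmsr() and the result enum are made-up names, and it assumes that the
handler rejects a guest read when the guest's CPUID does not enumerate PDCM,
which is consistent with the failure described in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_PERF_CAPABILITIES 0x345u

/* Hypothetical outcome of a guest RDMSR in this model. */
enum rdmsr_result {
	RDMSR_OK,	/* handler supplied a value */
	RDMSR_IGNORED,	/* unhandled MSR, ignore_msrs=1: read as 0, no fault */
	RDMSR_GP	/* #GP injected into the guest */
};

/* Hypothetical model state, not a real KVM structure. */
struct vm_model {
	bool ignore_msrs;	/* kvm.ignore_msrs */
	bool handles_perf_caps;	/* host KVM has a handler for the MSR */
	bool guest_has_pdcm;	/* guest CPUID.01H:ECX[15] */
};

static inline enum rdmsr_result
model_rdmsr(const struct vm_model *vm, uint32_t msr, uint64_t *val)
{
	*val = 0;

	if (msr == MSR_IA32_PERF_CAPABILITIES && vm->handles_perf_caps) {
		/* Handled MSR: the handler decides, ignore_msrs is never consulted. */
		return vm->guest_has_pdcm ? RDMSR_OK : RDMSR_GP;
	}

	/* Unhandled MSR: this is the only path where ignore_msrs matters. */
	return vm->ignore_msrs ? RDMSR_IGNORED : RDMSR_GP;
}
```

With ignore_msrs=1 and no PDCM in guest CPUID, the model returns RDMSR_IGNORED
when handles_perf_caps is false (old host) and RDMSR_GP when it is true (new
host), which is the behavior change reported above.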

> Full C-language source file can be viewed here:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/kernel/cpu/perf_event_intel.c?h=linux-3.10.y#n2023
> 
> My understanding of this failure is that it is combination of many factors,
> including:
> 
> 1) Qemu version is old
> 2) Qemu guest CPUID flags may be "Frankenstein" 

It's a bit Frankenstein, but architecturally it's completely valid.

> 3) old linux-3.10.108 x86_64 kernel may be doing something questionable

The guest kernel is the real culprit.  It is assuming that an MSR exists based on
the PMU version instead of checking the CPUID feature flag that enumerates the
existence of the MSR.

The bug was fixed almost a decade ago, but that fix obviously didn't make it to
the 3.10 kernel.

commit c9b08884c9c98929ec2d8abafd78e89062d01ee7
Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date:   Mon Feb 3 14:29:03 2014 +0100

    perf/x86: Correctly use FEATURE_PDCM
    
    The current code simply assumes Intel Arch PerfMon v2+ to have
    the IA32_PERF_CAPABILITIES MSR; the SDM specifies that we should check
    CPUID[1].ECX[15] (aka, FEATURE_PDCM) instead.
    
    This was found by KVM which implements v2+ but didn't provide the
    capabilities MSR. Change the code to DTRT; KVM will also implement the
    MSR and return 0.
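
As an illustration of the difference between the two checks, here is a minimal
sketch in plain C.  The helper names are invented for this example and are not
kernel functions; the helpers take raw CPUID register values so that the logic
can be exercised anywhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 15 (PDCM) enumerates IA32_PERF_CAPABILITIES. */
#define CPUID_01_ECX_PDCM (1u << 15)

/* CPUID.0AH:EAX[7:0] is the architectural PerfMon version. */
static inline unsigned int perfmon_version(uint32_t cpuid_0a_eax)
{
	return cpuid_0a_eax & 0xffu;
}

/* Pre-fix heuristic: assume the MSR exists on PerfMon v2 and later. */
static inline bool perf_caps_assumed_by_version(uint32_t cpuid_0a_eax)
{
	return perfmon_version(cpuid_0a_eax) > 1;
}

/* Post-fix check: trust only the CPUID feature flag. */
static inline bool perf_caps_enumerated(uint32_t cpuid_01_ecx)
{
	return (cpuid_01_ecx & CPUID_01_ECX_PDCM) != 0;
}
```

A guest that reports PerfMon v2 without PDCM passes the first check and fails
the second, which is exactly the configuration where the old code reads an MSR
that was never enumerated.  On x86 with GCC or Clang, the register values can
be obtained in userspace with __get_cpuid() from <cpuid.h>.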


> 4) newer host linux KVM is not always honoring RDMSR ignore_msrs=1 VETO
> 
> My reading linux-5.10.194 kernel source identified following questionable
> handling ignore_msrs=1 VETO. This same problem appears to be present in
> recently released linux-6.5 too, but so far I have not tested this
> with linux-6.5.x host kernels yet.

While this is arguably a regression, this isn't going to be addressed in KVM.

ignore_msrs is off by default, and is explicitly documented as applying only to
unhandled MSRs.  The documentation could certainly do a better job of explaining
the potential pitfalls and long-term consequences of enabling ignore_msrs, but
hack-a-fixing this one MSR to fudge around a guest bug isn't going to happen.
A broad "ignore all RDMSR/WRMSR faults" knob isn't an option either: it would
likely break other guests, e.g. by making it impossible to probe for MSR
existence, and so would be unusable.
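
To sketch why a broad knob defeats probing: a guest that probes an MSR relies
on the read faulting when the MSR is absent.  The model below is hypothetical
(the callback type, MSR_ABSENT and all function names are invented for
illustration) and only demonstrates the logic, not any real guest or KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: true means RDMSR completed, false means #GP. */
typedef bool (*rdmsr_fn)(uint32_t msr, uint64_t *val);

/* Arbitrary index standing in for an MSR the modeled CPU does not have. */
#define MSR_ABSENT 0x12345u

/* Normal behavior: reading an absent MSR faults. */
static inline bool rdmsr_faulting(uint32_t msr, uint64_t *val)
{
	if (msr == MSR_ABSENT)
		return false;
	*val = 1;
	return true;
}

/* Broad "ignore all faults" knob: every fault is swallowed and reads as 0. */
static inline bool rdmsr_ignore_all(uint32_t msr, uint64_t *val)
{
	if (!rdmsr_faulting(msr, val))
		*val = 0;
	return true;
}

/* Guest-side probe: the MSR "exists" if the read did not fault. */
static inline bool guest_probe_msr(rdmsr_fn rdmsr, uint32_t msr)
{
	uint64_t val;

	return rdmsr(msr, &val);
}
```

With rdmsr_faulting the probe correctly reports MSR_ABSENT as missing; with
rdmsr_ignore_all it reports every MSR as present, so the guest can no longer
tell the difference.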

As for working around this in your setup, assuming you don't actually need a
virtual PMU in the guest, the simplest workaround would be to turn off vPMU
support in KVM, i.e. boot with kvm.enable_pmu=0.  That _should_ cause QEMU to not
advertise a PMU to the guest.  Alternatively, if supported by QEMU, you could try
enumerating a version 1 vPMU to the guest.


