Re: [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO PCI passthrough no longer works

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 10, 2024, Tom Lendacky wrote:
> On 12/10/24 14:33, Simon Pilkington wrote:
> > On 10/12/2024 16:47, Sean Christopherson wrote:
> >> Can you run with the below to see what bits the guest is trying to set (or clear)?
> >> We could get the same info via tracepoints, but this will likely be faster/easier.
> >>
> >> ---
> >>  arch/x86/kvm/svm/svm.c | 12 +++++++++---
> >>  1 file changed, 9 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> >> index dd15cc635655..5144d0283c9d 100644
> >> --- a/arch/x86/kvm/svm/svm.c
> >> +++ b/arch/x86/kvm/svm/svm.c
> >> @@ -3195,11 +3195,14 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> >>  	case MSR_AMD64_DE_CFG: {
> >>  		u64 supported_de_cfg;
> >>  
> >> -		if (svm_get_feature_msr(ecx, &supported_de_cfg))
> >> +		if (WARN_ON_ONCE(svm_get_feature_msr(ecx, &supported_de_cfg)))
> >>  			return 1;
> >>  
> >> -		if (data & ~supported_de_cfg)
> >> +		if (data & ~supported_de_cfg) {
> >> +			pr_warn("DE_CFG supported = %llx, WRMSR = %llx\n",
> >> +				supported_de_cfg, data);
> >>  			return 1;
> >> +		}
> >>  
> >>  		/*
> >>  		 * Don't let the guest change the host-programmed value.  The
> >> @@ -3207,8 +3210,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> >>  		 * are completely unknown to KVM, and the one bit known to KVM
> >>  		 * is simply a reflection of hardware capabilities.
> >>  		 */
> >> -		if (!msr->host_initiated && data != svm->msr_decfg)
> >> +		if (!msr->host_initiated && data != svm->msr_decfg) {
> >> +			pr_warn("DE_CFG current = %llx, WRMSR = %llx\n",
> >> +				svm->msr_decfg, data);
> >>  			return 1;
> >> +		}
> >>  
> >>  		svm->msr_decfg = data;
> >>  		break;
> >>
> >> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
> > 
> > Relevant dmesg output with some context below. VM locked up as expected.
> > 
> > [   85.834971] vfio-pci 0000:0c:00.0: resetting
> > [   85.937573] vfio-pci 0000:0c:00.0: reset done
> > [   86.494210] vfio-pci 0000:0c:00.0: resetting
> > [   86.494264] vfio-pci 0000:0c:00.1: resetting
> > [   86.761442] vfio-pci 0000:0c:00.0: reset done
> > [   86.761480] vfio-pci 0000:0c:00.1: reset done
> > [   86.762392] vfio-pci 0000:0c:00.0: resetting
> > [   86.865462] vfio-pci 0000:0c:00.0: reset done
> > [   86.977360] virbr0: port 1(vnet1) entered learning state
> > [   88.993052] virbr0: port 1(vnet1) entered forwarding state
> > [   88.993057] virbr0: topology change detected, propagating
> > [  103.459114] kvm_amd: DE_CFG current = 0, WRMSR = 2
> > [  161.442032] virbr0: port 1(vnet1) entered disabled state // VM shut down
> 
> That is the MSR_AMD64_DE_CFG_LFENCE_SERIALIZE bit. Yeah, that actually
> does change the behavior of LFENCE and isn't just a reflection of the
> hardware.
> 
> Linux does set that bit on boot, too (if LFENCE always serializing isn't
> advertised 8000_0021_EAX[2]), so I'm kind of surprised it didn't pop up
> there.

Linux may be running afoul of this, but it would only become visible if someone
checked dmesg.  Even the "unsafe" MSR accesses in Linux gracefully handle faults
these days, the only symptom would be a WARN.

> I imagine that the above CPUID bit isn't set, so an attempt is made to
> set the MSR bit.

Yep.  And LFENCE_RDTSC _is_ supported, otherwise the supported_de_cfg check would
have failed.  Which means it's a-ok for the guest to set the bit, i.e. KVM won't
let the guest incorrectly think it's running on CPU for which LFENCE is serializing.

Unless you (Tom) disagree, I vote to simply drop the offending code, i.e. make
all supported bits fully writable from the guest.  KVM is firmly in the wrong here,
and I can't think of any reason to disallow the guest from clearing LFENCE_SERIALIZE.

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6a350cee2f6c..5a82ead3bf0f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3201,15 +3201,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
                if (data & ~supported_de_cfg)
                        return 1;
 
-               /*
-                * Don't let the guest change the host-programmed value.  The
-                * MSR is very model specific, i.e. contains multiple bits that
-                * are completely unknown to KVM, and the one bit known to KVM
-                * is simply a reflection of hardware capabilities.
-                */
-               if (!msr->host_initiated && data != svm->msr_decfg)
-                       return 1;
-
                svm->msr_decfg = data;
                break;
        }





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux