On Thu, Mar 03, 2022, Jim Mattson wrote:
> On Thu, Mar 3, 2022 at 8:15 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Thu, Mar 03, 2022, Paolo Bonzini wrote:
> > > On 3/3/22 02:43, Sean Christopherson wrote:
> > > > > Maybe I can redirect you to a test case to highlight a possible
> > > > > regression in KVM, as seen by userspace;-)
> > > > Regressions aside, VMCS controls are not tied to CPUID, KVM should not be mucking
> > > > with unrelated things. The original hack was to fix a userspace bug and should
> > > > never have been merged.
> > >
> > > Note that it dates back to:
> > >
> > > commit 5f76f6f5ff96587af5acd5930f7d9fea81e0d1a8
> > > Author: Liran Alon <liran.alon@xxxxxxxxxx>
> > > Date: Fri Sep 14 03:25:52 2018 +0300
> > >
> > >     KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled
> > >     Before this commit, KVM exposes MPX VMX controls to L1 guest only based
> > >     on if KVM and host processor supports MPX virtualization.
> > >     However, these controls should be exposed to guest only in case guest
> > >     vCPU supports MPX.
> > >
> > > It's not to fix a userspace bug, it's to support userspace that doesn't
> > > know about using KVM_SET_MSR for VMX features---which is okay since unlike
> > > KVM_SET_CPUID2 it's not a mandatory call.
> >
> > I disagree, IMO failure to properly configure the vCPU model is a userspace bug.
> > Maybe it was a userspace bug induced by a haphazard and/or poorly documented KVM
> > ABI, but it's still a userspace bug. One could argue that KVM should disable/clear
> > VMX features if userspace clears a related CPUID feature, but _setting_ a VMX
> > feature based on CPUID is architecturally wrong. Even if we consider one or both
> > cases to be desirable behavior in terms of creating a consistent vCPU model, forcing
> > a consistent vCPU model for this one case goes against every other ioctl in KVM's
> > ABI.
> >
> > If we consider it KVM's responsibility to propagate CPUID state to VMX MSRs, then
> > KVM has a bunch of "bugs".
> >
> >   X86_FEATURE_LM => VM_EXIT_HOST_ADDR_SPACE_SIZE, VM_ENTRY_IA32E_MODE, VMX_MISC_SAVE_EFER_LMA
> >
> >   X86_FEATURE_TSC => CPU_BASED_RDTSC_EXITING, CPU_BASED_USE_TSC_OFFSETTING,
> >                      SECONDARY_EXEC_TSC_SCALING
> >
> >   X86_FEATURE_INVPCID_SINGLE => SECONDARY_EXEC_ENABLE_INVPCID
> >
> >   X86_FEATURE_MWAIT => CPU_BASED_MONITOR_EXITING, CPU_BASED_MWAIT_EXITING
> >
> >   X86_FEATURE_INTEL_PT => SECONDARY_EXEC_PT_CONCEAL_VMX, SECONDARY_EXEC_PT_USE_GPA,
> >                           VM_EXIT_CLEAR_IA32_RTIT_CTL, VM_ENTRY_LOAD_IA32_RTIT_CTL
> >
> >   X86_FEATURE_XSAVES => SECONDARY_EXEC_XSAVES
>
> I don't disagree with you, but this does beg the question, "What's
> going on with all of the invocations of cr4_fixed1_update()?"

Boo, I forgot legal CR4 is controlled via MSRs too.

Ha! That's a bug in nVMX. nVMX only checks msrs.cr4_fixed0/1, it doesn't check
"cr4_reserved_bits", which is KVM's set of host reserved bits. That means userspace
can bypass those reserved bits by setting guest CPUID and/or VMX MSRs and loading CR4
via VM-Enter/VM-Exit.

The immediate nVMX bug can be fixed by calling kvm_is_valid_cr4(), which calls back
into nVMX to do the VMX MSR checks.

My vote would be to include nested_vmx_cr_fixed1_bits_update() in the quirk, but
keep the guest CPUID enforcement that's in kvm_is_valid_cr4(). I.e. let userspace
further restrict CR4, but don't let it allow nested VM-Enter/VM-Exit to load bits
that L1 can't set via MOV CR4.

I'll send this as a proper patch:

diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index c92cea0b8ccc..46dd1967ec08 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -285,8 +285,8 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
 }

 /* No difference in the restrictions on guest and host CR4 in VMX operation. */
-#define nested_guest_cr4_valid	nested_cr4_valid
-#define nested_host_cr4_valid	nested_cr4_valid
+#define nested_guest_cr4_valid	kvm_is_valid_cr4
+#define nested_host_cr4_valid	kvm_is_valid_cr4

 extern struct kvm_x86_nested_ops vmx_nested_ops;
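
For illustration only, the hole described above can be modeled in a few lines of standalone userspace C. This is a sketch, not kernel code: struct toy_vcpu and every toy_* name are made up for the example, and the "new" check only approximates what kvm_is_valid_cr4() is described as doing in this mail (host reserved bits, guest CPUID enforcement, then the VMX MSR checks).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR4 bit positions (Intel SDM). */
#define TOY_CR4_PAE   (1ULL << 5)
#define TOY_CR4_LA57  (1ULL << 12)
#define TOY_CR4_VMXE  (1ULL << 13)

/* Hypothetical stand-in for the state involved; not a kernel struct. */
struct toy_vcpu {
	uint64_t host_rsvd;   /* models "cr4_reserved_bits" (host reserved bits) */
	uint64_t guest_rsvd;  /* models bits reserved by guest CPUID */
	uint64_t cr4_fixed0;  /* VMX MSR: bits that must be 1 */
	uint64_t cr4_fixed1;  /* VMX MSR: bits that are allowed to be 1 */
};

/* True if every fixed0 bit is set and no bit outside fixed1 is set. */
static bool toy_fixed_bits_valid(uint64_t val, uint64_t fixed0, uint64_t fixed1)
{
	/* Sanity check: a must-be-1 bit has to be an allowed-1 bit. */
	assert((fixed0 & ~fixed1) == 0);
	return ((val & fixed1) | fixed0) == val;
}

/* Models the pre-patch nested check: VMX MSRs only. */
static bool toy_nested_cr4_valid_old(const struct toy_vcpu *v, uint64_t cr4)
{
	return toy_fixed_bits_valid(cr4, v->cr4_fixed0, v->cr4_fixed1);
}

/* Models the post-patch check: reserved bits first, then the VMX MSRs. */
static bool toy_nested_cr4_valid_new(const struct toy_vcpu *v, uint64_t cr4)
{
	if (cr4 & v->host_rsvd)
		return false;
	if (cr4 & v->guest_rsvd)
		return false;
	return toy_fixed_bits_valid(cr4, v->cr4_fixed0, v->cr4_fixed1);
}
```

With host_rsvd = TOY_CR4_LA57, cr4_fixed0 = TOY_CR4_VMXE and userspace having written all ones to cr4_fixed1, the old check accepts VMXE | PAE | LA57 while the new check rejects it. Both still reject a value that is missing VMXE.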