From: Oliver Upton <oupton@xxxxxxxxxx> Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled"), KVM has taken ownership of the "load IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS MSRs if the guest's CPUID supports MPX, and clear otherwise. However, commit aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()") partially broke KVM ownership of the aforementioned bits. Before, kvm_update_cpuid() was exercised frequently when running a guest and constantly applied its own changes to the BNDCFGS bits. Now, the BNDCFGS bits are only ever updated after a KVM_SET_CPUID/KVM_SET_CPUID2 ioctl, meaning that a subsequent MSR write from userspace will clobber these values. Uphold the old ABI by reapplying KVM's tweaks to the BNDCFGS bits after an MSR write from userspace. Note, the old ABI that is being preserved is a KVM hack to workaround a userspace bug; see commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled"). VMCS controls are not tied to CPUID, i.e. KVM should not be mucking with unrelated things. The argument that it's KVM's responsibility to propagate CPUID state to VMX MSRs doesn't hold water, as the MPX shenanigans are an exception, not the norm, e.g. KVM doesn't perform the following adjustments (and this is but a subset of all possible tweaks): X86_FEATURE_LM => VM_EXIT_HOST_ADDR_SPACE_SIZE, VM_ENTRY_IA32E_MODE, VMX_MISC_SAVE_EFER_LMA X86_FEATURE_TSC => CPU_BASED_RDTSC_EXITING, CPU_BASED_USE_TSC_OFFSETTING, SECONDARY_EXEC_TSC_SCALING X86_FEATURE_INVPCID_SINGLE => SECONDARY_EXEC_ENABLE_INVPCID X86_FEATURE_MWAIT => CPU_BASED_MONITOR_EXITING, CPU_BASED_MWAIT_EXITING X86_FEATURE_INTEL_PT => SECONDARY_EXEC_PT_CONCEAL_VMX, SECONDARY_EXEC_PT_USE_GPA, VM_EXIT_CLEAR_IA32_RTIT_CTL, VM_ENTRY_LOAD_IA32_RTIT_CTL X86_FEATURE_XSAVES => SECONDARY_EXEC_XSAVES Fixes: aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()") Signed-off-by: Oliver Upton <oupton@xxxxxxxxxx> [sean: explicitly document the original KVM hack] Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> --- arch/x86/kvm/vmx/nested.c | 9 +++++++++ arch/x86/kvm/vmx/vmx.c | 2 +- arch/x86/kvm/vmx/vmx.h | 2 ++ 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index fca30e79b3a0..d1c21d387716 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -1296,6 +1296,15 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data) vmx_get_control_msr(&vmx->nested.msrs, msr_index, &lowp, &highp); *lowp = data; *highp = data >> 32; + + /* + * To preserve an old, kludgy ABI, ensure KVM fiddling with the "true" + * entry/exit controls MSRs is preserved after userspace modifications. + */ + if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS || + msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS) + nested_vmx_entry_exit_ctls_update(&vmx->vcpu); + return 0; } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 57df799ffa29..3f1671d7cbe4 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7398,7 +7398,7 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu) #undef cr4_fixed1_update } -static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu) +void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 71bcb486e73f..576fed7e33de 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -425,6 +425,8 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu); +void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu); + /* * Note, early Intel manuals have the write-low and read-high bitmap offsets * the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and -- 2.36.1.255.ge46751e96f-goog