From: Oliver Upton <oupton@xxxxxxxxxx>

Quirk KVM's misguided behavior of manipulating VMX MSRs based on guest
CPUID state.  There is no requirement, at all, that a CPU support
virtualizing a feature if said feature is supported on bare metal.  I.e.
the VMX MSRs exist independent of CPUID for a reason.

One could argue that disabling features, as KVM does for the entry/exit
controls for the IA32_PERF_GLOBAL_CTRL and IA32_BNDCFGS MSRs, is correct
as such a configuration is contradictory, but KVM's policy is to let
userspace have full control of the guest vCPU model so long as the host
kernel is not at risk.

Furthermore, mucking with the VMX MSRs creates a subtle,
difficult-to-maintain ABI as KVM must ensure that any internal changes,
e.g. to how KVM handles _any_ guest CPUID changes, yield the same
functional result.

Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
Signed-off-by: Oliver Upton <oupton@xxxxxxxxxx>
Co-developed-by: Sean Christopherson <seanjc@xxxxxxxxxx>
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
 Documentation/virt/kvm/api.rst  | 21 +++++++++++++++++++++
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/vmx/nested.c       |  5 +++--
 arch/x86/kvm/vmx/vmx.c          |  3 ++-
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 42a1984fafc8..1095692ddab7 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7374,6 +7374,27 @@ The valid bits in cap.args[0] are:
                                     hypercall instructions. Executing the
                                     incorrect hypercall instruction will
                                     generate a #UD within the guest.
+
+ KVM_X86_QUIRK_TWEAK_VMX_MSRS       By default, during a guest CPUID update,
+                                    KVM adjusts the values of select VMX MSRs
+                                    (usually based on guest CPUID):
+
+                                    - If CPUID.07H:EBX[bit 14] (MPX) is set, KVM
+                                      sets IA32_VMX_TRUE_ENTRY_CTLS[bit 48]
+                                      ('load IA32_BNDCFGS') and
+                                      IA32_VMX_TRUE_EXIT_CTLS[bit 55]
+                                      ('clear IA32_BNDCFGS'). Otherwise, these
+                                      corresponding MSR bits are cleared.
+                                    - If CPUID.0AH:EAX[bits 7:0] > 1, KVM sets
+                                      IA32_VMX_TRUE_ENTRY_CTLS[bit 45]
+                                      ('load IA32_PERF_GLOBAL_CTRL') and
+                                      IA32_VMX_TRUE_EXIT_CTLS[bit 44]
+                                      ('load IA32_PERF_GLOBAL_CTRL'). Otherwise,
+                                      these corresponding MSR bits are cleared.
+
+                                    When this quirk is disabled, KVM will not
+                                    change the values of the aforementioned VMX
+                                    MSRs during guest CPUID updates.
 =================================== ============================================
 
 7.32 KVM_CAP_MAX_VCPU_ID
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6cf5d77d7896..a783c82fb902 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2011,6 +2011,7 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 	 KVM_X86_QUIRK_LAPIC_MMIO_HOLE |	\
 	 KVM_X86_QUIRK_OUT_7E_INC_RIP |		\
 	 KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT |	\
-	 KVM_X86_QUIRK_FIX_HYPERCALL_INSN)
+	 KVM_X86_QUIRK_FIX_HYPERCALL_INSN |	\
+	 KVM_X86_QUIRK_TWEAK_VMX_MSRS)
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 24c807c8d5f7..0705178bd93d 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -438,6 +438,7 @@ struct kvm_sync_regs {
 #define KVM_X86_QUIRK_OUT_7E_INC_RIP		(1 << 3)
 #define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT	(1 << 4)
 #define KVM_X86_QUIRK_FIX_HYPERCALL_INSN	(1 << 5)
+#define KVM_X86_QUIRK_TWEAK_VMX_MSRS		(1 << 6)
 
 #define KVM_STATE_NESTED_FORMAT_VMX	0
 #define KVM_STATE_NESTED_FORMAT_SVM	1
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4ba0e5540908..dc2f9b06b99a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1301,8 +1301,9 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
 	 * To preserve an old, kludgy ABI, ensure KVM fiddling with the "true"
 	 * entry/exit controls MSRs is preserved after userspace modifications.
 	 */
-	if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS ||
-	    msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS)
+	if ((msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS ||
+	     msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS) &&
+	    kvm_check_has_quirk(vmx->vcpu.kvm, KVM_X86_QUIRK_TWEAK_VMX_MSRS))
 		nested_vmx_entry_exit_ctls_update(&vmx->vcpu);
 
 	return 0;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 73ec4746a4e6..4c31c8f24329 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7522,7 +7522,8 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	if (nested_vmx_allowed(vcpu)) {
 		nested_vmx_cr_fixed1_bits_update(vcpu);
-		nested_vmx_entry_exit_ctls_update(vcpu);
+		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_TWEAK_VMX_MSRS))
+			nested_vmx_entry_exit_ctls_update(vcpu);
 	}
 
 	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
-- 
2.36.1.255.ge46751e96f-goog
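
Note for readers: the behavior documented in the api.rst hunk can be
modeled in a few lines of standalone C. This is only an illustrative
sketch, not kernel code. The model_* names, the assign_bit() helper and
the struct layouts are hypothetical; only the CPUID fields and the MSR
bit positions are taken from the documentation text in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR bit positions exactly as listed in the api.rst text of this patch. */
#define ENTRY_LOAD_BNDCFGS_BIT          48
#define EXIT_CLEAR_BNDCFGS_BIT          55
#define ENTRY_LOAD_PERF_GLOBAL_CTRL_BIT 45
#define EXIT_LOAD_PERF_GLOBAL_CTRL_BIT  44

/* Hypothetical containers for the guest state the quirk looks at. */
struct model_cpuid {
	uint32_t leaf7_ebx;	/* CPUID.07H:EBX */
	uint32_t leafa_eax;	/* CPUID.0AH:EAX */
};

struct model_vmx_msrs {
	uint64_t true_entry_ctls;	/* IA32_VMX_TRUE_ENTRY_CTLS */
	uint64_t true_exit_ctls;	/* IA32_VMX_TRUE_EXIT_CTLS */
};

/* Return val with the given bit set or cleared. */
static uint64_t assign_bit(uint64_t val, unsigned int bit, bool set)
{
	assert(bit < 64);
	return set ? (val | (1ULL << bit)) : (val & ~(1ULL << bit));
}

/*
 * Model of a guest CPUID update.  With the quirk enabled (the default),
 * the four bits follow guest CPUID.  With the quirk disabled, the MSR
 * values are left exactly as userspace set them.
 */
static void model_tweak_vmx_msrs(struct model_vmx_msrs *m,
				 const struct model_cpuid *c,
				 bool quirk_enabled)
{
	bool mpx, pmu;

	if (!quirk_enabled)
		return;

	mpx = (c->leaf7_ebx >> 14) & 1;		/* CPUID.07H:EBX[bit 14] */
	pmu = (c->leafa_eax & 0xff) > 1;	/* CPUID.0AH:EAX[bits 7:0] > 1 */

	m->true_entry_ctls = assign_bit(m->true_entry_ctls,
					ENTRY_LOAD_BNDCFGS_BIT, mpx);
	m->true_exit_ctls = assign_bit(m->true_exit_ctls,
				       EXIT_CLEAR_BNDCFGS_BIT, mpx);
	m->true_entry_ctls = assign_bit(m->true_entry_ctls,
					ENTRY_LOAD_PERF_GLOBAL_CTRL_BIT, pmu);
	m->true_exit_ctls = assign_bit(m->true_exit_ctls,
				       EXIT_LOAD_PERF_GLOBAL_CTRL_BIT, pmu);
}
```

In this model, calling model_tweak_vmx_msrs() with quirk_enabled set to
false is a no-op, which mirrors the two kvm_check_has_quirk() guards
added in nested.c and vmx.c.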