On Fri, Jul 22, 2022, Paolo Bonzini wrote: > This reverts commit 03a8871add95213827e2bea84c12133ae5df952e. > > Since commit 03a8871add95 ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL > VM-{Entry,Exit} control"), KVM has taken ownership of the "load > IA32_PERF_GLOBAL_CTRL" VMX entry/exit control bits, trying to set these > bits in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS MSRs if the guest's CPUID > supports the architectural PMU (CPUID[EAX=0Ah].EAX[7:0]=1), and clear > otherwise. > > This was a misguided attempt at mimicking what commit 5f76f6f5ff96 > ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled", > 2018-10-01) did for MPX. However, that commit was a workaround for > another KVM bug and not something that should be imitated. Mucking with > the VMX MSRs creates a subtle, difficult to maintain ABI as KVM must > ensure that any internal changes, e.g. to how KVM handles _any_ guest > CPUID changes, yield the same functional result. Therefore, KVM's policy > is to let userspace have full control of the guest vCPU model so long > as the host kernel is not at risk. > > And that's the snag: setting the bit must not cause any harm to the host, > therefore we need to be sure that the kvm_set_msr will actually succeed. () on functions please. > Furthermore, it is plausible to have a hypervisor that sets the controls > unconditionally and just leaves GUEST/HOST_IA32_PERF_GLOBAL_CTRL to 0, and > we don't want to regress that case. The simplest way to handle > both issues is to skip the call to kvm_set_msr if the value of > MSR_CORE_PERF_GLOBAL_CTRL is not changing. This covers trivially the case > where the PMU is not available and the only acceptable value of the MSR is > zero, because nonzero values are filtered in nested_vmx_check_host_state () > and nested_vmx_check_guest_state. Hmm, this is just trading one hack for another. The real problem is that KVM has a WARN that can be triggered by userspace sending a misconfigured vCPU model. And calling kvm_set_msr() iff the value is changing is trivial to exploit, e.g. userspace does KVM_SET_CPUID with a valid PMU and stuffs MSR_CORE_PERF_GLOBAL_CTRL to a non-zero value then does a "bad" KVM_SET_CPUID and a nested VM-Enter. An even more devious attack would be to do back-to-back KVM_SET_CPUID to get a valid pmu->global_ctrl_mask with pmu->version==0 (which is a bug in intel_pmu_refresh()) in order to bypass this check: if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) && CC(!kvm_valid_perf_global_ctrl(vcpu_to_pmu(vcpu), vmcs12->guest_ia32_perf_global_ctrl))) return -EINVAL; and then nested VM-Enter with a non-zero guest_ia32_perf_global_ctrl. Blech, I was going to type up a suggested "flow", but untangling this requires multiple patches (there are multiple bugs). I'll just send a series with this as a pure revert of 03a8871add95213827e2bea84c12133ae5df952e as the last patch. > Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> > --- > arch/x86/kvm/vmx/nested.c | 26 +++----------------------- > arch/x86/kvm/vmx/nested.h | 2 -- > arch/x86/kvm/vmx/pmu_intel.c | 3 --- > 3 files changed, 3 insertions(+), 28 deletions(-) > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c > index c1c85fd75d42..6d25de9ebefa 100644 > --- a/arch/x86/kvm/vmx/nested.c > +++ b/arch/x86/kvm/vmx/nested.c > @@ -2623,6 +2623,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, > } > > if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) && > + vmcs12->guest_ia32_perf_global_ctrl != vcpu_to_pmu(vcpu)->global_ctrl && > WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL, > vmcs12->guest_ia32_perf_global_ctrl))) { > *entry_failure_code = ENTRY_FAIL_DEFAULT; > @@ -4333,7 +4334,8 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, > vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat); > vcpu->arch.pat = vmcs12->host_ia32_pat; > } > - if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) > + if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) > + && vmcs12->host_ia32_perf_global_ctrl != vcpu_to_pmu(vcpu)->global_ctrl) Should be a moot point, but put the "&&" on the first line. > WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL, > vmcs12->host_ia32_perf_global_ctrl)); >