On Thu, Aug 01, 2024, Mingwei Zhang wrote: > --- > arch/x86/include/asm/vmx.h | 1 + > arch/x86/kvm/vmx/vmx.c | 117 +++++++++++++++++++++++++++++++------ > arch/x86/kvm/vmx/vmx.h | 3 +- > 3 files changed, 103 insertions(+), 18 deletions(-) > > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > index d77a31039f24..5ed89a099533 100644 > --- a/arch/x86/include/asm/vmx.h > +++ b/arch/x86/include/asm/vmx.h > @@ -106,6 +106,7 @@ > #define VM_EXIT_CLEAR_BNDCFGS 0x00800000 > #define VM_EXIT_PT_CONCEAL_PIP 0x01000000 > #define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000 > +#define VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL 0x40000000 Please add a helper in capabilities.h: static inline bool cpu_has_save_perf_global_ctrl(void) { return vmcs_config.vmexit_ctrl & VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL; } > #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 339742350b7a..34a420fa98c5 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -4394,6 +4394,97 @@ static u32 vmx_pin_based_exec_ctrl(struct vcpu_vmx *vmx) > return pin_based_exec_ctrl; > } > > +static void vmx_set_perf_global_ctrl(struct vcpu_vmx *vmx) This is a misleading and inaccurate name. It does far more than "set" PERF_GLOBAL_CTRL, it arguably doesn't ever "set" the MSR, and it gets the VMWRITE for the guest field wrong too. > +{ > + u32 vmentry_ctrl = vm_entry_controls_get(vmx); > + u32 vmexit_ctrl = vm_exit_controls_get(vmx); > + struct vmx_msrs *m; > + int i; > + > + if (cpu_has_perf_global_ctrl_bug() || Note, cpu_has_perf_global_ctrl_bug() broken and needs to be purged: https://lore.kernel.org/all/20241119011433.1797921-1-seanjc@xxxxxxxxxx Note #2, as mentioned earlier, the mediated PMU should take a hard depenency on the load/save controls. On to this code, it fails to enable the load/save controls, e.g. if userspace does KVM_SET_CPUID2 without a PMU, then KVM_SET_CPUID2 with a PMU. In that case, KVM will fail to set the control bits, and will fallback to the slow MSR load/save lists. With all of the above and other ideas combined, something like so: bool set = enable_mediated_pmu && kvm_pmu_has_perf_global_ctrl(); vm_entry_controls_changebit(vmx, VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, set); vm_exit_controls_changebit(vmx, VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL, set); And I vote to put this in intel_pmu_refresh(); that avoids needing to figure out a name for the helper, while giving more flexibililty on the local variable name. Oh! Definitely put it in intel_pmu_refresh(), because the RDPMC and MSR interception logic needs to be there. E.g. toggling CPU_BASED_RDPMC_EXITING based solely on CPUID won't do the right thing if KVM ends up making the behavior depend on PERF_CAPABILITIES. Ditto for MSRs. Though until my patch/series that drops kvm_pmu_refresh() from kvm_pmu_init() lands[*], trying to update MSR intercepts during refresh() will hit a NULL pointer deref as it's currently called before vmcs01 is allocated :-/ I expect to land that series before mediated PMU, but I don't think it makes sense to take an explicit dependency for this series. To fudge around the issue, maybe do this for the next version? static void intel_pmu_refresh(struct kvm_vcpu *vcpu) { __intel_pmu_refresh(vcpu); /* * FIXME: Drop the MSR bitmap check if/when kvm_pmu_init() no longer * calls kvm_pmu_refresh(), i.e. when KVM refreshes the PMU only * after vmcs01 is allocated. */ if (to_vmx(vcpu)->vmcs01.msr_bitmap) intel_update_msr_intercepts(vcpu); vm_entry_controls_changebit(vmx, VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, enable_mediated_pmu && kvm_pmu_has_perf_global_ctrl()); vm_exit_controls_changebit(vmx, VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL, enable_mediated_pmu && kvm_pmu_has_perf_global_ctrl()); } or with a local variable for "enable_mediated_pmu && kvm_pmu_has_perf_global_ctrl()". I can't come up with a decent name. :-) [*] https://lore.kernel.org/all/20240517173926.965351-10-seanjc@xxxxxxxxxx > + !is_passthrough_pmu_enabled(&vmx->vcpu)) { > + vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL; > + vmexit_ctrl &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL; > + vmexit_ctrl &= ~VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL; > + } > + > + if (is_passthrough_pmu_enabled(&vmx->vcpu)) { > + /* > + * Setup auto restore guest PERF_GLOBAL_CTRL MSR at vm entry. > + */ > + if (vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) { > + vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL, 0); This incorrectly clobbers the guest's value. A simple way to handle this is to always propagate writes to PERF_GLOBAL_CTRL to the VMCS, if the write is allowed and enable_mediated_pmu. I.e. ensure GUEST_IA32_PERF_GLOBAL_CTRL is up-to-date regardless of whether or not it's configured to be loaded. Then there's no need to write it here. > + } else { > + m = &vmx->msr_autoload.guest; > + i = vmx_find_loadstore_msr_slot(m, MSR_CORE_PERF_GLOBAL_CTRL); > + if (i < 0) { > + i = m->nr++; > + vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->nr); > + } > + m->val[i].index = MSR_CORE_PERF_GLOBAL_CTRL; > + m->val[i].value = 0; > + } > + /* > + * Setup auto clear host PERF_GLOBAL_CTRL msr at vm exit. > + */ > + if (vmexit_ctrl & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) { > + vmcs_write64(HOST_IA32_PERF_GLOBAL_CTRL, 0); This should be unnecessary. KVM should clear HOST_IA32_PERF_GLOBAL_CTRL in vmx_set_constant_host_state() if enable_mediated_pmu is true. Arguably, it might make sense to clear it unconditionally, but with a comment explaining that it's only actually constant for the mediated PMU. And if the mediated PMU requires the VMCS knobs, then all of the load/store list complexity goes away. > static u32 vmx_vmentry_ctrl(void) > { > u32 vmentry_ctrl = vmcs_config.vmentry_ctrl; > @@ -4401,17 +4492,10 @@ static u32 vmx_vmentry_ctrl(void) > if (vmx_pt_mode_is_system()) > vmentry_ctrl &= ~(VM_ENTRY_PT_CONCEAL_PIP | > VM_ENTRY_LOAD_IA32_RTIT_CTL); > - /* > - * IA32e mode, and loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically. > - */ > - vmentry_ctrl &= ~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | > - VM_ENTRY_LOAD_IA32_EFER | > - VM_ENTRY_IA32E_MODE); > - > - if (cpu_has_perf_global_ctrl_bug()) > - vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL; > - > - return vmentry_ctrl; > + /* > + * IA32e mode, and loading of EFER is toggled dynamically. > + */ > + return vmentry_ctrl &= ~(VM_ENTRY_LOAD_IA32_EFER | VM_ENTRY_IA32E_MODE); With my above suggestion, these changes are unnecessary. If enable_mediated_pmu is false, or the vCPU doesn't have a PMU, clearing the controls is correct. And when the vCPU is gifted a PMU, KVM will explicitly enabled the controls. To discourage incorrect usage of these helpers maybe rename them to vmx_get_initial_{vmentry,vmexit}_ctrl()? > } > > static u32 vmx_vmexit_ctrl(void) > @@ -4429,12 +4513,8 @@ static u32 vmx_vmexit_ctrl(void) > vmexit_ctrl &= ~(VM_EXIT_PT_CONCEAL_PIP | > VM_EXIT_CLEAR_IA32_RTIT_CTL); > > - if (cpu_has_perf_global_ctrl_bug()) > - vmexit_ctrl &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL; > - > - /* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */ > - return vmexit_ctrl & > - ~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER); But this code needs to *add* clearing of VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL.