On 11/20/2024 2:16 AM, Sean Christopherson wrote: > On Thu, Aug 01, 2024, Mingwei Zhang wrote: >> --- >> arch/x86/include/asm/vmx.h | 1 + >> arch/x86/kvm/vmx/vmx.c | 117 +++++++++++++++++++++++++++++++------ >> arch/x86/kvm/vmx/vmx.h | 3 +- >> 3 files changed, 103 insertions(+), 18 deletions(-) >> >> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h >> index d77a31039f24..5ed89a099533 100644 >> --- a/arch/x86/include/asm/vmx.h >> +++ b/arch/x86/include/asm/vmx.h >> @@ -106,6 +106,7 @@ >> #define VM_EXIT_CLEAR_BNDCFGS 0x00800000 >> #define VM_EXIT_PT_CONCEAL_PIP 0x01000000 >> #define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000 >> +#define VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL 0x40000000 > Please add a helper in capabilities.h: > > static inline bool cpu_has_save_perf_global_ctrl(void) > { > return vmcs_config.vmexit_ctrl & VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL; > } Sure. > >> #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff >> >> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >> index 339742350b7a..34a420fa98c5 100644 >> --- a/arch/x86/kvm/vmx/vmx.c >> +++ b/arch/x86/kvm/vmx/vmx.c >> @@ -4394,6 +4394,97 @@ static u32 vmx_pin_based_exec_ctrl(struct vcpu_vmx *vmx) >> return pin_based_exec_ctrl; >> } >> >> +static void vmx_set_perf_global_ctrl(struct vcpu_vmx *vmx) > This is a misleading and inaccurate name. It does far more than "set" PERF_GLOBAL_CTRL, > it arguably doesn't ever "set" the MSR, and it gets the VMWRITE for the guest field > wrong too. > >> +{ >> + u32 vmentry_ctrl = vm_entry_controls_get(vmx); >> + u32 vmexit_ctrl = vm_exit_controls_get(vmx); >> + struct vmx_msrs *m; >> + int i; >> + >> + if (cpu_has_perf_global_ctrl_bug() || > Note, cpu_has_perf_global_ctrl_bug() broken and needs to be purged: > https://lore.kernel.org/all/20241119011433.1797921-1-seanjc@xxxxxxxxxx > > Note #2, as mentioned earlier, the mediated PMU should take a hard depenency on > the load/save controls. > > On to this code, it fails to enable the load/save controls, e.g. if userspace > does KVM_SET_CPUID2 without a PMU, then KVM_SET_CPUID2 with a PMU. In that case, > KVM will fail to set the control bits, and will fallback to the slow MSR load/save > lists. > > With all of the above and other ideas combined, something like so: > > bool set = enable_mediated_pmu && kvm_pmu_has_perf_global_ctrl(); > > vm_entry_controls_changebit(vmx, VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, set); > vm_exit_controls_changebit(vmx, > VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | > VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL, set); > > > And I vote to put this in intel_pmu_refresh(); that avoids needing to figure out > a name for the helper, while giving more flexibililty on the local variable name. > > Oh! Definitely put it in intel_pmu_refresh(), because the RDPMC and MSR > interception logic needs to be there. E.g. toggling CPU_BASED_RDPMC_EXITING > based solely on CPUID won't do the right thing if KVM ends up making the behavior > depend on PERF_CAPABILITIES. > > Ditto for MSRs. Though until my patch/series that drops kvm_pmu_refresh() from > kvm_pmu_init() lands[*], trying to update MSR intercepts during refresh() will hit > a NULL pointer deref as it's currently called before vmcs01 is allocated :-/ > > I expect to land that series before mediated PMU, but I don't think it makes sense > to take an explicit dependency for this series. To fudge around the issue, maybe > do this for the next version? > > static void intel_pmu_refresh(struct kvm_vcpu *vcpu) > { > __intel_pmu_refresh(vcpu); > > /* > * FIXME: Drop the MSR bitmap check if/when kvm_pmu_init() no longer > * calls kvm_pmu_refresh(), i.e. when KVM refreshes the PMU only > * after vmcs01 is allocated. > */ > if (to_vmx(vcpu)->vmcs01.msr_bitmap) > intel_update_msr_intercepts(vcpu); > > vm_entry_controls_changebit(vmx, VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, > enable_mediated_pmu && kvm_pmu_has_perf_global_ctrl()); > > vm_exit_controls_changebit(vmx, > VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | > VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL, > enable_mediated_pmu && kvm_pmu_has_perf_global_ctrl()); > } > > or with a local variable for "enable_mediated_pmu && kvm_pmu_has_perf_global_ctrl()". > I can't come up with a decent name. :-) > > [*] https://lore.kernel.org/all/20240517173926.965351-10-seanjc@xxxxxxxxxx Sure. This looks better. > >> + !is_passthrough_pmu_enabled(&vmx->vcpu)) { >> + vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL; >> + vmexit_ctrl &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL; >> + vmexit_ctrl &= ~VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL; >> + } >> + >> + if (is_passthrough_pmu_enabled(&vmx->vcpu)) { >> + /* >> + * Setup auto restore guest PERF_GLOBAL_CTRL MSR at vm entry. >> + */ >> + if (vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) { >> + vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL, 0); > This incorrectly clobbers the guest's value. A simple way to handle this is to > always propagate writes to PERF_GLOBAL_CTRL to the VMCS, if the write is allowed > and enable_mediated_pmu. I.e. ensure GUEST_IA32_PERF_GLOBAL_CTRL is up-to-date > regardless of whether or not it's configured to be loaded. Then there's no need > to write it here. For mediated vPMU, I think we can move this into intel_pmu_refresh() as well, and just after the vm_entry_controls_changebit() helpers. the value should be supported gp counters mask which respects the behavior of PERF_GLOBAL_CTRL reset. > >> + } else { >> + m = &vmx->msr_autoload.guest; >> + i = vmx_find_loadstore_msr_slot(m, MSR_CORE_PERF_GLOBAL_CTRL); >> + if (i < 0) { >> + i = m->nr++; >> + vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->nr); >> + } >> + m->val[i].index = MSR_CORE_PERF_GLOBAL_CTRL; >> + m->val[i].value = 0; >> + } >> + /* >> + * Setup auto clear host PERF_GLOBAL_CTRL msr at vm exit. >> + */ >> + if (vmexit_ctrl & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) { >> + vmcs_write64(HOST_IA32_PERF_GLOBAL_CTRL, 0); > This should be unnecessary. KVM should clear HOST_IA32_PERF_GLOBAL_CTRL in > vmx_set_constant_host_state() if enable_mediated_pmu is true. Arguably, it might > make sense to clear it unconditionally, but with a comment explaining that it's > only actually constant for the mediated PMU. Sure. would move this clearing into vmx_set_constant_host_state(). Yeah, I suppose it can be cleared unconditionally since VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL bit never be set for legacy emulated vPMU. > > And if the mediated PMU requires the VMCS knobs, then all of the load/store list > complexity goes away. > >> static u32 vmx_vmentry_ctrl(void) >> { >> u32 vmentry_ctrl = vmcs_config.vmentry_ctrl; >> @@ -4401,17 +4492,10 @@ static u32 vmx_vmentry_ctrl(void) >> if (vmx_pt_mode_is_system()) >> vmentry_ctrl &= ~(VM_ENTRY_PT_CONCEAL_PIP | >> VM_ENTRY_LOAD_IA32_RTIT_CTL); >> - /* >> - * IA32e mode, and loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically. >> - */ >> - vmentry_ctrl &= ~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | >> - VM_ENTRY_LOAD_IA32_EFER | >> - VM_ENTRY_IA32E_MODE); >> - >> - if (cpu_has_perf_global_ctrl_bug()) >> - vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL; >> - >> - return vmentry_ctrl; >> + /* >> + * IA32e mode, and loading of EFER is toggled dynamically. >> + */ >> + return vmentry_ctrl &= ~(VM_ENTRY_LOAD_IA32_EFER | VM_ENTRY_IA32E_MODE); > With my above suggestion, these changes are unnecessary. If enable_mediated_pmu > is false, or the vCPU doesn't have a PMU, clearing the controls is correct. And > when the vCPU is gifted a PMU, KVM will explicitly enabled the controls. Yes. > > To discourage incorrect usage of these helpers maybe rename them to > vmx_get_initial_{vmentry,vmexit}_ctrl()? Sure. > >> } >> >> static u32 vmx_vmexit_ctrl(void) >> @@ -4429,12 +4513,8 @@ static u32 vmx_vmexit_ctrl(void) >> vmexit_ctrl &= ~(VM_EXIT_PT_CONCEAL_PIP | >> VM_EXIT_CLEAR_IA32_RTIT_CTL); >> >> - if (cpu_has_perf_global_ctrl_bug()) >> - vmexit_ctrl &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL; >> - >> - /* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */ >> - return vmexit_ctrl & >> - ~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER); > But this code needs to *add* clearing of VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL. Sure.