Re: [RFC PATCH v3 26/58] KVM: x86/pmu: Manage MSR interception for IA32_PERF_GLOBAL_CTRL

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 01, 2024, Mingwei Zhang wrote:
> ---
>  arch/x86/include/asm/vmx.h |   1 +
>  arch/x86/kvm/vmx/vmx.c     | 117 +++++++++++++++++++++++++++++++------
>  arch/x86/kvm/vmx/vmx.h     |   3 +-
>  3 files changed, 103 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index d77a31039f24..5ed89a099533 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -106,6 +106,7 @@
>  #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
>  #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
>  #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
> +#define VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL      0x40000000

Please add a helper in capabilities.h:

static inline bool cpu_has_save_perf_global_ctrl(void)
{
	return vmcs_config.vmexit_ctrl & VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL;
}


>  #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 339742350b7a..34a420fa98c5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4394,6 +4394,97 @@ static u32 vmx_pin_based_exec_ctrl(struct vcpu_vmx *vmx)
>  	return pin_based_exec_ctrl;
>  }
>  
> +static void vmx_set_perf_global_ctrl(struct vcpu_vmx *vmx)

This is a misleading and inaccurate name.  It does far more than "set" PERF_GLOBAL_CTRL,
it arguably doesn't ever "set" the MSR, and it gets the VMWRITE for the guest field
wrong too.

> +{
> +	u32 vmentry_ctrl = vm_entry_controls_get(vmx);
> +	u32 vmexit_ctrl = vm_exit_controls_get(vmx);
> +	struct vmx_msrs *m;
> +	int i;
> +
> +	if (cpu_has_perf_global_ctrl_bug() ||

Note, cpu_has_perf_global_ctrl_bug() broken and needs to be purged:
https://lore.kernel.org/all/20241119011433.1797921-1-seanjc@xxxxxxxxxx

Note #2, as mentioned earlier, the mediated PMU should take a hard depenency on
the load/save controls.

On to this code, it fails to enable the load/save controls, e.g. if userspace
does KVM_SET_CPUID2 without a PMU, then KVM_SET_CPUID2 with a PMU.  In that case,
KVM will fail to set the control bits, and will fallback to the slow MSR load/save
lists.

With all of the above and other ideas combined, something like so:

	bool set = enable_mediated_pmu && kvm_pmu_has_perf_global_ctrl();

	vm_entry_controls_changebit(vmx, VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, set);
	vm_exit_controls_changebit(vmx,
				   VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
				   VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL, set);


And I vote to put this in intel_pmu_refresh(); that avoids needing to figure out
a name for the helper, while giving more flexibililty on the local variable name.

Oh!  Definitely put it in intel_pmu_refresh(), because the RDPMC and MSR
interception logic needs to be there.  E.g. toggling CPU_BASED_RDPMC_EXITING
based solely on CPUID won't do the right thing if KVM ends up making the behavior
depend on PERF_CAPABILITIES.

Ditto for MSRs.  Though until my patch/series that drops kvm_pmu_refresh() from
kvm_pmu_init() lands[*], trying to update MSR intercepts during refresh() will hit
a NULL pointer deref as it's currently called before vmcs01 is allocated :-/

I expect to land that series before mediated PMU, but I don't think it makes sense
to take an explicit dependency for this series.  To fudge around the issue, maybe
do this for the next version?

static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
{
	__intel_pmu_refresh(vcpu);

	/*
	 * FIXME: Drop the MSR bitmap check if/when kvm_pmu_init() no longer
	 *        calls kvm_pmu_refresh(), i.e. when KVM refreshes the PMU only
	 *        after vmcs01 is allocated.
	 */
	if (to_vmx(vcpu)->vmcs01.msr_bitmap)
		intel_update_msr_intercepts(vcpu);

	vm_entry_controls_changebit(vmx, VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
				    enable_mediated_pmu && kvm_pmu_has_perf_global_ctrl());

	vm_exit_controls_changebit(vmx,
				   VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
				   VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL,
				   enable_mediated_pmu && kvm_pmu_has_perf_global_ctrl());
}

or with a local variable for "enable_mediated_pmu && kvm_pmu_has_perf_global_ctrl()".
I can't come up with a decent name. :-)

[*] https://lore.kernel.org/all/20240517173926.965351-10-seanjc@xxxxxxxxxx

> +	    !is_passthrough_pmu_enabled(&vmx->vcpu)) {
> +		vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
> +		vmexit_ctrl &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
> +		vmexit_ctrl &= ~VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL;
> +	}
> +
> +	if (is_passthrough_pmu_enabled(&vmx->vcpu)) {
> +		/*
> +		 * Setup auto restore guest PERF_GLOBAL_CTRL MSR at vm entry.
> +		 */
> +		if (vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) {
> +			vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL, 0);

This incorrectly clobbers the guest's value.  A simple way to handle this is to
always propagate writes to PERF_GLOBAL_CTRL to the VMCS, if the write is allowed
and enable_mediated_pmu.  I.e. ensure GUEST_IA32_PERF_GLOBAL_CTRL is up-to-date
regardless of whether or not it's configured to be loaded.  Then there's no need
to write it here.

> +		} else {
> +			m = &vmx->msr_autoload.guest;
> +			i = vmx_find_loadstore_msr_slot(m, MSR_CORE_PERF_GLOBAL_CTRL);
> +			if (i < 0) {
> +				i = m->nr++;
> +				vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->nr);
> +			}
> +			m->val[i].index = MSR_CORE_PERF_GLOBAL_CTRL;
> +			m->val[i].value = 0;
> +		}
> +		/*
> +		 * Setup auto clear host PERF_GLOBAL_CTRL msr at vm exit.
> +		 */
> +		if (vmexit_ctrl & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) {
> +			vmcs_write64(HOST_IA32_PERF_GLOBAL_CTRL, 0);

This should be unnecessary.  KVM should clear HOST_IA32_PERF_GLOBAL_CTRL in
vmx_set_constant_host_state() if enable_mediated_pmu is true.  Arguably, it might
make sense to clear it unconditionally, but with a comment explaining that it's
only actually constant for the mediated PMU.

And if the mediated PMU requires the VMCS knobs, then all of the load/store list
complexity goes away.

>  static u32 vmx_vmentry_ctrl(void)
>  {
>  	u32 vmentry_ctrl = vmcs_config.vmentry_ctrl;
> @@ -4401,17 +4492,10 @@ static u32 vmx_vmentry_ctrl(void)
>  	if (vmx_pt_mode_is_system())
>  		vmentry_ctrl &= ~(VM_ENTRY_PT_CONCEAL_PIP |
>  				  VM_ENTRY_LOAD_IA32_RTIT_CTL);
> -	/*
> -	 * IA32e mode, and loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically.
> -	 */
> -	vmentry_ctrl &= ~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
> -			  VM_ENTRY_LOAD_IA32_EFER |
> -			  VM_ENTRY_IA32E_MODE);
> -
> -	if (cpu_has_perf_global_ctrl_bug())
> -		vmentry_ctrl &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
> -
> -	return vmentry_ctrl;
> +	 /*
> +	  * IA32e mode, and loading of EFER is toggled dynamically.
> +	  */
> +	return vmentry_ctrl &= ~(VM_ENTRY_LOAD_IA32_EFER | VM_ENTRY_IA32E_MODE);

With my above suggestion, these changes are unnecessary.  If enable_mediated_pmu
is false, or the vCPU doesn't have a PMU, clearing the controls is correct.  And
when the vCPU is gifted a PMU, KVM will explicitly enabled the controls.

To discourage incorrect usage of these helpers maybe rename them to
vmx_get_initial_{vmentry,vmexit}_ctrl()?

>  }
>  
>  static u32 vmx_vmexit_ctrl(void)
> @@ -4429,12 +4513,8 @@ static u32 vmx_vmexit_ctrl(void)
>  		vmexit_ctrl &= ~(VM_EXIT_PT_CONCEAL_PIP |
>  				 VM_EXIT_CLEAR_IA32_RTIT_CTL);
>  
> -	if (cpu_has_perf_global_ctrl_bug())
> -		vmexit_ctrl &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
> -
> -	/* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */
> -	return vmexit_ctrl &
> -		~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER);

But this code needs to *add* clearing of VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL.




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux