On Fri, Oct 25, 2024, Dapeng Mi wrote:
> 
> On 10/25/2024 4:26 AM, Chen, Zide wrote:
> >
> > On 7/31/2024 9:58 PM, Mingwei Zhang wrote:
> >> @@ -7295,6 +7299,46 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> >>  				msrs[i].host, false);
> >>  }
> >>  
> >> +static void save_perf_global_ctrl_in_passthrough_pmu(struct vcpu_vmx *vmx)
> >> +{
> >> +	struct kvm_pmu *pmu = vcpu_to_pmu(&vmx->vcpu);
> >> +	int i;
> >> +
> >> +	if (vm_exit_controls_get(vmx) & VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL) {
> >> +		pmu->global_ctrl = vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL);
> >
> > As commented in patch 26, compared with MSR auto save/store area
> > approach, the exec control way needs one relatively expensive VMCS read
> > on every VM exit.
> 
> Anyway, let us have an evaluation and let the data speak.

No, drop the unconditional VMREAD and VMWRITE, one way or another.  No
benchmark will notice ~50 extra cycles, but if we write poor code for every
feature, those 50 cycles per feature add up.

Furthermore, checking to see if the CPU supports the load/save VMCS controls
at runtime is beyond ridiculous.  The mediated PMU requires ***VERSION 4***;
if a CPU supports PMU version 4 and doesn't support the VMCS controls, KVM
should yell and disable the passthrough PMU.  The amount of complexity added
here to support a CPU that should never exist is silly.

> >> +static void load_perf_global_ctrl_in_passthrough_pmu(struct vcpu_vmx *vmx)
> >> +{
> >> +	struct kvm_pmu *pmu = vcpu_to_pmu(&vmx->vcpu);
> >> +	u64 global_ctrl = pmu->global_ctrl;
> >> +	int i;
> >> +
> >> +	if (vm_entry_controls_get(vmx) & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) {
> >> +		vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL, global_ctrl);
> >
> > ditto.
> >
> > We may optimize it by introducing a new flag pmu->global_ctrl_dirty and
> > update GUEST_IA32_PERF_GLOBAL_CTRL only when it's needed.  But this
> > makes the code even more complicated.
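Regarding the version check: the one-time gating I'm suggesting doesn't need
to be anything fancy.  A rough, self-contained sketch of the shape (names,
struct layout, and bit positions below are purely illustrative stand-ins for
the real vmcs_config/CPUID plumbing, compiled in userspace with stubs):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative bit positions only; the real values come from the SDM. */
#define VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL	(1u << 13)
#define VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL	(1u << 30)

/* Stand-in for the relevant pieces of vmcs_config. */
struct vmcs_caps {
	uint32_t vmentry_ctrl;
	uint32_t vmexit_ctrl;
};

/*
 * Decide once, at setup, whether the mediated PMU can be enabled: it
 * requires PMU version 4, and any such CPU should also have the VMCS
 * load/save controls for PERF_GLOBAL_CTRL.  If it somehow doesn't,
 * yell and disable the feature instead of re-checking the controls on
 * every VM-Entry/VM-Exit.
 */
static bool mediated_pmu_supported(const struct vmcs_caps *caps,
				   int pmu_version)
{
	if (pmu_version < 4)
		return false;

	if (!(caps->vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) ||
	    !(caps->vmexit_ctrl & VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL)) {
		/* Stand-in for pr_warn_once(); such a CPU shouldn't exist. */
		fprintf(stderr,
			"PMU v4 without PERF_GLOBAL_CTRL VMCS controls?!\n");
		return false;
	}
	return true;
}
```

I.e. pay for the sanity check once at module load, and the fastpath gets to
assume the controls are present.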
I haven't looked at surrounding code too much, but I guarantee there's _zero_
reason to eat a VMWRITE+VMREAD on every transition.

If, emphasis on *if*, KVM accesses PERF_GLOBAL_CTRL frequently, e.g. on most
exits, then add a VCPU_EXREG_XXX and let KVM's caching infrastructure do the
heavy lifting.  Don't reinvent the wheel.  But first, convince the world that
KVM actually accesses the MSR somewhat frequently.

I'll do a more thorough review of this series in the coming weeks (days?).
I singled out this one because I happened to stumble across the code when
digging into something else.
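For the record, Zide's pmu->global_ctrl_dirty idea and an EXREG-style cache
boil down to the same write-avoidance pattern.  A minimal userspace sketch
of that pattern (all names are hypothetical, not actual KVM code; a counter
stands in for VMWRITE so the savings are visible):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL, ...). */
static int vmwrites;
static uint64_t vmcs_guest_perf_global_ctrl;

static void vmcs_write64_stub(uint64_t val)
{
	vmcs_guest_perf_global_ctrl = val;
	vmwrites++;
}

/*
 * Hypothetical per-vCPU cache of the guest's PERF_GLOBAL_CTRL, in the
 * spirit of KVM's register caching: the VMCS field is written only when
 * the cached value has actually been dirtied.
 */
struct pmu_cache {
	uint64_t global_ctrl;
	bool dirty;	/* cached value is newer than the VMCS field */
};

static void pmu_set_global_ctrl(struct pmu_cache *c, uint64_t val)
{
	if (c->global_ctrl != val) {
		c->global_ctrl = val;
		c->dirty = true;
	}
}

/* Called on the VM-Entry path; a no-op in the common, clean case. */
static void pmu_flush_on_entry(struct pmu_cache *c)
{
	if (c->dirty) {
		vmcs_write64_stub(c->global_ctrl);
		c->dirty = false;
	}
}
```

With something like this, steady-state transitions where nothing touched
PERF_GLOBAL_CTRL do zero VMWRITEs, which is exactly why the "how often is
the MSR actually accessed" question needs to be answered first.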