Re: [RFC PATCH v3 37/58] KVM: x86/pmu: Switch IA32_PERF_GLOBAL_CTRL at VM boundary

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/19/2024 9:46 AM, Sean Christopherson wrote:
> On Fri, Oct 25, 2024, Dapeng Mi wrote:
>> On 10/25/2024 4:26 AM, Chen, Zide wrote:
>>> On 7/31/2024 9:58 PM, Mingwei Zhang wrote:
>>>
>>>> @@ -7295,6 +7299,46 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
>>>>  					msrs[i].host, false);
>>>>  }
>>>>  
>>>> +static void save_perf_global_ctrl_in_passthrough_pmu(struct vcpu_vmx *vmx)
>>>> +{
>>>> +	struct kvm_pmu *pmu = vcpu_to_pmu(&vmx->vcpu);
>>>> +	int i;
>>>> +
>>>> +	if (vm_exit_controls_get(vmx) & VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL) {
>>>> +		pmu->global_ctrl = vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL);
>>> As commented in patch 26, compared with MSR auto save/store area
>>> approach, the exec control way needs one relatively expensive VMCS read
>>> on every VM exit.
>> Anyway, let us have a evaluation and data speaks.
> No, drop the unconditional VMREAD and VMWRITE, one way or another.  No benchmark
> will notice ~50 extra cycles, but if we write poor code for every feature, those
> 50 cycles per feature add up.
>
> Furthermore, checking to see if the CPU supports the load/save VMCS controls at
> runtime beyond ridiculous.  The mediated PMU requires ***VERSION 4***; if a CPU
> supports PMU version 4 and doesn't support the VMCS controls, KVM should yell and
> disable the passthrough PMU.  The amount of complexity added here to support a
> CPU that should never exist is silly.
>
>>>> +static void load_perf_global_ctrl_in_passthrough_pmu(struct vcpu_vmx *vmx)
>>>> +{
>>>> +	struct kvm_pmu *pmu = vcpu_to_pmu(&vmx->vcpu);
>>>> +	u64 global_ctrl = pmu->global_ctrl;
>>>> +	int i;
>>>> +
>>>> +	if (vm_entry_controls_get(vmx) & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) {
>>>> +		vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL, global_ctrl);
>>> ditto.
>>>
>>> We may optimize it by introducing a new flag pmu->global_ctrl_dirty and
>>> update GUEST_IA32_PERF_GLOBAL_CTRL only when it's needed.  But this
>>> makes the code even more complicated.
> I haven't looked at surrounding code too much, but I guarantee there's _zero_
> reason to eat a VMWRITE+VMREAD on every transition.  If, emphasis on *if*, KVM
> accesses PERF_GLOBAL_CTRL frequently, e.g. on most exits, then add a VCPU_EXREG_XXX
> and let KVM's caching infrastructure do the heavy lifting.  Don't reinvent the
> wheel.  But first, convince the world that KVM actually accesses the MSR somewhat
> frequently.

Sean, let me give more background here.

VMX supports two ways to save/restore PERF_GLOBAL_CTRL MSR, one is to
leverage VMCS_EXIT_CTRL/VMCS_ENTRY_CTRL to save/restore guest
PERF_GLOBAL_CTRL value to/from VMCS guest state. The other is to use the
VMCS MSR auto-load/restore bitmap to save/restore guest PERF_GLOBAL_CTRL. 

Currently we prefer to use the former way to save/restore guest
PERF_GLOBAL_CTRL as long as HW supports it. There is a limitation on the
MSR auto-load/restore feature. When there are multiple MSRs, the MSRs are
saved/restored in the order of MSR index. As the suggestion of SDM,
PERF_GLOBAL_CTRL should always be written at last after all other PMU MSRs
are manipulated. So if there are some PMU MSRs whose index is larger than
PERF_GLOBAL_CTRL (It would be true in archPerfmon v6+, all PMU MSRs in the
new MSR range have larger index than PERF_GLOBAL_CTRL), these PMU MSRs
would be restored after PERF_GLOBAL_CTRL. That would break the rule. Of
course, it's good to save/restore PERF_GLOBAL_CTRL right now with the VMCS
VMCS MSR auto-load/restore bitmap feature since only one PMU MSR
PERF_GLOBAL_CTRL is saved/restored in current implementation.

PERF_GLOBAL_CTRL MSR could be frequently accessed by perf/pmu driver, e.g.
on each task switch, so PERF_GLOBAL_CTRL MSR is configured to passthrough
to reduce the performance impact in mediated vPMU proposal if guest own all
PMU HW resource. But if guest only owns part of PMU HW resource,
PERF_GLOBAL_CTRL would be set to interception mode.

I suppose KVM doesn't need access PERF_GLOBAL_CTRL in passthrough mode.
This piece of code is intently just for PERF_GLOBAL_CTRL interception mode,
but think twice it looks unnecessary to save/restore PERF_GLOBAL_CTRL via
VMCS as KVM would always maintain the guest PERF_GLOBAL_CTRL value? Anyway,
this part of code can be optimized.


>
> I'll do a more thorough review of this series in the coming weeks (days?).  I
> singled out this one because I happened to stumble across the code when digging
> into something else.
Thanks, look forward to your more comments.






[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux