On Fri, Oct 25, 2024, Dapeng Mi wrote:
> 
> On 10/25/2024 4:26 AM, Chen, Zide wrote:
> >
> > On 7/31/2024 9:58 PM, Mingwei Zhang wrote:
> >> @@ -7295,6 +7299,46 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> >>  				msrs[i].host, false);
> >>  }
> >>  
> >> +static void save_perf_global_ctrl_in_passthrough_pmu(struct vcpu_vmx *vmx)
> >> +{
> >> +	struct kvm_pmu *pmu = vcpu_to_pmu(&vmx->vcpu);
> >> +	int i;
> >> +
> >> +	if (vm_exit_controls_get(vmx) & VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL) {
> >> +		pmu->global_ctrl = vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL);
> >
> > As commented in patch 26, compared with MSR auto save/store area
> > approach, the exec control way needs one relatively expensive VMCS read
> > on every VM exit.
> 
> Anyway, let us have an evaluation and let the data speak.

No, drop the unconditional VMREAD and VMWRITE, one way or another.  No
benchmark will notice ~50 extra cycles, but if we write poor code for every
feature, those 50 cycles per feature add up.

Furthermore, checking to see if the CPU supports the load/save VMCS controls
at runtime is beyond ridiculous.  The mediated PMU requires ***VERSION 4***;
if a CPU supports PMU version 4 and doesn't support the VMCS controls, KVM
should yell and disable the passthrough PMU.  The amount of complexity added
here to support a CPU that should never exist is silly.

> >> +static void load_perf_global_ctrl_in_passthrough_pmu(struct vcpu_vmx *vmx)
> >> +{
> >> +	struct kvm_pmu *pmu = vcpu_to_pmu(&vmx->vcpu);
> >> +	u64 global_ctrl = pmu->global_ctrl;
> >> +	int i;
> >> +
> >> +	if (vm_entry_controls_get(vmx) & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) {
> >> +		vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL, global_ctrl);
> >
> > ditto.
> >
> > We may optimize it by introducing a new flag pmu->global_ctrl_dirty and
> > update GUEST_IA32_PERF_GLOBAL_CTRL only when it's needed.  But this
> > makes the code even more complicated.
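Regarding the version check: the one-time gating I'm suggesting doesn't need
to be anything fancy.  A rough, self-contained sketch of the shape (names,
struct layout, and bit positions below are purely illustrative stand-ins for
the real vmcs_config/CPUID plumbing, compiled in userspace with stubs):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative bit positions only; the real values come from the SDM. */
#define VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL	(1u << 13)
#define VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL	(1u << 30)

/* Stand-in for the relevant pieces of vmcs_config. */
struct vmcs_caps {
	uint32_t vmentry_ctrl;
	uint32_t vmexit_ctrl;
};

/*
 * Decide once, at setup, whether the mediated PMU can be enabled: it
 * requires PMU version 4, and any such CPU should also have the VMCS
 * load/save controls for PERF_GLOBAL_CTRL.  If it somehow doesn't,
 * yell and disable the feature instead of re-checking the controls on
 * every VM-Entry/VM-Exit.
 */
static bool mediated_pmu_supported(const struct vmcs_caps *caps,
				   int pmu_version)
{
	if (pmu_version < 4)
		return false;

	if (!(caps->vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) ||
	    !(caps->vmexit_ctrl & VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL)) {
		/* Stand-in for pr_warn_once(); such a CPU shouldn't exist. */
		fprintf(stderr,
			"PMU v4 without PERF_GLOBAL_CTRL VMCS controls?!\n");
		return false;
	}
	return true;
}
```

I.e. pay for the sanity check once at module load, and the fastpath gets to
assume the controls are present.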
I haven't looked at surrounding code too much, but I guarantee there's _zero_
reason to eat a VMWRITE+VMREAD on every transition.

If, emphasis on *if*, KVM accesses PERF_GLOBAL_CTRL frequently, e.g. on most
exits, then add a VCPU_EXREG_XXX and let KVM's caching infrastructure do the
heavy lifting.  Don't reinvent the wheel.  But first, convince the world that
KVM actually accesses the MSR somewhat frequently.

I'll do a more thorough review of this series in the coming weeks (days?).
I singled out this one because I happened to stumble across the code when
digging into something else.
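For the record, Zide's pmu->global_ctrl_dirty idea and an EXREG-style cache
boil down to the same write-avoidance pattern.  A minimal userspace sketch
of that pattern (all names are hypothetical, not actual KVM code; a counter
stands in for VMWRITE so the savings are visible):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL, ...). */
static int vmwrites;
static uint64_t vmcs_guest_perf_global_ctrl;

static void vmcs_write64_stub(uint64_t val)
{
	vmcs_guest_perf_global_ctrl = val;
	vmwrites++;
}

/*
 * Hypothetical per-vCPU cache of the guest's PERF_GLOBAL_CTRL, in the
 * spirit of KVM's register caching: the VMCS field is written only when
 * the cached value has actually been dirtied.
 */
struct pmu_cache {
	uint64_t global_ctrl;
	bool dirty;	/* cached value is newer than the VMCS field */
};

static void pmu_set_global_ctrl(struct pmu_cache *c, uint64_t val)
{
	if (c->global_ctrl != val) {
		c->global_ctrl = val;
		c->dirty = true;
	}
}

/* Called on the VM-Entry path; a no-op in the common, clean case. */
static void pmu_flush_on_entry(struct pmu_cache *c)
{
	if (c->dirty) {
		vmcs_write64_stub(c->global_ctrl);
		c->dirty = false;
	}
}
```

With something like this, steady-state transitions where nothing touched
PERF_GLOBAL_CTRL do zero VMWRITEs, which is exactly why the "how often is
the MSR actually accessed" question needs to be answered first.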