On Thu, Apr 11, 2024 at 2:44 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Fri, Jan 26, 2024, Xiong Zhang wrote:
> > From: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx>
> >
> > Implement the save/restore of PMU state for the passthrough PMU on Intel.
> > In passthrough mode, KVM owns the PMU hardware exclusively while control
> > flow is within the scope of the passthrough PMU. Thus, KVM needs to save
> > the host PMU state and gain full hardware PMU ownership. Conversely, the
> > host regains ownership of the PMU hardware from KVM when control flow
> > leaves the scope of the passthrough PMU.
> >
> > Implement PMU context switches for Intel CPUs and opportunistically use
> > rdpmcl() instead of rdmsrl() when reading counters, since the former has
> > lower latency on Intel CPUs.
> >
> > Co-developed-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
> > Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
> > Signed-off-by: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx>
> > ---
> >  arch/x86/kvm/vmx/pmu_intel.c | 73 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 73 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > index 0d58fe7d243e..f79bebe7093d 100644
> > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > @@ -823,10 +823,83 @@ void intel_passthrough_pmu_msrs(struct kvm_vcpu *vcpu)
> >
> >  static void intel_save_pmu_context(struct kvm_vcpu *vcpu)
>
> I would prefer there be a "guest" in there somewhere, e.g.
> intel_save_guest_pmu_context().
>
> >  {
> > +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > +	struct kvm_pmc *pmc;
> > +	u32 i;
> > +
> > +	if (pmu->version != 2) {
> > +		pr_warn("only PerfMon v2 is supported for passthrough PMU");
> > +		return;
> > +	}
> > +
> > +	/* Global ctrl register is already saved at VM-exit. */
> > +	rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, pmu->global_status);
> > +	/* Clear hardware MSR_CORE_PERF_GLOBAL_STATUS MSR, if non-zero. */
> > +	if (pmu->global_status)
> > +		wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, pmu->global_status);
> > +
> > +	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
> > +		pmc = &pmu->gp_counters[i];
> > +		rdpmcl(i, pmc->counter);
> > +		rdmsrl(i + MSR_ARCH_PERFMON_EVENTSEL0, pmc->eventsel);
> > +		/*
> > +		 * Clear hardware PERFMON_EVENTSELx and its counter to avoid
> > +		 * leakage and also avoid this guest GP counter getting
> > +		 * accidentally enabled while the host is running, when the
> > +		 * host enables global ctrl.
> > +		 */
> > +		if (pmc->eventsel)
> > +			wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, 0);
> > +		if (pmc->counter)
> > +			wrmsrl(MSR_IA32_PMC0 + i, 0);
>
> This doesn't make much sense. The kernel already has full access to the
> guest, I don't see what is gained by zeroing out the MSRs just to hide
> them from perf.
>
> Similarly, if perf enables a counter via PERF_GLOBAL_CTRL without first
> restoring the event selector, we gots problems.
>
> Same thing for the fixed counters below. Can't this just be?
>
> 	for (i = 0; i < pmu->nr_arch_gp_counters; i++)
> 		rdpmcl(i, pmu->gp_counters[i].counter);
>
> 	for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
> 		rdpmcl(INTEL_PMC_FIXED_RDPMC_BASE | i,
> 		       pmu->fixed_counters[i].counter);
>
> > +	}
> > +
> > +	rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, pmu->fixed_ctr_ctrl);
> > +	/*
> > +	 * Clear hardware FIXED_CTR_CTRL MSR to avoid information leakage
> > +	 * and also avoid these guest fixed counters getting accidentally
> > +	 * enabled while the host is running, when the host enables global
> > +	 * ctrl.
> > +	 */
> > +	if (pmu->fixed_ctr_ctrl)
> > +		wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> > +	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> > +		pmc = &pmu->fixed_counters[i];
> > +		rdpmcl(INTEL_PMC_FIXED_RDPMC_BASE | i, pmc->counter);
> > +		if (pmc->counter)
> > +			wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 0);
> > +	}
> > }
> >
> >  static void intel_restore_pmu_context(struct kvm_vcpu *vcpu)
> >  {
> > +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > +	struct kvm_pmc *pmc;
> > +	u64 global_status;
> > +	int i;
> > +
> > +	if (pmu->version != 2) {
> > +		pr_warn("only PerfMon v2 is supported for passthrough PMU");
> > +		return;
> > +	}
> > +
> > +	/* Clear host global_ctrl and global_status MSR if non-zero. */
> > +	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
>
> Why? PERF_GLOBAL_CTRL will be auto-loaded at VM-Enter, why do it now?
>
> > +	rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, global_status);
> > +	if (global_status)
> > +		wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, global_status);
>
> This seems especially silly, isn't the full MSR being written below? Or
> am I misunderstanding how these things work?

LOL! You expect CPU design to follow basic logic?!?

Writing a 1 to a bit in IA32_PERF_GLOBAL_STATUS_SET sets the
corresponding bit in IA32_PERF_GLOBAL_STATUS to 1. Writing a 0 to a bit
in IA32_PERF_GLOBAL_STATUS_SET is a nop. To clear a bit in
IA32_PERF_GLOBAL_STATUS, you need to write a 1 to the corresponding bit
in IA32_PERF_GLOBAL_STATUS_RESET (aka IA32_PERF_GLOBAL_OVF_CTRL).
> > +	wrmsrl(MSR_CORE_PERF_GLOBAL_STATUS_SET, pmu->global_status);
> > +
> > +	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
> > +		pmc = &pmu->gp_counters[i];
> > +		wrmsrl(MSR_IA32_PMC0 + i, pmc->counter);
> > +		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, pmc->eventsel);
> > +	}
> > +
> > +	wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, pmu->fixed_ctr_ctrl);
> > +	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> > +		pmc = &pmu->fixed_counters[i];
> > +		wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, pmc->counter);
> > +	}
> > }
> >
> >  struct kvm_pmu_ops intel_pmu_ops __initdata = {
> > --
> > 2.34.1
> >