On Sat, Apr 13, 2024, Mi, Dapeng wrote:
>
> On 4/12/2024 5:44 AM, Sean Christopherson wrote:
> > On Fri, Jan 26, 2024, Xiong Zhang wrote:
> > > From: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx>
> > >
> > > Implement the save/restore of PMU state for passthrough PMU in Intel. In
> > > passthrough mode, KVM owns exclusively the PMU HW when control flow goes to
> > > the scope of passthrough PMU. Thus, KVM needs to save the host PMU state
> > > and gain the full HW PMU ownership. On the contrary, host regains the
> > > ownership of PMU HW from KVM when control flow leaves the scope of
> > > passthrough PMU.
> > >
> > > Implement PMU context switches for Intel CPUs and opportunistically use
> > > rdpmcl() instead of rdmsrl() when reading counters since the former has
> > > lower latency in Intel CPUs.
> > >
> > > Co-developed-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
> > > Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
> > > Signed-off-by: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx>
> > > ---
> > >  arch/x86/kvm/vmx/pmu_intel.c | 73 ++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 73 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > > index 0d58fe7d243e..f79bebe7093d 100644
> > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > @@ -823,10 +823,83 @@ void intel_passthrough_pmu_msrs(struct kvm_vcpu *vcpu)
> > >  static void intel_save_pmu_context(struct kvm_vcpu *vcpu)
> > I would prefer there be a "guest" in there somewhere, e.g. intel_save_guest_pmu_context().
> Yeah. It looks clearer.
>
> > >  {
> > > +        struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > > +        struct kvm_pmc *pmc;
> > > +        u32 i;
> > > +
> > > +        if (pmu->version != 2) {
> > > +                pr_warn("only PerfMon v2 is supported for passthrough PMU");
> > > +                return;
> > > +        }
> > > +
> > > +        /* Global ctrl register is already saved at VM-exit. */
> > > +        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, pmu->global_status);
> > > +        /* Clear hardware MSR_CORE_PERF_GLOBAL_STATUS MSR, if non-zero. */
> > > +        if (pmu->global_status)
> > > +                wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, pmu->global_status);
> > > +
> > > +        for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
> > > +                pmc = &pmu->gp_counters[i];
> > > +                rdpmcl(i, pmc->counter);
> > > +                rdmsrl(i + MSR_ARCH_PERFMON_EVENTSEL0, pmc->eventsel);
> > > +                /*
> > > +                 * Clear hardware PERFMON_EVENTSELx and its counter to avoid
> > > +                 * leakage and also avoid this guest GP counter get accidentally
> > > +                 * enabled during host running when host enable global ctrl.
> > > +                 */
> > > +                if (pmc->eventsel)
> > > +                        wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, 0);
> > > +                if (pmc->counter)
> > > +                        wrmsrl(MSR_IA32_PMC0 + i, 0);
> > This doesn't make much sense.  The kernel already has full access to the guest,
> > I don't see what is gained by zeroing out the MSRs just to hide them from perf.
> It's necessary to clear the EVENTSELx MSRs for both GP and fixed counters.
> Consider this case: the guest uses GP counter 2, but the host doesn't use it.
> If the EVENTSEL2 MSR is not cleared here, GP counter 2 would be enabled
> unexpectedly on the host later, since host perf always enables all valid bits
> in the PERF_GLOBAL_CTRL MSR. That would cause issues.
>
> Yeah, the clearing of the PMCx MSRs should be unnecessary.
>

Why is clearing the PMCx MSRs unnecessary? Do we want to leak counter values
to the host? NO. Not in cloud usage. Please make changes to this patch with
**extreme** caution.
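
To make the concern concrete, below is a rough, untested sketch of what I
would expect the save side to look like: it keeps Sean's compact read loops,
but still scrubs the selectors and counters before handing the PMU back to
host perf. It reuses the locals and field names from this patch (pmu, i,
gp_counters[], fixed_counters[]), and drops the if-non-zero checks purely for
brevity:

        for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
                /* Save the guest count and event selector ... */
                rdpmcl(i, pmu->gp_counters[i].counter);
                rdmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, pmu->gp_counters[i].eventsel);
                /*
                 * ... then scrub both, so a stale guest selector cannot start
                 * counting once host perf re-enables PERF_GLOBAL_CTRL, and the
                 * guest's count is not left readable on the host.
                 */
                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, 0);
                wrmsrl(MSR_IA32_PMC0 + i, 0);
        }

        /* Same for the fixed counters: save, then disable and clear. */
        rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, pmu->fixed_ctr_ctrl);
        wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
        for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
                rdpmcl(INTEL_PMC_FIXED_RDPMC_BASE | i, pmu->fixed_counters[i].counter);
                wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 0);
        }

Whether the clears should stay conditional on the saved values being non-zero
is a separate micro-optimization; the point is that both the selector clears
and the counter clears need to happen before host perf runs again.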
According to our past experience, if there is a bug somewhere, there is a
catch here (normally).

Thanks.
-Mingwei
>
>
> >
> > Similarly, if perf enables a counter in PERF_GLOBAL_CTRL without first restoring
> > the event selector, we gots problems.
> > Same thing for the fixed counters below.  Can't this just be?
> >         for (i = 0; i < pmu->nr_arch_gp_counters; i++)
> >                 rdpmcl(i, pmu->gp_counters[i].counter);
> >         for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
> >                 rdpmcl(INTEL_PMC_FIXED_RDPMC_BASE | i,
> >                        pmu->fixed_counters[i].counter);
> > > +        }
> > > +
> > > +        rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, pmu->fixed_ctr_ctrl);
> > > +        /*
> > > +         * Clear hardware FIXED_CTR_CTRL MSR to avoid information leakage and
> > > +         * also avoid these guest fixed counters get accidentially enabled
> > > +         * during host running when host enable global ctrl.
> > > +         */
> > > +        if (pmu->fixed_ctr_ctrl)
> > > +                wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> > > +        for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> > > +                pmc = &pmu->fixed_counters[i];
> > > +                rdpmcl(INTEL_PMC_FIXED_RDPMC_BASE | i, pmc->counter);
> > > +                if (pmc->counter)
> > > +                        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 0);
> > > +        }
> > >  }
> > >  static void intel_restore_pmu_context(struct kvm_vcpu *vcpu)
> > >  {
> > > +        struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > > +        struct kvm_pmc *pmc;
> > > +        u64 global_status;
> > > +        int i;
> > > +
> > > +        if (pmu->version != 2) {
> > > +                pr_warn("only PerfMon v2 is supported for passthrough PMU");
> > > +                return;
> > > +        }
> > > +
> > > +        /* Clear host global_ctrl and global_status MSR if non-zero. */
> > > +        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> > Why?  PERF_GLOBAL_CTRL will be auto-loaded at VM-Enter, why do it now?
> As the previous comments say, host perf always enables all counters in
> PERF_GLOBAL_CTRL by default. The reason to clear PERF_GLOBAL_CTRL here is to
> ensure all counters are in a disabled state, so that the later counter
> manipulation (writing MSRs) won't cause any race condition or unexpected
> behavior on the HW.
>
> >
> > > +        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, global_status);
> > > +        if (global_status)
> > > +                wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, global_status);
> > This seems especially silly, isn't the full MSR being written below?  Or am I
> > misunderstanding how these things work?
> I think Jim's comment has already explained why we need to do this.
>
> >
> > > +        wrmsrl(MSR_CORE_PERF_GLOBAL_STATUS_SET, pmu->global_status);
> > > +
> > > +        for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
> > > +                pmc = &pmu->gp_counters[i];
> > > +                wrmsrl(MSR_IA32_PMC0 + i, pmc->counter);
> > > +                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, pmc->eventsel);
> > > +        }
> > > +
> > > +        wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, pmu->fixed_ctr_ctrl);
> > > +        for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> > > +                pmc = &pmu->fixed_counters[i];
> > > +                wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, pmc->counter);
> > > +        }
> > >  }
> > >  struct kvm_pmu_ops intel_pmu_ops __initdata = {
> > > --
> > > 2.34.1
> > >
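
P.S. For the restore-side questions above: IIUC the key detail is that
MSR_CORE_PERF_GLOBAL_STATUS_SET can only *set* status bits, it cannot clear
whatever overflow bits the host left behind, which is why the OVF_CTRL write
has to come first. As a rough, untested sketch (reusing the patch's locals),
the ordering being argued for is:

        /* Stop all counters so the restores below happen with the PMU quiesced. */
        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);

        /* Clear any host-owned overflow bits, then install the guest's view. */
        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, global_status);
        if (global_status)
                wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, global_status);
        wrmsrl(MSR_CORE_PERF_GLOBAL_STATUS_SET, pmu->global_status);

        /* Reload counters and selectors while everything is still disabled. */
        for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
                wrmsrl(MSR_IA32_PMC0 + i, pmu->gp_counters[i].counter);
                wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + i, pmu->gp_counters[i].eventsel);
        }
        wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, pmu->fixed_ctr_ctrl);
        for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
                wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, pmu->fixed_counters[i].counter);

        /* The guest's PERF_GLOBAL_CTRL itself is only loaded at VM-entry. */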