On Wed, Nov 20, 2024, Dapeng Mi wrote:
>
> On 11/20/2024 2:58 AM, Sean Christopherson wrote:
> > Please squash this with the patch that does the actual save/load.  Hmm,
> > maybe it should be put/load, now that I think about it more?  That's more
> > consistent with existing KVM terminology.
>
> Sure. I noticed this inconsistency, but "put" seemed less intuitive than
> "save", so I didn't change it.

Yeah, "put" isn't perfect, but neither is "save", because the save/put path
also purges hardware state.

> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index 4b3ce6194bdb..603727312f9c 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -522,6 +522,8 @@ struct kvm_pmc {
> >>  	 */
> >>  	u64 emulated_counter;
> >>  	u64 eventsel;
> >> +	u64 msr_counter;
> >> +	u64 msr_eventsel;
> >
> > There's no need to track these per PMC, the tracking can be per PMU, e.g.
> >
> > 	u64 gp_eventsel_base;
> > 	u64 gp_counter_base;
> > 	u64 gp_shift;
> > 	u64 fixed_base;
> >
> > Actually, there's no need for a per-PMU fixed base, as that can be shoved
> > into kvm_pmu_ops.  LOL, and the upcoming patch hardcodes
> > INTEL_PMC_FIXED_RDPMC_BASE.  Naughty, naughty ;-)
> >
> > It's not pretty, but 16 bytes per PMC isn't trivial.
> >
> > Hmm, actually, scratch all that.  A better alternative would be to provide
> > a helper to put/load counter/selector MSRs, and call that from vendor code.
> > Ooh, I think there's a bug here.  On AMD, the guest event selector MSRs
> > need to be loaded _before_ PERF_GLOBAL_CTRL, no?  I.e. enable the guest's
> > counters only after all selectors have been switched to
> > AMD64_EVENTSEL_GUESTONLY.  Otherwise there would be a brief window where
> > KVM could incorrectly enable counters in the host.  And the reverse for
> > put().
> >
> > But Intel has the opposite ordering, because MSR_CORE_PERF_GLOBAL_CTRL
> > needs to be cleared before changing event selectors.
>
> Not quite sure about AMD platforms, but it seems both Intel and AMD
> platforms follow the sequence below to manipulate PMU MSRs.
>
> disable PERF_GLOBAL_CTRL MSR
>
> manipulate counter-level PMU MSRs
>
> enable PERF_GLOBAL_CTRL MSR

Nope.  kvm_pmu_restore_pmu_context() does:

	static_call_cond(kvm_x86_pmu_restore_pmu_context)(vcpu);

	/*
	 * No need to zero out unexposed GP/fixed counters/selectors since
	 * RDPMC in this case will be intercepted. Accessing to these counters
	 * and selectors will cause #GP in the guest.
	 */
	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
		pmc = &pmu->gp_counters[i];
		wrmsrl(pmc->msr_counter, pmc->counter);
		wrmsrl(pmc->msr_eventsel, pmu->gp_counters[i].eventsel_hw);
	}

	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
		pmc = &pmu->fixed_counters[i];
		wrmsrl(pmc->msr_counter, pmc->counter);
	}

And amd_restore_pmu_context() does:

	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);

	rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, global_status);

	/* Clear host global_status MSR if non-zero. */
	if (global_status)
		wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, global_status);

	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET, pmu->global_status);
	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, pmu->global_ctrl);

So the sequence on AMD is currently:

	disable PERF_GLOBAL_CTRL
	save host PERF_GLOBAL_STATUS
	load guest PERF_GLOBAL_STATUS (clear+set)
	load guest PERF_GLOBAL_CTRL
	load guest per-counter MSRs
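
For illustration, a minimal sketch of the vendor-specific load ordering
described above, assuming a hypothetical common helper
kvm_pmu_load_guest_counters() that writes the per-counter eventsel/counter
MSRs.  The helper and function names here are illustrative, not taken from
the series:

	/*
	 * AMD: switch the selectors to GUESTONLY first, and only then enable
	 * the guest's counters via the global control MSR.
	 */
	static void amd_load_guest_context_sketch(struct kvm_vcpu *vcpu)
	{
		struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);

		kvm_pmu_load_guest_counters(vcpu);	/* hypothetical helper */
		wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, pmu->global_ctrl);
	}

	/*
	 * Intel: clear PERF_GLOBAL_CTRL first, then it is safe to change the
	 * event selectors, then load the guest's global control.
	 */
	static void intel_load_guest_context_sketch(struct kvm_vcpu *vcpu)
	{
		struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);

		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
		kvm_pmu_load_guest_counters(vcpu);	/* hypothetical helper */
		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, pmu->global_ctrl);
	}

The put() path would mirror the respective ordering on each vendor, per the
comment above.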