On 11/21/2024 1:30 AM, Sean Christopherson wrote:
> On Wed, Nov 20, 2024, Dapeng Mi wrote:
>> On 11/20/2024 2:58 AM, Sean Christopherson wrote:
>>> Please squash this with the patch that does the actual save/load. Hmm, maybe it
>>> should be put/load, now that I think about it more? That's more consitent with
>>> existing KVM terminology.
>> Sure. I ever noticed that this in-consistence, but "put" seem not so
>> intuitionistic as "save", so didn't change it.
> Yeah, "put" isn't perfect, but neither is "save", because the save/put path also
> purges hardware state.
>
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index 4b3ce6194bdb..603727312f9c 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -522,6 +522,8 @@ struct kvm_pmc {
>>>> 	 */
>>>> 	u64 emulated_counter;
>>>> 	u64 eventsel;
>>>> +	u64 msr_counter;
>>>> +	u64 msr_eventsel;
>>> There's no need to track these per PMC, the tracking can be per PMU, e.g.
>>>
>>> 	u64 gp_eventsel_base;
>>> 	u64 gp_counter_base;
>>> 	u64 gp_shift;
>>> 	u64 fixed_base;
>>>
>>> Actually, there's no need for a per-PMU fixed base, as that can be shoved into
>>> kvm_pmu_ops. LOL, and the upcoming patch hardcodes INTEL_PMC_FIXED_RDPMC_BASE.
>>> Naughty, naughty ;-)
>>>
>>> It's not pretty, but 16 bytes per PMC isn't trivial.
>>>
>>> Hmm, actually, scratch all that. A better alternative would be to provide a
>>> helper to put/load counter/selector MSRs, and call that from vendor code. Ooh,
>>> I think there's a bug here. On AMD, the guest event selector MSRs need to be
>>> loaded _before_ PERF_GLOBAL_CTRL, no? I.e. enable the guest's counters only
>>> after all selectors have been switched AMD64_EVENTSEL_GUESTONLY. Otherwise there
>>> would be a brief window where KVM could incorrectly enable counters in the host.
>>> And the reverse that for put().
>>>
>>> But Intel has the opposite ordering, because MSR_CORE_PERF_GLOBAL_CTRL needs to
>>> be cleared before changing event selectors.
>> Not quite sure about AMD platforms, but it seems both Intel and AMD
>> platforms follow below sequence to manipulated PMU MSRs.
>>
>> disable PERF_GLOBAL_CTRL MSR
>>
>> manipulate counter-level PMU MSR
>>
>> enable PERF_GLOBAL_CTRL MSR
> Nope. kvm_pmu_restore_pmu_context() does:
>
> 	static_call_cond(kvm_x86_pmu_restore_pmu_context)(vcpu);
>
> 	/*
> 	 * No need to zero out unexposed GP/fixed counters/selectors since RDPMC
> 	 * in this case will be intercepted. Accessing to these counters and
> 	 * selectors will cause #GP in the guest.
> 	 */
> 	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
> 		pmc = &pmu->gp_counters[i];
> 		wrmsrl(pmc->msr_counter, pmc->counter);
> 		wrmsrl(pmc->msr_eventsel, pmu->gp_counters[i].eventsel_hw);
> 	}
>
> 	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> 		pmc = &pmu->fixed_counters[i];
> 		wrmsrl(pmc->msr_counter, pmc->counter);
> 	}
>
> And amd_restore_pmu_context() does:
>
> 	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
> 	rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, global_status);
>
> 	/* Clear host global_status MSR if non-zero. */
> 	if (global_status)
> 		wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, global_status);
>
> 	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET, pmu->global_status);
>
> 	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, pmu->global_ctrl);
>
> So the sequence on AMD is currently:
>
> 	disable PERF_GLOBAL_CTRL
>
> 	save host PERF_GLOBAL_STATUS
>
> 	load guest PERF_GLOBAL_STATUS (clear+set)
>
> 	load guest PERF_GLOBAL_CTRL
>
> 	load guest per-counter MSRs

Checked again, yes, indeed. So a better way to handle this is to define a common
helper in common code that puts/loads the per-counter MSRs, and call that helper
from the vendor-specific callbacks, so each vendor can enforce its own ordering
against PERF_GLOBAL_CTRL.
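
Something like the below is what I have in mind, just a rough sketch to show the
ordering on AMD. kvm_pmu_load_counter_msrs() is a made-up name, and I'm assuming
the kvm_pmc/kvm_pmu fields from the snippets above (msr_counter, msr_eventsel,
eventsel_hw):

	/* Common helper (sketch): load the guest's per-counter/selector MSRs. */
	static void kvm_pmu_load_counter_msrs(struct kvm_vcpu *vcpu)
	{
		struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
		struct kvm_pmc *pmc;
		int i;

		for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
			pmc = &pmu->gp_counters[i];
			wrmsrl(pmc->msr_counter, pmc->counter);
			wrmsrl(pmc->msr_eventsel, pmc->eventsel_hw);
		}

		for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
			pmc = &pmu->fixed_counters[i];
			wrmsrl(pmc->msr_counter, pmc->counter);
		}
	}

	/*
	 * AMD: disable PERF_GLOBAL_CTRL, load selectors/counters, and only then
	 * enable the guest's PERF_GLOBAL_CTRL, so no counter is ever enabled
	 * while its selector still holds the host's configuration.
	 */
	static void amd_restore_pmu_context(struct kvm_vcpu *vcpu)
	{
		struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
		u64 global_status;

		wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);

		kvm_pmu_load_counter_msrs(vcpu);

		rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, global_status);
		/* Clear host global_status MSR if non-zero. */
		if (global_status)
			wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, global_status);

		wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET, pmu->global_status);

		wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, pmu->global_ctrl);
	}

Intel's callback would keep the opposite ordering, i.e. make sure
MSR_CORE_PERF_GLOBAL_CTRL is cleared before kvm_pmu_load_counter_msrs() touches
the selectors, and the put() path would be the mirror image on each vendor.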