On 11/21/2024 4:13 AM, Sean Christopherson wrote:
> On Thu, Aug 01, 2024, Mingwei Zhang wrote:
>> Implement emulated counter increment for passthrough PMU under KVM_REQ_PMU.
>> Defer the counter increment to the KVM_REQ_PMU handler because counter
>> increment requests come from kvm_pmu_trigger_event(), which can be triggered
>> within the KVM_RUN inner loop or outside of it. This means the counter
>> increment could happen before or after the PMU context switch.
>>
>> Processing the counter increment in one place keeps the implementation
>> simple.
>>
>> Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
>> Co-developed-by: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx>
>> ---
>>  arch/x86/kvm/pmu.c | 41 +++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 39 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
>> index 5cc539bdcc7e..41057d0122bd 100644
>> --- a/arch/x86/kvm/pmu.c
>> +++ b/arch/x86/kvm/pmu.c
>> @@ -510,6 +510,18 @@ static int reprogram_counter(struct kvm_pmc *pmc)
>>  			     eventsel & ARCH_PERFMON_EVENTSEL_INT);
>>  }
>>
>> +static void kvm_pmu_handle_event_in_passthrough_pmu(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> +
>> +	static_call_cond(kvm_x86_pmu_set_overflow)(vcpu);
>> +
>> +	if (atomic64_read(&pmu->__reprogram_pmi)) {
>> +		kvm_make_request(KVM_REQ_PMI, vcpu);
>> +		atomic64_set(&pmu->__reprogram_pmi, 0ull);
>> +	}
>> +}
>> +
>>  void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
>>  {
>>  	DECLARE_BITMAP(bitmap, X86_PMC_IDX_MAX);
>> @@ -517,6 +529,9 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
>>  	struct kvm_pmc *pmc;
>>  	int bit;
>>
>> +	if (is_passthrough_pmu_enabled(vcpu))
>> +		return kvm_pmu_handle_event_in_passthrough_pmu(vcpu);
>> +
>>  	bitmap_copy(bitmap, pmu->reprogram_pmi, X86_PMC_IDX_MAX);
>>
>>  	/*
>> @@ -848,6 +863,17 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
>>  	kvm_pmu_reset(vcpu);
>>  }
>>
>> +static void kvm_passthrough_pmu_incr_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc)
>> +{
>> +	if (static_call(kvm_x86_pmu_incr_counter)(pmc)) {
>
> This is absurd. It's the same ugly code in both Intel and AMD.
>
>   static bool intel_incr_counter(struct kvm_pmc *pmc)
>   {
>   	pmc->counter += 1;
>   	pmc->counter &= pmc_bitmask(pmc);
>
>   	if (!pmc->counter)
>   		return true;
>
>   	return false;
>   }
>
>   static bool amd_incr_counter(struct kvm_pmc *pmc)
>   {
>   	pmc->counter += 1;
>   	pmc->counter &= pmc_bitmask(pmc);
>
>   	if (!pmc->counter)
>   		return true;
>
>   	return false;
>   }
>
>> +		__set_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->global_status);
>
> Using __set_bit() is unnecessary, ugly, and dangerous. KVM uses set_bit(), no
> underscores, for things like reprogram_pmi because the updates need to be
> atomic.
>
> The downside of __set_bit() and friends is that if pmc->idx is garbage, KVM
> will clobber memory, whereas BIT_ULL(pmc->idx) is "just" undefined behavior.
> But dropping the update is far better than clobbering memory, and can be
> detected by UBSAN (though I doubt anyone is hitting this code with UBSAN).
>
> For this code, a regular ol' bitwise-OR will suffice.
>
>> +		kvm_make_request(KVM_REQ_PMU, vcpu);
>> +
>> +		if (pmc->eventsel & ARCH_PERFMON_EVENTSEL_INT)
>> +			set_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
>
> This is badly in need of a comment, and the ordering is unnecessarily weird.
> Set bits in reprogram_pmi *before* making the request.
> It doesn't matter here since this is all on the same vCPU, but it's good
> practice since KVM_REQ_XXX provides the necessary barriers to allow for safe,
> correct cross-CPU updates.
>
> That said, why on earth is the mediated PMU using KVM_REQ_PMU? Set
> global_status and KVM_REQ_PMI, done.
>
>> +	}
>> +}
>> +
>>  static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
>>  {
>>  	pmc->emulated_counter++;
>> @@ -880,7 +906,8 @@ static inline bool cpl_is_matched(struct kvm_pmc *pmc)
>>  	return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user;
>>  }
>>
>> -void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel)
>> +static void __kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel,
>> +				    bool is_passthrough)
>>  {
>>  	DECLARE_BITMAP(bitmap, X86_PMC_IDX_MAX);
>>  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> @@ -914,9 +941,19 @@ void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel)
>>  		    !pmc_event_is_allowed(pmc) || !cpl_is_matched(pmc))
>>  			continue;
>>
>> -		kvm_pmu_incr_counter(pmc);
>> +		if (is_passthrough)
>> +			kvm_passthrough_pmu_incr_counter(vcpu, pmc);
>> +		else
>> +			kvm_pmu_incr_counter(pmc);
>>  	}
>>  }
>> +
>> +void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel)
>> +{
>> +	bool is_passthrough = is_passthrough_pmu_enabled(vcpu);
>> +
>> +	__kvm_pmu_trigger_event(vcpu, eventsel, is_passthrough);
>
> Using an inner helper for this is silly, even if the mediated information were
> snapshotted per-vCPU. Just grab the snapshot in a local variable. Using a
> param adds no value and unnecessarily obfuscates the code.
>
> That's all a moot point though, because (a) KVM can check enable_mediated_pmu
> directly and (b) pivoting on behavior belongs in kvm_pmu_incr_counter(), not
> here.
>
> And I am leaning towards having the mediated vs. perf-based code live in the
> same function, unless one or both is "huge", so that it's easier to understand
> and appreciate the differences in the implementations.
>
> Not an action item for y'all, but this is also a great time to add comments,
> which are sorely lacking in the code. I am more than happy to do that, as it
> helps me understand (and thus review) the code. I'll throw in suggestions here
> and there as I review.
>
> Anyways, this?
>
> static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
> {
> 	/*
> 	 * For perf-based PMUs, accumulate software-emulated events separately
> 	 * from pmc->counter, as pmc->counter is offset by the count of the
> 	 * associated perf event. Request reprogramming, which will consult
> 	 * both emulated and hardware-generated events to detect overflow.
> 	 */
> 	if (!enable_mediated_pmu) {
> 		pmc->emulated_counter++;
> 		kvm_pmu_request_counter_reprogram(pmc);
> 		return;
> 	}
>
> 	/*
> 	 * For mediated PMUs, pmc->counter is updated when the vCPU's PMU is
> 	 * put, and will be loaded into hardware when the PMU is loaded.
> 	 * Simply increment the counter and signal overflow if it wraps to
> 	 * zero.
> 	 */
> 	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
> 	if (!pmc->counter) {
> 		pmc_to_pmu(pmc)->global_status |= BIT_ULL(pmc->idx);
> 		kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
> 	}
> }

Yes, thanks.
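
Just to double-check my reading of the wrap-and-signal idiom above, here is a
minimal standalone model in plain userspace C. The 48-bit width and the
free-standing pmc_bitmask() are illustrative stand-ins for KVM's per-PMC mask,
not the actual kernel definitions:

  #include <stdio.h>
  #include <stdint.h>

  /* Hypothetical stand-in for KVM's pmc_bitmask(): a 48-bit GP counter. */
  static uint64_t pmc_bitmask(void)
  {
  	return (1ULL << 48) - 1;
  }

  int main(void)
  {
  	uint64_t counter = pmc_bitmask();	/* counter at its max value */

  	/* Increment and mask, as the mediated path does. */
  	counter = (counter + 1) & pmc_bitmask();

  	/*
  	 * A wrap to zero is the overflow condition; this is the point where
  	 * KVM would OR BIT_ULL(pmc->idx) into global_status and raise
  	 * KVM_REQ_PMI.
  	 */
  	if (!counter)
  		printf("overflow: counter wrapped to zero\n");

  	return 0;
  }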