On Thu, Aug 01, 2024, Mingwei Zhang wrote:
> Implement emulated counter increment for passthrough PMU under KVM_REQ_PMU.
> Defer the counter increment to KVM_REQ_PMU handler because counter
> increment requests come from kvm_pmu_trigger_event() which can be triggered
> within the KVM_RUN inner loop or outside of the inner loop. This means the
> counter increment could happen before or after PMU context switch.
> 
> So process counter increment in one place makes the implementation simple.
> 
> Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
> Co-developed-by: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx>
> Signed-off-by: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx>
> ---
>  arch/x86/kvm/pmu.c | 41 +++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 5cc539bdcc7e..41057d0122bd 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -510,6 +510,18 @@ static int reprogram_counter(struct kvm_pmc *pmc)
>  				  eventsel & ARCH_PERFMON_EVENTSEL_INT);
>  }
>  
> +static void kvm_pmu_handle_event_in_passthrough_pmu(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +
> +	static_call_cond(kvm_x86_pmu_set_overflow)(vcpu);
> +
> +	if (atomic64_read(&pmu->__reprogram_pmi)) {
> +		kvm_make_request(KVM_REQ_PMI, vcpu);
> +		atomic64_set(&pmu->__reprogram_pmi, 0ull);
> +	}
> +}
> +
>  void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
>  {
>  	DECLARE_BITMAP(bitmap, X86_PMC_IDX_MAX);
> @@ -517,6 +529,9 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
>  	struct kvm_pmc *pmc;
>  	int bit;
>  
> +	if (is_passthrough_pmu_enabled(vcpu))
> +		return kvm_pmu_handle_event_in_passthrough_pmu(vcpu);
> +
>  	bitmap_copy(bitmap, pmu->reprogram_pmi, X86_PMC_IDX_MAX);
>  
>  	/*
> @@ -848,6 +863,17 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
>  	kvm_pmu_reset(vcpu);
>  }
>  
> +static void kvm_passthrough_pmu_incr_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc)
> +{
> +	if (static_call(kvm_x86_pmu_incr_counter)(pmc)) {

This is absurd.  It's the same ugly code in both Intel and AMD.

  static bool intel_incr_counter(struct kvm_pmc *pmc)
  {
	pmc->counter += 1;
	pmc->counter &= pmc_bitmask(pmc);

	if (!pmc->counter)
		return true;

	return false;
  }

  static bool amd_incr_counter(struct kvm_pmc *pmc)
  {
	pmc->counter += 1;
	pmc->counter &= pmc_bitmask(pmc);

	if (!pmc->counter)
		return true;

	return false;
  }

> +		__set_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->global_status);

Using __set_bit() is unnecessary, ugly, and dangerous.  KVM uses set_bit(), no
underscores, for things like reprogram_pmi because the updates need to be
atomic.

The downside of __set_bit() and friends is that if pmc->idx is garbage, KVM
will clobber memory, whereas BIT_ULL(pmc->idx) is "just" undefined behavior.
But dropping the update is far better than clobbering memory, and can be
detected by UBSAN (though I doubt anyone is hitting this code with UBSAN).

For this code, a regular ol' bitwise-OR will suffice.

> +		kvm_make_request(KVM_REQ_PMU, vcpu);
> +
> +		if (pmc->eventsel & ARCH_PERFMON_EVENTSEL_INT)
> +			set_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);

This is badly in need of a comment, and the ordering is unnecessarily weird.
Set bits in reprogram_pmi *before* making the request.  It doesn't matter here
since this is all on the same vCPU, but it's good practice since KVM_REQ_XXX
provides the necessary barriers to allow for safe, correct cross-CPU updates.

That said, why on earth is the mediated PMU using KVM_REQ_PMU?  Set
global_status and KVM_REQ_PMI, done.

> +	}
> +}
> +
>  static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
>  {
>  	pmc->emulated_counter++;
> @@ -880,7 +906,8 @@ static inline bool cpl_is_matched(struct kvm_pmc *pmc)
>  	return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ?
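To make the "one shared helper plus a plain bitwise-OR" suggestion concrete, here is a toy model of the wrap-and-flag logic, buildable outside of KVM (struct and function names here are illustrative stand-ins, not KVM's real types):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for kvm_pmu/kvm_pmc; field names loosely follow the patch. */
struct toy_pmu {
	uint64_t global_status;
};

struct toy_pmc {
	int idx;
	uint64_t counter;
	unsigned int bitwidth;	/* e.g. 48 bits for a typical GP counter */
	struct toy_pmu *pmu;
};

static uint64_t toy_pmc_bitmask(struct toy_pmc *pmc)
{
	return (1ull << pmc->bitwidth) - 1;
}

/*
 * One common increment helper instead of identical intel/amd copies: bump
 * the counter, wrap at the architectural width, and report overflow.
 */
static bool toy_pmc_incr_counter(struct toy_pmc *pmc)
{
	pmc->counter = (pmc->counter + 1) & toy_pmc_bitmask(pmc);
	return !pmc->counter;
}

/* On overflow, a plain bitwise-OR suffices; no __set_bit() needed. */
static void toy_pmc_handle_incr(struct toy_pmc *pmc)
{
	if (toy_pmc_incr_counter(pmc))
		pmc->pmu->global_status |= 1ull << pmc->idx;
}
```

The point of the sketch is only the shape: the vendor-neutral increment lives in one place, and the single-writer overflow bit is set with an OR.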
>  		select_os : select_user;
>  }
>  
> -void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel)
> +static void __kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel,
> +				    bool is_passthrough)
>  {
>  	DECLARE_BITMAP(bitmap, X86_PMC_IDX_MAX);
>  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> @@ -914,9 +941,19 @@ void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel)
>  		    !pmc_event_is_allowed(pmc) || !cpl_is_matched(pmc))
>  			continue;
>  
> -		kvm_pmu_incr_counter(pmc);
> +		if (is_passthrough)
> +			kvm_passthrough_pmu_incr_counter(vcpu, pmc);
> +		else
> +			kvm_pmu_incr_counter(pmc);
>  	}
>  }
> +
> +void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel)
> +{
> +	bool is_passthrough = is_passthrough_pmu_enabled(vcpu);
> +
> +	__kvm_pmu_trigger_event(vcpu, eventsel, is_passthrough);

Using an inner helper for this is silly, even if the mediated information were
snapshot per-vCPU.  Just grab the snapshot in a local variable.  Using a param
adds no value and unnecessarily obfuscates the code.

That's all a moot point though, because (a) KVM can check enable_mediated_pmu
directly and (b) pivoting on behavior belongs in kvm_pmu_incr_counter(), not
here.

And I am leaning towards having the mediated vs. perf-based code live in the
same function, unless one or both is "huge", so that it's easier to understand
and appreciate the differences in the implementations.

Not an action item for y'all, but this is also a great time to add comments,
which are sorely lacking in the code.  I am more than happy to do that, as it
helps me understand (and thus review) the code.  I'll throw in suggestions
here and there as I review.

Anyways, this?

static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
{
	/*
	 * For perf-based PMUs, accumulate software-emulated events separately
	 * from pmc->counter, as pmc->counter is offset by the count of the
	 * associated perf event.  Request reprogramming, which will consult
	 * both emulated and hardware-generated events to detect overflow.
	 */
	if (!enable_mediated_pmu) {
		pmc->emulated_counter++;
		kvm_pmu_request_counter_reprogram(pmc);
		return;
	}

	/*
	 * For mediated PMUs, pmc->counter is updated when the vCPU's PMU is
	 * put, and will be loaded into hardware when the PMU is loaded.
	 * Simply increment the counter and signal overflow if it wraps to
	 * zero.
	 */
	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
	if (!pmc->counter) {
		pmc_to_pmu(pmc)->global_status |= BIT_ULL(pmc->idx);
		kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
	}
}
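FWIW, the perf-based vs. mediated split in that suggestion can be exercised standalone; a toy version (illustrative names, booleans standing in for KVM_REQ_PMU/KVM_REQ_PMI requests) looks like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a single increment function handling both PMU flavors. */
struct toy_pmc {
	uint64_t counter;
	uint64_t emulated_counter;
	bool reprogram_requested;	/* stands in for reprogram + KVM_REQ_PMU */
	bool pmi_requested;		/* stands in for KVM_REQ_PMI */
	unsigned int bitwidth;
};

static uint64_t toy_pmc_bitmask(struct toy_pmc *pmc)
{
	return (1ull << pmc->bitwidth) - 1;
}

static void toy_incr_counter(struct toy_pmc *pmc, bool mediated)
{
	/*
	 * Perf-based path: accumulate emulated events separately and request
	 * reprogramming; overflow is detected later, when the emulated and
	 * hardware-generated counts are combined.
	 */
	if (!mediated) {
		pmc->emulated_counter++;
		pmc->reprogram_requested = true;
		return;
	}

	/*
	 * Mediated path: the counter is the live value, so increment it in
	 * place and flag a PMI if it wraps to zero.
	 */
	pmc->counter = (pmc->counter + 1) & toy_pmc_bitmask(pmc);
	if (!pmc->counter)
		pmc->pmi_requested = true;
}
```

Having both paths side by side like this is exactly what makes the difference easy to see: one path defers overflow detection to reprogramming, the other detects it inline.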