On 11/21/2024 4:13 AM, Sean Christopherson wrote:
> On Thu, Aug 01, 2024, Mingwei Zhang wrote:
>> Implement emulated counter increment for passthrough PMU under KVM_REQ_PMU.
>> Defer the counter increment to the KVM_REQ_PMU handler because counter
>> increment requests come from kvm_pmu_trigger_event(), which can be triggered
>> within the KVM_RUN inner loop or outside of it. This means the counter
>> increment could happen before or after the PMU context switch.
>>
>> Processing the counter increment in one place keeps the implementation
>> simple.
>>
>> Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
>> Co-developed-by: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx>
>> ---
>>  arch/x86/kvm/pmu.c | 41 +++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 39 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
>> index 5cc539bdcc7e..41057d0122bd 100644
>> --- a/arch/x86/kvm/pmu.c
>> +++ b/arch/x86/kvm/pmu.c
>> @@ -510,6 +510,18 @@ static int reprogram_counter(struct kvm_pmc *pmc)
>>  			     eventsel & ARCH_PERFMON_EVENTSEL_INT);
>>  }
>>
>> +static void kvm_pmu_handle_event_in_passthrough_pmu(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> +
>> +	static_call_cond(kvm_x86_pmu_set_overflow)(vcpu);
>> +
>> +	if (atomic64_read(&pmu->__reprogram_pmi)) {
>> +		kvm_make_request(KVM_REQ_PMI, vcpu);
>> +		atomic64_set(&pmu->__reprogram_pmi, 0ull);
>> +	}
>> +}
>> +
>>  void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
>>  {
>>  	DECLARE_BITMAP(bitmap, X86_PMC_IDX_MAX);
>> @@ -517,6 +529,9 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
>>  	struct kvm_pmc *pmc;
>>  	int bit;
>>
>> +	if (is_passthrough_pmu_enabled(vcpu))
>> +		return kvm_pmu_handle_event_in_passthrough_pmu(vcpu);
>> +
>>  	bitmap_copy(bitmap, pmu->reprogram_pmi, X86_PMC_IDX_MAX);
>>
>>  	/*
>> @@ -848,6 +863,17 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
>>  	kvm_pmu_reset(vcpu);
>>  }
>>
>> +static void kvm_passthrough_pmu_incr_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc)
>> +{
>> +	if (static_call(kvm_x86_pmu_incr_counter)(pmc)) {
>
> This is absurd. It's the same ugly code in both Intel and AMD.
>
>   static bool intel_incr_counter(struct kvm_pmc *pmc)
>   {
>   	pmc->counter += 1;
>   	pmc->counter &= pmc_bitmask(pmc);
>
>   	if (!pmc->counter)
>   		return true;
>
>   	return false;
>   }
>
>   static bool amd_incr_counter(struct kvm_pmc *pmc)
>   {
>   	pmc->counter += 1;
>   	pmc->counter &= pmc_bitmask(pmc);
>
>   	if (!pmc->counter)
>   		return true;
>
>   	return false;
>   }
>
>> +		__set_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->global_status);
>
> Using __set_bit() is unnecessary, ugly, and dangerous. KVM uses set_bit(), no
> underscores, for things like reprogram_pmi because the updates need to be
> atomic.
>
> The downside of __set_bit() and friends is that if pmc->idx is garbage, KVM
> will clobber memory, whereas BIT_ULL(pmc->idx) is "just" undefined behavior.
> But dropping the update is far better than clobbering memory, and can be
> detected by UBSAN (though I doubt anyone is hitting this code with UBSAN).
>
> For this code, a regular ol' bitwise-OR will suffice.
>
>> +		kvm_make_request(KVM_REQ_PMU, vcpu);
>> +
>> +		if (pmc->eventsel & ARCH_PERFMON_EVENTSEL_INT)
>> +			set_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
>
> This is badly in need of a comment, and the ordering is unnecessarily weird.
> Set bits in reprogram_pmi *before* making the request.
> It doesn't matter here since this is all on the same vCPU, but it's good
> practice since KVM_REQ_XXX provides the necessary barriers to allow for safe,
> correct cross-CPU updates.
>
> That said, why on earth is the mediated PMU using KVM_REQ_PMU? Set
> global_status and KVM_REQ_PMI, done.
>
>> +	}
>> +}
>> +
>>  static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
>>  {
>>  	pmc->emulated_counter++;
>> @@ -880,7 +906,8 @@ static inline bool cpl_is_matched(struct kvm_pmc *pmc)
>>  	return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user;
>>  }
>>
>> -void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel)
>> +static void __kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel,
>> +				    bool is_passthrough)
>>  {
>>  	DECLARE_BITMAP(bitmap, X86_PMC_IDX_MAX);
>>  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> @@ -914,9 +941,19 @@ void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel)
>>  		    !pmc_event_is_allowed(pmc) || !cpl_is_matched(pmc))
>>  			continue;
>>
>> -		kvm_pmu_incr_counter(pmc);
>> +		if (is_passthrough)
>> +			kvm_passthrough_pmu_incr_counter(vcpu, pmc);
>> +		else
>> +			kvm_pmu_incr_counter(pmc);
>>  	}
>>  }
>> +
>> +void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel)
>> +{
>> +	bool is_passthrough = is_passthrough_pmu_enabled(vcpu);
>> +
>> +	__kvm_pmu_trigger_event(vcpu, eventsel, is_passthrough);
>
> Using an inner helper for this is silly, even if the mediated information were
> snapshotted per-vCPU. Just grab the snapshot in a local variable. Using a
> param adds no value and unnecessarily obfuscates the code.
>
> That's all a moot point though, because (a) KVM can check enable_mediated_pmu
> directly and (b) pivoting on behavior belongs in kvm_pmu_incr_counter(), not
> here.
>
> And I am leaning towards having the mediated vs. perf-based code live in the
> same function, unless one or both is "huge", so that it's easier to understand
> and appreciate the differences in the implementations.
>
> Not an action item for y'all, but this is also a great time to add comments,
> which are sorely lacking in the code. I am more than happy to do that, as it
> helps me understand (and thus review) the code. I'll throw in suggestions here
> and there as I review.
>
> Anyways, this?
>
> static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
> {
> 	/*
> 	 * For perf-based PMUs, accumulate software-emulated events separately
> 	 * from pmc->counter, as pmc->counter is offset by the count of the
> 	 * associated perf event. Request reprogramming, which will consult
> 	 * both emulated and hardware-generated events to detect overflow.
> 	 */
> 	if (!enable_mediated_pmu) {
> 		pmc->emulated_counter++;
> 		kvm_pmu_request_counter_reprogram(pmc);
> 		return;
> 	}
>
> 	/*
> 	 * For mediated PMUs, pmc->counter is updated when the vCPU's PMU is
> 	 * put, and will be loaded into hardware when the PMU is loaded.
> 	 * Simply increment the counter and signal overflow if it wraps to
> 	 * zero.
> 	 */
> 	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
> 	if (!pmc->counter) {
> 		pmc_to_pmu(pmc)->global_status |= BIT_ULL(pmc->idx);
> 		kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
> 	}
> }

Yes, thanks.
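
Just to double-check my reading of the wrap-and-signal idiom above, here is a
minimal standalone model in plain userspace C. The 48-bit width and the
free-standing pmc_bitmask() are illustrative stand-ins for KVM's per-PMC mask,
not the actual kernel definitions:

  #include <stdio.h>
  #include <stdint.h>

  /* Hypothetical stand-in for KVM's pmc_bitmask(): a 48-bit GP counter. */
  static uint64_t pmc_bitmask(void)
  {
  	return (1ULL << 48) - 1;
  }

  int main(void)
  {
  	uint64_t counter = pmc_bitmask();	/* counter at its max value */

  	/* Increment and mask, as the mediated path does. */
  	counter = (counter + 1) & pmc_bitmask();

  	/*
  	 * A wrap to zero is the overflow condition; this is the point where
  	 * KVM would OR BIT_ULL(pmc->idx) into global_status and raise
  	 * KVM_REQ_PMI.
  	 */
  	if (!counter)
  		printf("overflow: counter wrapped to zero\n");

  	return 0;
  }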