Hi Reiji,

On Sat, 12 Nov 2022 07:55:38 +0000,
Reiji Watanabe <reijiw@xxxxxxxxxx> wrote:
> 
> Hi Marc,
> 
> On Mon, Nov 7, 2022 at 12:54 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >
> > Ricardo recently pointed out that the PMU chained counter emulation
> > in KVM wasn't quite behaving like the one on actual hardware, in
> > the sense that a chained counter would expose an overflow on
> > both halves of a chained counter, while KVM would only expose the
> > overflow on the top half.
> >
> > The difference is subtle, but significant. What does the architecture
> > say (DDI0087 H.a):
> >
> > - Up to PMUv3p4, all counters but the cycle counter are 32bit
> >
> > - A 32bit counter that overflows generates a CHAIN event on the
> >   adjacent counter after exposing its own overflow status
> >
> > - The CHAIN event is accounted if the counter is correctly
> >   configured (CHAIN event selected and counter enabled)
> >
> > This all means that our current implementation (which uses 64bit
> > perf events) prevents us from emulating this overflow on the lower half.
> >
> > How to fix this? By implementing the above, to the letter.
> >
> > This largely results in code deletion, removing the notions of
> > "counter pair", "chained counters", and "canonical counter".
> > The code is further restructured to make the CHAIN handling similar
> > to SWINC, as the two are now extremely similar in behaviour.
> >
> > Reported-by: Ricardo Koller <ricarkol@xxxxxxxxxx>
> > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> > ---
> >  arch/arm64/kvm/pmu-emul.c | 312 ++++++++++----------------------
> >  include/kvm/arm_pmu.h     |   2 -
> >  2 files changed, 83 insertions(+), 231 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> > index 0003c7d37533..a38b3127f649 100644
> > --- a/arch/arm64/kvm/pmu-emul.c
> > +++ b/arch/arm64/kvm/pmu-emul.c
> > @@ -15,16 +15,14 @@
> >  #include <kvm/arm_pmu.h>
> >  #include <kvm/arm_vgic.h>
> >
> > +#define PERF_ATTR_CFG1_COUNTER_64BIT    BIT(0)
> 
> Although this isn't the new code (but just a name change),
> wouldn't it be nicer to have armv8pmu_event_is_64bit()
> (in arch/arm64/kernel/perf_event.c) use the macro as well ?

We tried that in the past, and the amount of churn wasn't really
worth it. I'm happy to revisit this in the future, but probably as a
separate patch.

[...]

> > @@ -163,29 +97,7 @@ static u64 kvm_pmu_get_pair_counter_value(struct kvm_vcpu *vcpu,
> >                 counter += perf_event_read_value(pmc->perf_event, &enabled,
> >                                                  &running);
> >
> > -       return counter;
> > -}
> > -
> > -/**
> > - * kvm_pmu_get_counter_value - get PMU counter value
> > - * @vcpu: The vcpu pointer
> > - * @select_idx: The counter index
> > - */
> > -u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx)
> > -{
> > -       u64 counter;
> > -       struct kvm_pmu *pmu = &vcpu->arch.pmu;
> > -       struct kvm_pmc *pmc = &pmu->pmc[select_idx];
> > -
> > -       if (!kvm_vcpu_has_pmu(vcpu))
> > -               return 0;
> > -
> > -       counter = kvm_pmu_get_pair_counter_value(vcpu, pmc);
> > -
> > -       if (kvm_pmu_pmc_is_chained(pmc) &&
> > -           kvm_pmu_idx_is_high_counter(select_idx))
> > -               counter = upper_32_bits(counter);
> > -       else if (select_idx != ARMV8_PMU_CYCLE_IDX)
> > +       if (select_idx != ARMV8_PMU_CYCLE_IDX)
> 
> Nit: Using 'pmc->idx' instead of 'select_idx' appears to be more
> consistent.

Well, this is the exact opposite of Oliver's comment last time. I
initially used pmc->idx, but it made the diff somehow larger and also
more difficult to understand what changed.

In the end, I'd rather rework the whole file to consistently use
vcpu+idx or pmc, as the mixed use of both is annoying. And that's
probably a cleanup patch for later.
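For reference, the architectural behaviour described in the cover
letter above boils down to something like the toy sketch below. This
is purely illustrative: the types, helpers and constant names are
made up for the example and are not the ones used in KVM or in this
patch.

#include <stdint.h>

#define TOY_NR_COUNTERS         32      /* 31 event counters + the cycle counter */
#define TOY_CYCLE_IDX           31
#define TOY_EVT_CHAIN           0x1e    /* architectural CHAIN event number */

struct toy_pmu {
        uint64_t counter[TOY_NR_COUNTERS];
        uint16_t evtsel[TOY_NR_COUNTERS];       /* programmed event numbers */
        uint32_t enabled;                       /* PMCNTENSET-like bitmap */
        uint32_t ovs;                           /* PMOVSSET-like overflow status */
};

static void toy_counter_add(struct toy_pmu *pmu, int idx, uint64_t add)
{
        uint64_t val = pmu->counter[idx] + add;

        /* Up to PMUv3p4, all counters but the cycle counter are 32bit */
        if (idx != TOY_CYCLE_IDX && val > UINT32_MAX) {
                val &= UINT32_MAX;

                /* The overflowing counter exposes its own overflow... */
                pmu->ovs |= 1u << idx;

                /*
                 * ...and a CHAIN event is accounted on the adjacent
                 * odd-numbered counter, provided it is enabled and
                 * actually programmed with the CHAIN event.
                 */
                if (!(idx & 1) && (idx + 1) < TOY_CYCLE_IDX &&
                    (pmu->enabled & (1u << (idx + 1))) &&
                    pmu->evtsel[idx + 1] == TOY_EVT_CHAIN)
                        toy_counter_add(pmu, idx + 1, 1);
        }

        pmu->counter[idx] = val;
}

With a single 64bit perf event backing the pair, the wrap of the
lower half never sets its own overflow bit, which is exactly the
difference Ricardo spotted.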
[...]

> > @@ -340,11 +245,8 @@ void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
> >
> >                 pmc = &pmu->pmc[i];
> >
> > -               /* A change in the enable state may affect the chain state */
> > -               kvm_pmu_update_pmc_chained(vcpu, i);
> >                 kvm_pmu_create_perf_event(vcpu, i);
> >
> > -               /* At this point, pmc must be the canonical */
> >                 if (pmc->perf_event) {
> >                         perf_event_enable(pmc->perf_event);
> >                         if (pmc->perf_event->state != PERF_EVENT_STATE_ACTIVE)
> > @@ -375,11 +277,8 @@ void kvm_pmu_disable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
> >
> >                 pmc = &pmu->pmc[i];
> >
> > -               /* A change in the enable state may affect the chain state */
> > -               kvm_pmu_update_pmc_chained(vcpu, i);
> >                 kvm_pmu_create_perf_event(vcpu, i);
> 
> Do we still need to call kvm_pmu_update_pmc_chained() here even
> with this patch ? (I would think the reason why the function was
> called here was because the chain state change could affect the
> backed perf event attribute before).
> I have the same comment for kvm_pmu_enable_counter_mask().

Do you mean kvm_pmu_create_perf_event() instead?

I think we can drop the one on disable. But the one on enable is
required, as we need to be able to start counting an event even if
the guest hasn't programmed the event number (unlikely, but allowed
by the architecture). It can be made conditional though.

I have the following fix queued:

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 26293f842b0f..b7a5f75d008d 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -277,9 +277,9 @@ void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
 
                pmc = &pmu->pmc[i];
 
-               kvm_pmu_create_perf_event(vcpu, i);
-
-               if (pmc->perf_event) {
+               if (!pmc->perf_event) {
+                       kvm_pmu_create_perf_event(vcpu, i);
+               } else {
                        perf_event_enable(pmc->perf_event);
                        if (pmc->perf_event->state != PERF_EVENT_STATE_ACTIVE)
                                kvm_debug("fail to enable perf event\n");
@@ -309,8 +309,6 @@ void kvm_pmu_disable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
 
                pmc = &pmu->pmc[i];
 
-               kvm_pmu_create_perf_event(vcpu, i);
-
                if (pmc->perf_event)
                        perf_event_disable(pmc->perf_event);
        }

[...]

> > @@ -670,30 +569,21 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
> >         attr.exclude_host = 1; /* Don't count host events */
> >         attr.config = eventsel;
> >
> > -       counter = kvm_pmu_get_pair_counter_value(vcpu, pmc);
> > +       counter = kvm_pmu_get_counter_value(vcpu, select_idx);
> 
> Nit: Since all existing codes in the function use pmc->idx,
> I would think it would be better to use 'pmc->idx' instead of
> 'select_idx' consistently.

See my above comment about the need for a more global cleanup of this
file, and the ask to keep the patch focused on the actual rework.

Thanks,

        M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm