On Fri, Oct 11, 2019 at 12:28:48PM +0100, Marc Zyngier wrote:
> On Tue, 8 Oct 2019 23:42:22 +0100
> Andrew Murray <andrew.murray@xxxxxxx> wrote:
>
> > On Tue, Oct 08, 2019 at 05:01:28PM +0100, Marc Zyngier wrote:
> > > The PMU emulation code uses the perf event sample period to trigger
> > > the overflow detection. This works fine for the *first* overflow
> > > handling, but results in a huge number of interrupts on the host,
> > > unrelated to the number of interrupts handled in the guest (a x20
> > > factor is pretty common for the cycle counter). On a slow system
> > > (such as a SW model), this can result in the guest only making
> > > forward progress at a glacial pace.
> > >
> > > It turns out that the clue is in the name. The sample period is
> > > exactly that: a period. And once an overflow has occurred,
> > > the following period should be the full width of the associated
> > > counter, instead of whatever the guest had initially programmed.
> > >
> > > Reset the sample period to the architected value in the overflow
> > > handler, which now results in a number of host interrupts that is
> > > much closer to the number of interrupts in the guest.
> > >
> > > Fixes: b02386eb7dac ("arm64: KVM: Add PMU overflow interrupt routing")
> > > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> > > ---
> > >  virt/kvm/arm/pmu.c | 15 +++++++++++++++
> > >  1 file changed, 15 insertions(+)
> > >
> > > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> > > index 25a483a04beb..8b524d74c68a 100644
> > > --- a/virt/kvm/arm/pmu.c
> > > +++ b/virt/kvm/arm/pmu.c
> > > @@ -442,6 +442,20 @@ static void kvm_pmu_perf_overflow(struct perf_event *perf_event,
> > >  	struct kvm_pmc *pmc = perf_event->overflow_handler_context;
> > >  	struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
> > >  	int idx = pmc->idx;
> > > +	u64 period;
> > > +
> > > +	/*
> > > +	 * Reset the sample period to the architectural limit,
> > > +	 * i.e. the point where the counter overflows.
> > > +	 */
> > > +	period = -(local64_read(&pmc->perf_event->count));
> > > +
> > > +	if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> > > +		period &= GENMASK(31, 0);
> > > +
> > > +	local64_set(&pmc->perf_event->hw.period_left, 0);
> > > +	pmc->perf_event->attr.sample_period = period;
> > > +	pmc->perf_event->hw.sample_period = period;
> >
> > I believe that above, you are reducing the period by the amount period_left
> > would have been - they cancel each other out.
>
> That's not what I see happening, having put some traces:
>
>  kvm_pmu_perf_overflow: count = 308 left = 129
>  kvm_pmu_perf_overflow: count = 409 left = 47
>  kvm_pmu_perf_overflow: count = 585 left = 223
>  kvm_pmu_perf_overflow: count = 775 left = 413
>  kvm_pmu_perf_overflow: count = 1368 left = 986
>  kvm_pmu_perf_overflow: count = 2086 left = 1716
>  kvm_pmu_perf_overflow: count = 958 left = 584
>  kvm_pmu_perf_overflow: count = 1907 left = 1551
>  kvm_pmu_perf_overflow: count = 7292 left = 6932

Indeed.

>
> although I've now moved the stop/start calls inside the overflow
> handler so that I don't have to mess with the PMU backend.
>
> > Given that kvm_pmu_perf_overflow is now always called between a
> > cpu_pmu->pmu.stop and a cpu_pmu->pmu.start, it means armpmu_event_update
> > has been called prior to this function, and armpmu_event_set_period will
> > be called after...
> >
> > Therefore, I think the above could be reduced to:
> >
> > +	/*
> > +	 * Reset the sample period to the architectural limit,
> > +	 * i.e. the point where the counter overflows.
> > +	 */
> > +	u64 period = GENMASK(63, 0);
> > +	if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> > +		period = GENMASK(31, 0);
> > +
> > +	pmc->perf_event->attr.sample_period = period;
> > +	pmc->perf_event->hw.sample_period = period;
> >
> > This is because armpmu_event_set_period takes into account the overflow
> > and the counter wrapping via the "if (unlikely(left <= 0)) {" block.
>
> I think that's an oversimplification.
> As shown above, the counter has
> moved forward, and there is a delta to be accounted for.
>

Yeah, I probably need to spend more time understanding this...

> > Though this code confuses me easily, so I may be talking rubbish.
>
> Same here! ;-)
>
> >
> > >
> > >  	__vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(idx);
> > >
> > > @@ -557,6 +571,7 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
> > >  	attr.exclude_host = 1; /* Don't count host events */
> > >  	attr.config = (pmc->idx == ARMV8_PMU_CYCLE_IDX) ?
> > >  		ARMV8_PMUV3_PERFCTR_CPU_CYCLES : eventsel;
> > > +	attr.config1 = PERF_ATTR_CFG1_RELOAD_EVENT;
> >
> > I'm not sure that this flag, or patch 4 is really needed. As the perf
> > events created by KVM are pinned to the task and exclude_(host,hv) are set -
> > I think the perf event is not active at this point. Therefore if you change
> > the sample period, you can wait until the perf event gets scheduled back in
> > (when you return to the guest) where its call to pmu.start will result in
> > armpmu_event_set_period being called. In other words the pmu.start and
> > pmu.stop you add in patch 4 is effectively being done for you by perf when
> > the KVM task is switched out.
> >
> > I'd be interested to see if the following works:
> >
> > +	WARN_ON(pmc->perf_event->state == PERF_EVENT_STATE_ACTIVE);
> > +
> > +	/*
> > +	 * Reset the sample period to the architectural limit,
> > +	 * i.e. the point where the counter overflows.
> > +	 */
> > +	u64 period = GENMASK(63, 0);
> > +	if (!kvm_pmu_idx_is_64bit(vcpu, pmc->idx))
> > +		period = GENMASK(31, 0);
> > +
> > +	pmc->perf_event->attr.sample_period = period;
> > +	pmc->perf_event->hw.sample_period = period;
> >
> > >
> > >  	counter = kvm_pmu_get_pair_counter_value(vcpu, pmc);
> >
>
> The warning fires, which is expected: for the event to be inactive, you
> need to have the vcpu being scheduled out.
> When the PMU interrupt
> fires, it is bound to preempt the vcpu itself, and the event is of
> course still active.

That makes sense. That also provides a justification for stopping and starting
the PMU.

>
> > What about ARM 32 bit support for this?
>
> What about it? 32bit KVM/arm doesn't support the PMU at all.

Thanks for the clarification.

Andrew Murray

> A 32bit
> guest on a 64bit host could use the PMU just fine (it is just that
> 32bit Linux doesn't have a PMUv3 driver -- I had patches for that, but
> they never made it upstream).
>
> Thanks,
>
> 	M.
> --
> Jazz is not dead. It just smells funny...
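
For anyone following the arithmetic in the thread, here is a minimal
userspace sketch of the two period calculations discussed above. It is not
kernel code: period_until_overflow() and full_width_period() are hypothetical
names made up for illustration, the 0xffffffff mask stands in for
GENMASK(31, 0), and the count of 308 is borrowed from the first trace line
purely as a sample input. It assumes "count" is the current value of the
emulated counter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative helper, not a kernel function: number of events left before
 * a counter currently holding `count` wraps. Mirrors the arithmetic
 * "period = -count; period &= GENMASK(31, 0)" from the patch.
 */
static uint64_t period_until_overflow(uint64_t count, int is_64bit)
{
	uint64_t period = -count;	/* unsigned negation: 2^64 - count */

	if (!is_64bit)
		period &= UINT64_C(0xffffffff);	/* stands in for GENMASK(31, 0) */

	return period;
}

/* Fixed full-width period, as in the simplified variant from the review. */
static uint64_t full_width_period(int is_64bit)
{
	return is_64bit ? UINT64_MAX : UINT64_C(0xffffffff);
}

/* Self-check with count = 308, the first value in the quoted traces. */
static void period_selfcheck(void)
{
	/* 32-bit counter: 2^32 - 308 events remain before the next wrap. */
	assert(period_until_overflow(308, 0) == UINT64_C(0xffffffff) - 307);
	/* 64-bit counter: 2^64 - 308 events remain. */
	assert(period_until_overflow(308, 1) == UINT64_MAX - 307);
	/* The fixed period does not subtract what the counter already holds. */
	assert(full_width_period(0) - period_until_overflow(308, 0) == 307);
}
```

Calling period_selfcheck() from a small main() exercises the checks. The
last assertion only shows that the two formulas differ by roughly the current
count; whether armpmu_event_set_period() absorbs that difference is the
question the thread leaves open.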