On Mon, Oct 22, 2018 at 06:18:26PM +0800, Nick Hu wrote:
> On Thu, Oct 18, 2018 at 03:23:59PM +0100, Mark Rutland wrote:
> > On Thu, Oct 18, 2018 at 04:43:13PM +0800, Nickhu wrote:
> > > +static irqreturn_t nds32_pmu_handle_irq(int irq_num, void *dev)
> > > +{
> > > +        u32 pfm;
> > > +        struct perf_sample_data data;
> > > +        struct nds32_pmu *cpu_pmu = (struct nds32_pmu *)dev;
> > > +        struct pmu_hw_events *cpuc = cpu_pmu->get_hw_events();
> > > +        struct pt_regs *regs;
> > > +        int idx;
> > > +        /*
> > > +         * Get and reset the IRQ flags
> > > +         */
> > > +        pfm = nds32_pfm_getreset_flags();
> > > +
> > > +        /*
> > > +         * Did an overflow occur?
> > > +         */
> > > +        if (!nds32_pfm_has_overflowed(pfm))
> > > +                return IRQ_NONE;
> > > +
> > > +        /*
> > > +         * Handle the counter(s) overflow(s)
> > > +         */
> > > +        regs = get_irq_regs();
> > > +
> > > +        for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
> > > +                struct perf_event *event = cpuc->events[idx];
> > > +                struct hw_perf_event *hwc;
> > > +
> > > +                /* Ignore if we don't have an event. */
> > > +                if (!event)
> > > +                        continue;
> > > +
> > > +                /*
> > > +                 * We have a single interrupt for all counters. Check that
> > > +                 * each counter has overflowed before we process it.
> > > +                 */
> > > +                if (!nds32_pfm_counter_has_overflowed(pfm, idx))
> > > +                        continue;
> > > +
> > > +                hwc = &event->hw;
> > > +                nds32_pmu_event_update(event);
> > > +                perf_sample_data_init(&data, 0, hwc->last_period);
> > > +                if (!nds32_pmu_event_set_period(event))
> > > +                        continue;
> > > +
> > > +                if (perf_event_overflow(event, &data, regs))
> > > +                        cpu_pmu->disable(event);
> > > +        }
> > > +
> > > +        /*
> > > +         * Handle the pending perf events.
> > > +         *
> > > +         * Note: this call *must* be run with interrupts disabled. For
> > > +         * platforms that can have the PMU interrupts raised as an NMI, this
> > > +         * will not work.
> > > +         */
> > > +        irq_work_run();
> > > +
> > > +        return IRQ_HANDLED;
> > > +}
> >
> > You'll want to stop all counters over the IRQ handler to ensure that
> > groups are consistent.
> >
> > [...]
>
> Our implementation of IRQ handler references the irq_handler of
> arch/arm/kernel/perf_event_v7.c.

Ah, it looks like we forgot to fix that up when we fixed arm64 in
commit:

  3cce50dfec4a5b04 ("arm64: perf: Disable PMU while processing counter overflows")

... we should do that for all the arch/arm PMU IRQ handlers.

> Do you mean that when one counter overflowed, other counters do not
> count the events caused by the counter overflow handler?

The problem is that counters are still counting while they're being
read/modified, so there is skew.

Groups of counters should be scheduled atomically (i.e. at any point in
time, all counters are counting, or all are not counting). This isn't
true if they're read one-by-one in the IRQ handler (and maybe
reprogrammed afterwards).

Consider a group of three events, programmed with the same event (e.g.
bus-cycles). You would expect that all three events read the same
value, since they're identical.

In the IRQ handler we do something like:

        count0 = read_counter(0); // reads N
        < X bus cycles occur >
        count1 = read_counter(1); // reads N + X
        < Y bus cycles occur >
        count2 = read_counter(2); // reads N + X + Y

We read the first counter, and before we read the next counter, more
events occur. Likewise for the next pair of counters.

This leads to surprising results, especially if counters are
reprogrammed on overflow. That means that you can't necessarily derive
a meaningful ratio between counter values.
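Whereas if the PMU were stopped across those reads, all three counters
would read back the same value. Roughly (stop_pmu()/start_pmu() below
are just placeholders for whatever the nds32 disable/enable paths end
up being called):

        stop_pmu();               // hypothetical: all counters stop here
        count0 = read_counter(0); // reads N
        count1 = read_counter(1); // reads N
        count2 = read_counter(2); // reads N
        start_pmu();              // all counters resume together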
The simplest way to solve this is to stop the PMU before reading the
counters, and start it again afterwards. It leaves a small window where
the counters aren't counting, but it ensures the ratio between counters
is meaningful.

See commit:

  3cce50dfec4a5b04 ("arm64: perf: Disable PMU while processing counter overflows")

... for how we did that on arm64.

Thanks,
Mark.