Re: [kvm-unit-tests PATCH 1/3] arm: pmu: Add missing isb()'s after sys register writing

Alexandru Elisei <alexandru.elisei@xxxxxxx> · Tue, 19 Jul 2022 12:26:54 +0100

Hi,

On Mon, Jul 18, 2022 at 10:48:29AM -0700, Ricardo Koller wrote:
> On Mon, Jul 18, 2022 at 05:38:23PM +0100, Alexandru Elisei wrote:
> > Hi,
> > 
> > On Mon, Jul 18, 2022 at 08:49:08AM -0700, Ricardo Koller wrote:
> > > There are various pmu tests that require an isb() between enabling
> > > counting and the actual counting. This can lead to count registers
> > > reporting less events than expected; the actual enabling happens after
> > > some events have happened.  For example, some missing isb()'s in the
> > > pmu-sw-incr test lead to the following errors on bare-metal:
> > > 
> > > 	INFO: pmu: pmu-sw-incr: SW_INCR counter #0 has value 4294967280
> > >         PASS: pmu: pmu-sw-incr: PWSYNC does not increment if PMCR.E is unset
> > >         FAIL: pmu: pmu-sw-incr: counter #1 after + 100 SW_INCR
> > >         FAIL: pmu: pmu-sw-incr: counter #0 after + 100 SW_INCR
> > >         INFO: pmu: pmu-sw-incr: counter values after 100 SW_INCR #0=82 #1=98
> > >         PASS: pmu: pmu-sw-incr: overflow on counter #0 after 100 SW_INCR
> > >         SUMMARY: 4 tests, 2 unexpected failures
> > > 
> > > Add the missing isb()'s on all failing tests, plus some others that are
> > > not currently required but might in the future (like an isb() after
> > > clearing the overflow signal in the IRQ handler).
> > 
> > That's rather cryptic. What might require those hypothetical ISBs and why? Why
> > should a test add code for some hypothetical requirement that might, or might
> > not, be implemented?
> 
> Good point, this wasn't very clear. Will add something more specific.
> 
> > 
> > This is pure speculation on my part, were you seeing spurious interrupts that
> > went away after adding the ISB in irq_handler()?
> 
> I didn't see any. But I think it could happen: multiple spurious
> interrupts until the line finally gets cleared.

I agree with you, it takes a finite time for any interrupt controller (in
our case, the GIC) to deassert the interrupt line to the CPU after a device
has deasserted the interrupt line to the interrupt controller. That's why
device drivers are usually robust in dealing with spurious interrupts.

It looks to me that the way irq_handler() treats spurious interrupts might
lead to tests being incorrectly treated as failed, which is going to be a
pain to reproduce and diagnose.

@Eric, was there a particular reason for this approach?

Thanks,
Alex