RE: OMAP3430 spurious interrupts

"Woodruff, Richard" <r-woodruff2@xxxxxx> · Tue, 15 Jan 2008 10:39:25 -0600

Hi,

> From: Tony Lindgren [mailto:tony@xxxxxxxxxxx] 

> Richard, can you also describe the purpose of the spurious interrupt?
> Is it just an error on accessing the registers too soon 
> before interrupt priority sorting is done, or something like that?

Huh? I don't think there is a 'spurious' interrupt vector as such in
the hardware.  I didn't fully resync after the last interrupt reorg so
perhaps they snuck in some new software term.

So, to me all that is saying is you get an interrupt at your L1 PIC, you
go to the device and nothing is there to clear (or you get an interrupt
and nothing is showing active at your L1 PIC).

Generally this is indicative of a bad device or incorrect device irq
programming.  If you get too many of these things your system will shut
down with a continuous flow of interrupts.  Hence the kernel thinks they
are serious and might shut down your vector if it starts taking what it
feels are too many.  Bypassing the safety check can cause you to miss
problems.

-a- What I was referring to with 'posting' is this: as the memory
attribute is marked now, if you clear your isr at the device near the
return from the isr, the cpu might unmask the irq before the actual
write, or the effect of the write, has occurred at the device.  This
will result in an IRQ request at unmask time, but when sources are
checked there will be none.

* Previously, to synchronize better, we had to put the barriers at the
writes to the PIC, as it by default had a buffered type.  However, other
devices were strongly ordered, thus they were safer.  If you recall,
Catalin also asked for patches to controllers to fix this on ARM11.
Last time this spurious issue flared up it was because these barriers
were dropped during open source resyncing.  The barriers just make sure
the data has left the ARM.  However, they don't account for the rest of
the path (device maps).  This is where having the correct attribute for
devices is good.  Those devices then have to acknowledge back to the bus
per their protocol.  We had internal mails on this and it turns out the
ARM-to-OCP bridge protocol conversion bits complicate things, so it's
not so intuitive.  Strongly ordered is the closest to what you might
guess should happen.

In the above, the interrupt request will be present for a small amount
of time until the line is finally cleared.  Ignoring these types of
bursts may be harmless, but it depends a bit on whether some irq handler
gets called, in which case it must not do anything bad to device state.
It surely wastes some cycles.
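
As a rough illustration of case -a-, here is a toy model of the race.
It is not OMAP or kernel code: the struct, the function names and the
write-1-to-clear status register are all assumptions made for the
sketch.  A posted write returns to the cpu before it lands, so the
level-sensitive line is still high at unmask time; a write the device
must acknowledge has already landed by then.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; every name here is hypothetical. */
struct toy_dev {
    uint32_t status;         /* nonzero: device drives its irq line */
    uint32_t inflight_clear; /* bits a posted write clears on arrival */
    bool     inflight;       /* a write has left the cpu, not landed */
};

/* The in-flight write finally reaches the device. */
static void toy_write_lands(struct toy_dev *d)
{
    if (d->inflight) {
        d->status &= ~d->inflight_clear;
        d->inflight = false;
    }
}

/*
 * Write-1-to-clear of the device status.  A posted write returns to
 * the cpu at once; a non-posted one returns only after the device has
 * taken it.
 */
static void toy_clear_status(struct toy_dev *d, uint32_t bits, bool posted)
{
    assert(!d->inflight);    /* the model holds one write in flight */
    d->inflight_clear = bits;
    d->inflight = true;
    if (!posted)
        toy_write_lands(d);
}

/* Level-sensitive request as seen when the cpu unmasks the irq. */
static bool toy_irq_requested(const struct toy_dev *d)
{
    return d->status != 0;
}

/*
 * Handler that clears its source right before returning.  Returns true
 * if the irq is still requested at unmask time, i.e. the cpu takes
 * another interrupt for a source that is about to vanish.
 */
static bool toy_isr_then_unmask(struct toy_dev *d, bool posted)
{
    toy_clear_status(d, 0x1, posted);
    return toy_irq_requested(d);
}
```

In the posted case the request is still seen at unmask, yet once the
write lands the status reads zero, which is the "nothing there when
sources are checked" situation.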

-b- The other thing which is clear in the TRM is the bit about a false
interrupt at priority sorting time.  If you monkey with masks during
sorting time you might get a false isr.  As all of the sources are level
assertive at the PIC, a 2nd gratuitous ACK of the vector number would be
a hack way of handling that case.  I'm not sure it happens that much in
practice.  The recommended programming model is to MASK all ISRs, handle
the source, then ACK, and unmask.  The Linux code path doesn't do this,
however: it only masks the one irq in play, acks the irq, then unmasks
all the rest, then handles the device.  By messing with the mask around
the ack without dropping the source, this small sorting window opens up
(assuming there are more irqs coming in).
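
The two orderings in -b- can be put side by side in a toy model.
Everything below is hypothetical (it is not the INTC register layout or
the kernel's irq flow code), and the final unmask of the one irq in the
Linux-like path is my assumption about where that step goes.  The model
only records whether the ACK ran while the source was still asserted,
which is the condition for the sorting window; it does not model the
sorting itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy interrupt controller; all names are made up for illustration. */
struct toy_intc {
    uint32_t lines;      /* level-sensitive sources, 1 = asserted */
    uint32_t mask;       /* 1 = masked at the controller */
    bool     acked_live; /* an ACK ran with its source still asserted */
};

/* Servicing the device drops its level-sensitive line. */
static void toy_handle_device(struct toy_intc *c, unsigned irq)
{
    assert(irq < 32);
    c->lines &= ~(UINT32_C(1) << irq);
}

/* ACK at the controller; note if the source has not been dropped. */
static void toy_ack(struct toy_intc *c, unsigned irq)
{
    assert(irq < 32);
    if (c->lines & (UINT32_C(1) << irq))
        c->acked_live = true;
}

/* Recommended order: mask all, handle the source, ACK, unmask. */
static void toy_flow_recommended(struct toy_intc *c, unsigned irq)
{
    uint32_t saved = c->mask;

    c->mask = UINT32_MAX;
    toy_handle_device(c, irq);
    toy_ack(c, irq);
    c->mask = saved;
}

/* Linux-like order: mask the one irq, ACK, then handle the device. */
static void toy_flow_linux_like(struct toy_intc *c, unsigned irq)
{
    assert(irq < 32);
    c->mask |= UINT32_C(1) << irq;
    toy_ack(c, irq);                  /* source still asserted here */
    /* all other irqs stay unmasked while the device is handled */
    toy_handle_device(c, irq);
    c->mask &= ~(UINT32_C(1) << irq); /* assumed final unmask */
}
```

Both flows end with the source dropped and the mask restored; only the
Linux-like one sets acked_live.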

So far interactions with the ISP camera driver seem to have caused the
most spurious interrupts to occur.  However, you do see it with other
drivers.  As that code has matured in our trees issues have been
dropping off.

Regards,
Richard W.