Re: OMAP: sporadic hard lock-up when GPIO edge-triggered interrupts occur too fast

Hugo Vincent <hugo.vincent@xxxxxxxxx> · Mon, 8 Jun 2009 09:22:29 +1200

Hi OMAP people,

I've done some more debugging on this problem and found that, although
it occurs less often, it still happens with PREEMPT_RT completely
disabled (using l-o 2.6.29-omap1 with the default set of extra patches
applied by the OE linux-omap_2.6.29 recipe, and without the -rt
patchset). I am therefore reposting this to the l-o list, as it looks
like it could be a serious and general problem, and I'm really out of
ideas for fixing it...

Are there any known cases (however rare), whereby a race condition or
deadlock can occur with the current GPIO interrupt code? I've seen
many comments on the mailing list implying the code is quite fragile
or sensitive as it is, and wonder if this is because of hardware bugs
or limitations? The way the hardware works (multiple GPIO interrupts
combined into one bank interrupt to the MPU, where the bank can
contain both level and edge triggers) seems pretty stupid, especially
for the use-case where both edge and level triggers are used on one
bank. Does anyone by any chance have patches that fix this problem,
even at the expense of performance or latency?
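
To make the question concrete, here is a minimal user-space sketch (not
the actual arch/arm/plat-omap/gpio.c code) of how I understand a bank
demultiplexer has to treat the two trigger types. The struct, its
fields and the late_edges hook are hypothetical and exist only for this
simulation. Edge bits are acknowledged before the per-line handlers
run, so an edge that arrives during handling latches again; level bits
are acknowledged afterwards, once the handler has quieted the source.

```c
#include <stdint.h>

/* Hypothetical model of one GPIO bank; NOT the real OMAP register layout. */
struct gpio_bank {
    uint32_t irqstatus;    /* latched pending bits, one per GPIO line */
    uint32_t level_mask;   /* lines configured as level-triggered */
    uint32_t line_level;   /* level sources that are currently asserted */
    uint32_t late_edges;   /* simulation hook: edges arriving mid-handler */
    unsigned handled[32];  /* per-line dispatch counters */
};

/* Per-line handler: services the device, which deasserts a level source. */
static void handle_line(struct gpio_bank *b, unsigned line)
{
    b->handled[line]++;
    b->line_level &= ~(1u << line);

    /* Simulate new edges latching while a handler is still running. */
    b->irqstatus |= b->late_edges;
    b->late_edges = 0;
}

/* Demultiplex one bank interrupt; returns the number of passes made. */
static unsigned bank_demux(struct gpio_bank *b)
{
    unsigned passes = 0;

    for (;;) {
        /* A level source keeps its bit pending for as long as it is asserted. */
        uint32_t pending = b->irqstatus | (b->line_level & b->level_mask);
        unsigned line;

        if (!pending)
            break;
        passes++;

        /* Ack edge bits first so a later edge is re-latched, not lost. */
        b->irqstatus &= ~(pending & ~b->level_mask);

        for (line = 0; line < 32; line++)
            if (pending & (1u << line))
                handle_line(b, line);

        /* Ack level bits only after the handlers have quieted the sources. */
        b->irqstatus &= ~(pending & b->level_mask);
    }
    return passes;
}
```

Seeding the model with a level interrupt on line 3, an edge on line 7
and one late edge on line 7 makes the loop run two passes and call the
line-7 handler twice. If the edge ack were moved after the dispatch
loop, the late edge would be cleared without ever being handled.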

Many thanks,
Hugo

On Thu, Jun 4, 2009 at 11:51 AM, Hugo Vincent <hugo.vincent@xxxxxxxxx> wrote:
> Hi everyone,
>
> I'm trying to debug a problem with GPIO interrupts on my OMAP3503
> (Gumstix Overo) platform with kernel 2.6.29.4-rt16-omap1.
>
> This is a sporadic lock-up, but I haven't been able to reproduce it
> when PREEMPT_HARDIRQS is disabled.
>
> I think it might have something to do with this patch (which is applied BTW):
>   http://patchwork.kernel.org/patch/16046/
> And see this thread:
>   http://markmail.org/message/aaqpk5jztrrypsxz
>
> Sometimes, I see a spurious IRQ message like this:
>   Spurious irq 95: 0xffffffdf, please flush posted write for irq 31
> that the above patch is supposed to fix, just before the system locks
> up. Is it possible that the interrupt handler is getting preempted
> between acknowledging the interrupt and the 'flush posted write', and
> meanwhile another interrupt from the same bank occurs?
>
> To check this hypothesis, I built the kernel with
>   CONFIG_PREEMPT_DESKTOP=y
>   # CONFIG_PREEMPT_RT is not set
>   CONFIG_PREEMPT=y
>   CONFIG_PREEMPT_SOFTIRQS=y
>   CONFIG_PREEMPT_HARDIRQS=y
>
> which gives a /proc/irq/<irq>/<handler>/threaded flag. When I tried
> disabling threading on irqs 246-249 (which are the virtual irqs for
> the problematic GPIO interrupts), the problem still occurred. I'm not
> sure how to disable threading or preemption on the intermediate ISR
> (gpio_irq_handler() at arch/arm/plat-omap/gpio.c:959) which determines
> which GPIO in the bank caused the interrupt and spawns the virtual
> interrupts.
>
> Any ideas for debugging or narrowing down on the cause?
>
> Many thanks,
> Hugo
>
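
For reference, the hex value in the "Spurious irq 95: 0xffffffdf"
message quoted above can be decoded as follows. This is a small sketch
that assumes the value is the raw interrupt controller SIR register,
with the active IRQ number in bits 6:0 and the spurious flag in bits
31:7, which is how I read the OMAP35x TRM. The macro and function names
are made up for illustration.

```c
#include <stdint.h>

/* Assumed field layout of the INTC SIR_IRQ register (see lead-in). */
#define SIR_ACTIVEIRQ_MASK 0x0000007fu  /* bits 6:0: active IRQ number */
#define SIR_SPURIOUS_MASK  0xffffff80u  /* bits 31:7: spurious IRQ flag */

/* Extract the IRQ number reported by the controller. */
static unsigned sir_active_irq(uint32_t sir)
{
    return sir & SIR_ACTIVEIRQ_MASK;
}

/* Non-zero when every bit of the spurious flag field is set. */
static int sir_is_spurious(uint32_t sir)
{
    return (sir & SIR_SPURIOUS_MASK) == SIR_SPURIOUS_MASK;
}
```

Under that assumption, 0xffffffdf splits into IRQ number 0x5f (95) with
the spurious flag fully set, which matches the number printed in the
message.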