Re: [RFC PATCH] OMAPDSS: DISPC: Fix IRQ unregister race

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Sun, 2012-09-02 at 22:12 +0300, Dimitar Dimitrov wrote:
> Very rare kernel crashes are reported on a custom OMAP4 board. Kernel
> panics due to corrupted completion structure while executing
> dispc_irq_wait_handler(). Excerpt from kernel log:
> 
>   Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP
>   Unable to handle kernel paging request at virtual address 00400130
>   ...
>   PC is at 0xebf205bc
>   LR is at __wake_up_common+0x54/0x94
>   ...
>   (__wake_up_common+0x0/0x94)
>   (complete+0x0/0x60)
>   (dispc_irq_wait_handler.36902+0x0/0x14)
>   (omap_dispc_irq_handler+0x0/0x354)
>   (handle_irq_event_percpu+0x0/0x188)
>   (handle_irq_event+0x0/0x64)
>   (handle_fasteoi_irq+0x0/0x10c)
>   (generic_handle_irq+0x0/0x48)
>   (asm_do_IRQ+0x0/0xc0)
> 
> DISPC IRQ executes callbacks with dispc.irq_lock released. Hence
> unregister_isr() and DISPC IRQ might be running in parallel on different
> CPUs. So there is a chance that a callback is executed even though it
> has been unregistered. As omap_dispc_wait_for_irq_timeout() declares a
> completion on stack, the dispc_irq_wait_handler() callback might try to
> access a completion structure that is invalid. This leads to crashes and
> hangs.
> 
> Solution is to divide unregister calls into two sets:
>   1. Non-strict unregistering of callbacks. Callbacks could safely be
>      executed after unregistering them. This is the case with unregister
>      calls from the IRQ handler itself.
>   2. Strict (synchronized) unregistering. Callbacks are not allowed
>      after unregistering. This is the case with completion waiting.
> 
> The above solution should satisfy one of the original intentions of the
> driver: callbacks should be able to unregister themselves.

I think it'd be better to create a new function for the nosync version,
and keep the old name for the sync version. The reason for this is to
minimize the amount of changes, as I think this one needs to be applied
to stable kernel trees also.

Also, I think we need similar one for dsi.c, as it has the same kind of
irq handling. But with a quick glance only sync version is needed there.

However, I'm not quite sure about this approach. The fix makes sense,
but it makes me think if the irq handling is designed the wrong way.

While debugging and fixing this, did you think some other irq handling
approach would be saner?

 Tomi

Attachment: signature.asc
Description: This is a digitally signed message part


[Index of Archives]     [Linux Arm (vger)]     [ARM Kernel]     [ARM MSM]     [Linux Tegra]     [Linux WPAN Networking]     [Linux Wireless Networking]     [Maemo Users]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux