Re: "irq6: nobody cared" mystery interrupt in embedded kernel running on ARM9

"Peter Teoh" <htmldeveloper@xxxxxxxxx> · Fri, 23 May 2008 14:24:17 +0800

On Fri, May 23, 2008 at 2:15 AM, ilya <jibberboosh@xxxxxxxxx> wrote:
> Hello everyone,
>
> I am fairly new to kernel development and definitely very new to embedded
> kernel development; if this question does not belong on this list, maybe you
> can direct me to the appropriate one...
>
> I have an ARM development board from Star Semiconductor with their STR8133
> ARM9 CPU. I am running kernel version 2.6.16 and have a PCI board hooked up
> to it. My kernel module that controls the board works perfectly fine, but I
> get constant "irq6: nobody cared" error messages followed by a stack dump
> [look below for example]. I confirmed that during this time none of the
> interrupt bits in any of the status registers on the board are set. I have
> access to a logic analyzer, so I hooked it up to a PCI breakout board and
> confirmed that my interrupt handler is called way before the board drives
> the INTA# line. cat /proc/interrupts shows that my module is the only one on
> IRQ6. The module is actually used by another kernel module that utilizes
> kthreads [puts them to sleep, wakes them up, etc.]. My question is this:
>
> Assuming there is no hardware problem, are there some fundamental
> differences between a regular Linux kernel and an embedded kernel that can
> cause this kind of behavior?
>
> Any information or suggestions would be highly appreciated.
>
> -- ilya
>
> Example of a dump:
>
> irq6: nobody cared
>
> Pid: 664, comm:      file-storage-ga
> CPU: 0
> PC is at l800_queue+0x170/0x260 [l800_wudc]
> LR is at wake_up_process+0x18/0x20
> pc : [<bf009ce0>]    lr : [<c0042618>]    Not tainted
> sp : c10ade44  ip : 00000000  fp : c10ade7c
> r10: c0ab29b5  r9 : bf013a0c  r8 : c0ab2995
> r7 : c0ab4860  r6 : 0000000d  r5 : 00000001  r4 : ffffffff
> r3 : 60000013  r2 : 80000093  r1 : 0000000f  r0 : 00000000
> Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  Segment kernel
> Control: 397F  Table: 00844000  DAC: 00000017
> [<c0027b60>] (show_regs+0x0/0x50) from [<c00268fc>] (report_bad_irq+0x5c/0xd0)
>  r4 = C0311D00
> [<c00268a0>] (report_bad_irq+0x0/0xd0) from [<c0026d34>] (do_level_IRQ+0xd4/0x190)
>  r6 = C0363D98  r5 = 00000000  r4 = 00000000
> [<c0026c60>] (do_level_IRQ+0x0/0x190) from [<c0026e44>] (asm_do_IRQ+0x54/0x150)
> [<c0026df0>] (asm_do_IRQ+0x0/0x150) from [<c00259d4>] (__irq_svc+0x34/0x60)
>  r8 = C0AB2995  r7 = C0AB4860  r6 = 0000000D  r5 = FFF1B140
>  r4 = FFFFFFFF
> [<bf009b70>] (l800_queue+0x0/0x260 [l800_wudc]) from [<bf01b3b0>] (start_transfer+0x90/0x2c0 [g_file_storage])
> [<bf01b320>] (start_transfer+0x0/0x2c0 [g_file_storage]) from [<bf01c4a0>] (send_status+0x180/0x250 [g_file_storage])
> [<bf01c320>] (send_status+0x0/0x250 [g_file_storage]) from [<bf01e230>] (fsg_main_thread+0x910/0x1ff0 [g_file_storage])
> [<bf01d920>] (fsg_main_thread+0x0/0x1ff0 [g_file_storage]) from [<c005fed4>] (kthread+0xf4/0x130)
> [<c005fde0>] (kthread+0x0/0x130) from [<c0049e10>] (do_exit+0x0/0x8c0)
> handlers:
> [<bf0085f0>] (l800_irq+0x0/0x10f0 [l800_wudc])
>

The call chain in the stack trace starts in the g_file_storage kernel thread:

./drivers/usb/gadget/file_storage.c:
static int fsg_main_thread(void *fsg_)
        fsg->thread_task = kthread_create(fsg_main_thread, fsg,

Looking at this:

kernel/irq/spurious.c:int noirqdebug_setup(char *str)
kernel/irq/spurious.c:  noirqdebug = 1;
kernel/irq/spurious.c:__setup("noirqdebug", noirqdebug_setup);
kernel/irq/spurious.c:module_param(noirqdebug, bool, 0644);
kernel/irq/spurious.c:MODULE_PARM_DESC(noirqdebug, "Disable irq lockup detection when true");

If noirqdebug is set to 1, then note_interrupt() will not be
executed, and you won't get all those messages:

In kernel/irq/handle.c: __do_IRQ():

                        if (!noirqdebug)
                                note_interrupt(irq, desc, action_ret);
                }
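
The effect of that test can be modelled in user space. This is only a
rough sketch, not kernel code: every gate_* name is made up for
illustration, and the only thing taken from the kernel excerpt is the
`if (!noirqdebug)` check itself.

```c
#include <assert.h>

#define GATE_NR_IRQS 16u

/* Mirrors the kernel flag: nonzero turns the spurious-IRQ accounting off. */
int gate_noirqdebug;

/* Hypothetical bookkeeping: how often the accounting hook ran per IRQ. */
unsigned long gate_note_calls[GATE_NR_IRQS];

/* Stand-in for note_interrupt(); this model only counts invocations. */
void gate_note_interrupt(unsigned int irq)
{
    assert(irq < GATE_NR_IRQS);
    gate_note_calls[irq]++;
}

/* Stand-in for the tail of __do_IRQ(), using the same test as the excerpt. */
void gate_do_irq(unsigned int irq)
{
    if (!gate_noirqdebug)
        gate_note_interrupt(irq);
}
```

With gate_noirqdebug left at 0, every gate_do_irq(6) reaches the
accounting hook; once it is set to 1, the hook is skipped and nothing
can be reported.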

And note_interrupt() will execute report_bad_irq():

void note_interrupt(unsigned int irq, struct irq_desc *desc,
                    irqreturn_t action_ret)
{
        if (unlikely(action_ret != IRQ_HANDLED)) {
                if (unlikely(action_ret != IRQ_NONE))
                        report_bad_irq(irq, desc, action_ret);

        if (unlikely(desc->irqs_unhandled > 99900)) {
                /*
                 * The interrupt is stuck
                 */
                __report_bad_irq(irq, desc, action_ret);
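
To make the "stuck" condition concrete, here is a simplified user-space
model of the accounting. It is a sketch under assumptions: the 99900
limit comes from the excerpt above, while the 100000-interrupt window,
the struct layout and all model_* names are my own illustration and are
not copied from the 2.6.16 source.

```c
#include <assert.h>
#include <stddef.h>

/* Handler results, loosely mirroring irqreturn_t. */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

#define MODEL_WINDOW      100000u  /* assumed size of the sampling window */
#define MODEL_STUCK_LIMIT  99900u  /* limit taken from the excerpt */

struct model_irq_desc {
    unsigned int irq_count;       /* interrupts seen in the current window */
    unsigned int irqs_unhandled;  /* of those, how many nobody handled */
    unsigned int reports;         /* hypothetical: "nobody cared" reports */
};

/* Count one interrupt; report once per window if almost none were handled. */
void model_note_interrupt(struct model_irq_desc *desc,
                          enum model_irqreturn ret)
{
    assert(desc != NULL);

    if (ret != MODEL_IRQ_HANDLED)
        desc->irqs_unhandled++;

    desc->irq_count++;
    if (desc->irq_count < MODEL_WINDOW)
        return;

    /* Window is full: evaluate it and start a new one. */
    desc->irq_count = 0;
    if (desc->irqs_unhandled > MODEL_STUCK_LIMIT)
        desc->reports++;
    desc->irqs_unhandled = 0;
}
```

In this model a handler that keeps returning "not mine" for practically
every interrupt in a window triggers one report, while a handler that
services even half of them never does.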

I have only described how your errors come about; I am not suggesting
any solutions yet (though you could try setting noirqdebug to 1, which
disables the lockup detection).

Alternatively, you may have to think about spurious interrupts (search
for the term): high temperatures or other environmental sources of
electromagnetic noise can result in interrupt signals, etc.
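
One way to tell spurious calls from real ones is to count, inside the
handler, the invocations that find no pending status bit. The following
is a user-space sketch only: struct demo_device, its fields and
demo_irq_handler() are hypothetical and say nothing about how l800_wudc
is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Handler results, loosely mirroring irqreturn_t. */
enum demo_irqreturn { DEMO_IRQ_NONE = 0, DEMO_IRQ_HANDLED = 1 };

/* Hypothetical device state: one status word, one bit per source. */
struct demo_device {
    uint32_t int_status;     /* pending sources as read from the board */
    unsigned long spurious;  /* calls that found nothing pending */
    unsigned long serviced;  /* calls that found at least one source */
};

/* Classify one invocation and acknowledge whatever was pending. */
enum demo_irqreturn demo_irq_handler(struct demo_device *dev)
{
    assert(dev != NULL);

    uint32_t pending = dev->int_status;
    if (pending == 0) {
        /* Nothing asserted by the device: treat the call as spurious. */
        dev->spurious++;
        return DEMO_IRQ_NONE;
    }

    dev->int_status &= ~pending;  /* acknowledge the sources we saw */
    dev->serviced++;
    return DEMO_IRQ_HANDLED;
}
```

If the spurious counter grows while the logic analyzer shows INTA#
idle, that points away from the device itself and towards the interrupt
controller setup or noise on the line.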

Here is a summary from http://www7.informatik.uni-erlangen.de/~ksjh/research/cluster/timesync/sprint.html
(the last item below looks like your case):

    * Floating status bits on the parallel port
          o not applicable for our case: No parallel port associated
            with the IRQ 7.
    * Problems with I/O-APIC code in the kernel
          o not applicable for our case: Even when using a kernel
            without I/O-APIC support compiled in, our IRQ 7 handler
            was called.
    * Some other signal lines floating
          o not applicable for our case: Pattern too regular.
    * Problems with tulip chip set
          o not applicable for our case: No tulip based network
            adapter card used.
    * Problems with a binary nVidia driver
          o not applicable for our case: No binary nVidia driver used.
    * Some errors in the initialization of the chip set (VIA
      VT8363A/82C686B on an Asus A7V133 main board)
          o could be the case here: IRQ 7 handler is less frequently
            called with a new BIOS firmware. We will check if the
            interrupt handler for IRQ 7 is also called using a DOS
            start disk to determine if this is a linux specific
            problem.
    * A device issues interrupt requests for a period of time too
      short to be recognized correctly by the 8259A, or the CPU
      acknowledges (/INTA) the request too late (See data sheet for
      the Intersil 82C59A, page 6)
          o could be the case here, to investigate further we'll have
            to remove all PCI cards from our system and see if the
            interrupts continue to appear.

-- 
Regards,
Peter Teoh
