On Fri, May 23, 2008 at 2:15 AM, ilya <jibberboosh@xxxxxxxxx> wrote:
> Hello everyone,
>
> I am fairly new to kernel development and definitely very new to
> embedded kernel development; if this question does not belong on this
> list maybe you can direct me to the appropriate one...
>
> I have an ARM development board from Star Semiconductor with their
> STR8133 ARM9 CPU. I am running kernel version 2.6.16 and have a PCI
> board hooked up to it. My kernel module that controls the board works
> perfectly fine but I get constant "irq6: nobody cared" error messages
> followed by a stack dump [look below for example].
> I confirmed that during this time none of the interrupt bits in any of
> the status registers on the board are set. I have access to a logic
> analyzer and so I hooked it up to a PCI breakout board and confirmed
> that my interrupt handler is called way before the board drives the
> INTA# line. cat /proc/interrupts shows that my module is the only one
> on IRQ6. The module is actually used by another kernel module that
> utilizes kthreads [puts them to sleep, wakes them up, etc.] My
> question is this:
>
> Assuming there is no hardware problem, are there some fundamental
> differences between regular Linux kernel and embedded kernel that can
> cause this kind of behavior?
>
> Any information or suggestions would be highly appreciated.
>
> -- ilya
>
> Example of a dump:
>
> irq6: nobody cared
>
> Pid: 664, comm: file-storage-ga
> CPU: 0
> PC is at l800_queue+0x170/0x260 [l800_wudc]
> LR is at wake_up_process+0x18/0x20
> pc : [<bf009ce0>] lr : [<c0042618>] Not tainted
> sp : c10ade44 ip : 00000000 fp : c10ade7c
> r10: c0ab29b5 r9 : bf013a0c r8 : c0ab2995
> r7 : c0ab4860 r6 : 0000000d r5 : 00000001 r4 : ffffffff
> r3 : 60000013 r2 : 80000093 r1 : 0000000f r0 : 00000000
> Flags: nZcv IRQs on FIQs on Mode SVC_32 Segment kernel
> Control: 397F Table: 00844000 DAC: 00000017
> [<c0027b60>] (show_regs+0x0/0x50) from [<c00268fc>] (report_bad_irq+0x5c/0xd0)
> r4 = C0311D00
> [<c00268a0>] (report_bad_irq+0x0/0xd0) from [<c0026d34>] (do_level_IRQ+0xd4/0x190)
> r6 = C0363D98 r5 = 00000000 r4 = 00000000
> [<c0026c60>] (do_level_IRQ+0x0/0x190) from [<c0026e44>] (asm_do_IRQ+0x54/0x150)
> [<c0026df0>] (asm_do_IRQ+0x0/0x150) from [<c00259d4>] (__irq_svc+0x34/0x60)
> r8 = C0AB2995 r7 = C0AB4860 r6 = 0000000D r5 = FFF1B140
> r4 = FFFFFFFF
> [<bf009b70>] (l800_queue+0x0/0x260 [l800_wudc]) from [<bf01b3b0>] (start_transfer+0x90/0x2c0 [g_file_storage])
> [<bf01b320>] (start_transfer+0x0/0x2c0 [g_file_storage]) from [<bf01c4a0>] (send_status+0x180/0x250 [g_file_storage])
> [<bf01c320>] (send_status+0x0/0x250 [g_file_storage]) from [<bf01e230>] (fsg_main_thread+0x910/0x1ff0 [g_file_storage])
> [<bf01d920>] (fsg_main_thread+0x0/0x1ff0 [g_file_storage]) from [<c005fed4>] (kthread+0xf4/0x130)
> [<c005fde0>] (kthread+0x0/0x130) from [<c0049e10>] (do_exit+0x0/0x8c0)
> handlers:
> [<bf0085f0>] (l800_irq+0x0/0x10f0 [l800_wudc])

The stack trace starts here:

./drivers/usb/gadget/file_storage.c: static int fsg_main_thread(void *fsg_)
    fsg->thread_task = kthread_create(fsg_main_thread, fsg,

Looking at this:

kernel/irq/spurious.c:int noirqdebug_setup(char *str)
kernel/irq/spurious.c: noirqdebug = 1;
kernel/irq/spurious.c:__setup("noirqdebug", noirqdebug_setup);
kernel/irq/spurious.c:module_param(noirqdebug, bool, 0644);
kernel/irq/spurious.c:MODULE_PARM_DESC(noirqdebug, "Disable irq lockup detection when true");

If noirqdebug is set to 1, then note_interrupt() will not be executed,
and you won't get all those messages. In kernel/irq/handle.c,
__do_IRQ():

            if (!noirqdebug)
                    note_interrupt(irq, desc, action_ret);
    }

And note_interrupt() will execute report_bad_irq():

    void note_interrupt(unsigned int irq, struct irq_desc *desc,
                        irqreturn_t action_ret)
    {
            if (unlikely(action_ret != IRQ_HANDLED)) {
                    if (unlikely(action_ret != IRQ_NONE))
                            report_bad_irq(irq, desc, action_ret);

            if (unlikely(desc->irqs_unhandled > 99900)) {
                    /*
                     * The interrupt is stuck
                     */
                    __report_bad_irq(irq, desc, action_ret);

So the report is only reached when the handler (l800_irq in your trace)
returns something other than IRQ_HANDLED.

I have just described how your errors come about; I am not suggesting
any solutions yet (possibly you can try setting noirqdebug to 1, e.g.
by booting with the "noirqdebug" parameter shown in the __setup() line
above). Alternatively you may have to think about spurious interrupts
(google), such as high temperatures or other environmental sources of
electromagnetic noise resulting in interrupt signals.

A summary from
http://www7.informatik.uni-erlangen.de/~ksjh/research/cluster/timesync/sprint.html
(the last item below looks like your case) is here:

* Floating status bits on the parallel port
  o not applicable for our case: No parallel port associated with the
    IRQ 7.
* Problems with I/O-APIC code in the kernel
  o not applicable for our case: Even when using a kernel without
    I/O-APIC support compiled in, our IRQ 7 handler was called.
* Some other signal lines floating
  o not applicable for our case: Pattern too regular.
* Problems with tulip chip set
  o not applicable for our case: No tulip based network adapter card
    used.
* Problems with a binary nVidia driver
  o not applicable for our case: No binary nVidia driver used.
* Some errors in the initialization of the chip set (VIA
  VT8363A/82C686B on an Asus A7V133 main board)
  o could be the case here: IRQ 7 handler is less frequently called
    with a new BIOS firmware.
    We will check if the interrupt handler for IRQ 7 is also called
    using a DOS start disk to determine if this is a Linux-specific
    problem.
* A device issues interrupt requests for a period of time too short to
  be recognized correctly by the 8259A, or the CPU acknowledges (/INTA)
  the request too late (see data sheet for the Intersil 82C59A, page 6)
  o could be the case here: to investigate further we'll have to remove
    all PCI cards from our system and see if the interrupts continue to
    appear.

--
Regards,
Peter Teoh

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ
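
For illustration only, the accounting in the note_interrupt() excerpt can be modelled in user space. This is a rough sketch, not kernel code: struct irq_model and the model_* functions are hypothetical names, the 99900 threshold comes from the excerpt, and the 100000-interrupt window is an assumption based on my recollection of the generic kernel/irq/spurious.c code. The trace above goes through do_level_IRQ(), so the ARM path on 2.6.16 may well behave differently.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins; the names mirror the kernel's,
 * but nothing here is taken from kernel headers. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

struct irq_model {
    unsigned int irq_count;       /* interrupts seen in the current window */
    unsigned int irqs_unhandled;  /* of those, how many nobody claimed */
    bool disabled;                /* set once the line is declared stuck */
};

#define MODEL_WINDOW 100000u  /* assumed window size, see lead-in */
#define MODEL_STUCK   99900u  /* threshold quoted in the excerpt */

/* Mirrors the noirqdebug flag: true switches the accounting off. */
bool noirqdebug = false;

/* Returns true when this call decides the interrupt line is stuck. */
bool model_note_interrupt(struct irq_model *desc, irqreturn_t ret)
{
    if (ret != IRQ_HANDLED)
        desc->irqs_unhandled++;

    desc->irq_count++;
    if (desc->irq_count < MODEL_WINDOW)
        return false;

    /* End of a window: evaluate it, then start a fresh one. */
    bool stuck = desc->irqs_unhandled > MODEL_STUCK;
    desc->irq_count = 0;
    desc->irqs_unhandled = 0;
    if (stuck)
        desc->disabled = true;
    return stuck;
}

/* Models the "if (!noirqdebug) note_interrupt(...)" check in __do_IRQ(). */
bool model_do_irq(struct irq_model *desc, irqreturn_t ret)
{
    if (noirqdebug)
        return false;
    return model_note_interrupt(desc, ret);
}

/* Feeds n interrupts with the same handler result; counts stuck reports. */
unsigned int model_run(struct irq_model *desc, unsigned int n, irqreturn_t ret)
{
    unsigned int reports = 0;
    for (unsigned int i = 0; i < n; i++)
        if (model_do_irq(desc, ret))
            reports++;
    return reports;
}
```

With a zero-initialised struct irq_model, model_run(&d, 100000, IRQ_NONE) yields one stuck report and sets d.disabled, while the same run with IRQ_HANDLED, or with noirqdebug set to true, yields none. That matches the point above: the flag only silences the check; it does not explain why the handler is not claiming the interrupt.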