On Tue, May 13, 2014 at 11:08:27AM -0400, Alan Stern wrote:
> Please CC: your patches to the maintainer of the driver you are
> changing.
>
> On Tue, 13 May 2014, Dr. Werner Fink wrote:
>
> > Hi,
> >
> > this bug hits my system now a long time.  I had found e.g. this
> >
> > speedy kernel: [ 9575.033019] irq 16: nobody cared (try booting with the "irqpoll" option)
> > speedy kernel: [ 9575.033022] Pid: 0, comm: swapper/0 Not tainted 3.7.10-1.1-desktop #1
>
> The 3.7 kernel is fairly old.  It's entirely possible that the problem
> has already been fixed in 3.14.

The patch I've attached is for 3.14 and AFAICS the problem is likely not
fixed there.  In the past I have reported this problem more than once and
always got the same answer: that the new kernel will not show this problem.

> > speedy kernel: [ 9575.033023] Call Trace:
> > speedy kernel: [ 9575.033031] [<ffffffff81004818>] dump_trace+0x88/0x300
> > speedy kernel: [ 9575.033035] [<ffffffff8158b033>] dump_stack+0x69/0x6f
> > speedy kernel: [ 9575.033038] [<ffffffff810d6c56>] __report_bad_irq+0x36/0xe0
> > speedy kernel: [ 9575.033041] [<ffffffff810d7158>] note_interrupt+0x1e8/0x240
> > speedy kernel: [ 9575.033045] [<ffffffff810d4772>] handle_irq_event_percpu+0xc2/0x250
> > speedy kernel: [ 9575.033047] [<ffffffff810d4947>] handle_irq_event+0x47/0x70
> > speedy kernel: [ 9575.033049] [<ffffffff810d7c50>] handle_fasteoi_irq+0x60/0x100
> > speedy kernel: [ 9575.033051] [<ffffffff810046c8>] handle_irq+0x18/0x30
> > speedy kernel: [ 9575.033053] [<ffffffff810043a2>] do_IRQ+0x52/0xd0
> > speedy kernel: [ 9575.033056] [<ffffffff8159806d>] common_interrupt+0x6d/0x6d
> > speedy kernel: [ 9575.033061] [<ffffffff8132018c>] intel_idle+0xec/0x160
> > speedy kernel: [ 9575.033064] [<ffffffff81452e0d>] cpuidle_idle_call+0x9d/0x330
> > speedy kernel: [ 9575.033067] [<ffffffff8100be0a>] cpu_idle+0x6a/0xe0
> > speedy kernel: [ 9575.033071] [<ffffffff81ac8bc8>] start_kernel+0x3b8/0x3c3
> > speedy kernel: [ 9575.033073] [<ffffffff81ac8436>] x86_64_start_kernel+0x105/0x114
> > speedy kernel: [ 9575.033075] handlers:
> > speedy kernel: [ 9575.033077] [<ffffffff813f2220>] usb_hcd_irq
> > speedy kernel: [ 9575.033080] [<ffffffffa0282940>] rtl8139_interrupt [8139too]
> > speedy kernel: [ 9575.033080] Disabling IRQ #16
> >
> > IRQ 16 is used by ehci_hcd:usb1 and eth1.
>
> How do you know that the problem was caused by ehci-hcd rather than
> 8139too?  Or by some other piece of hardware entirely?

I've seen this also with another ethernet card.  And the status bit
is always a bit described in the USB.

> > Adding the "irqpoll" option to the kernels
> > command line had not helped.  Therefore I had debugged this problem by adding a printk()
> > debug line in the ehci_irq() function of drivers/usb/host/ehci-hcd.c. This had shown
> > out that my USB controller causes STS_RECL (reclamation readonly status bit) in the
> > IRQ status.
>
> What makes you think that STS_RECL is the cause of the problem?  It is
> quite normal for STS_RECL to be set.

As described: the printk() shows exactly this bit.

> > After a while this had lead to the message in the subject with the side effect that
> > networking becomes slow.
>
> How do you know that something else didn't cause the "nobody cared"
> error?

Yes.

> > From the debugging code I've evolved the attached patch.  It is not perfect as it
> > returns IRQ_NONE for the first time the STS_RECL status bit is found but it does
> > its job.
>
> Please put your patches in the main email message; don't attach them.
> Now there's no easy way for me to include it in this reply.
>
> The patch is definitely wrong.  It will never set spurious_recl,
> because the "if (unlikely(masked_status & STS_RECL))" test can't
> succeed unless spurious_recl has already been set.

OK ... the patch was changed because I had been told that I should do it
this way.  In my original code I simply use

        masked_status = status & (INTR_MASK | STS_FLR | STS_RECL);

        /* Shared IRQ? */
        if (!masked_status || unlikely(ehci->rh_state == EHCI_RH_HALTED)) {
                spin_unlock_irqrestore(&ehci->lock, flags);
                printk("ehci_irq status: %#8.8x", status);
                return IRQ_NONE;
        }

and with this I can use my ethernet card for more than 15 minutes.  I first
added the printk() line after I had also added some printk() lines to the
ethernet driver to see what was wrong with the shared IRQ.  Then I
identified STS_RECL from the printk() above in my logs and or'd STS_RECL
into the masked status bits.  After this all problems were gone.

>
> Alan Stern

     Werner

--
  "Having a smoking section in a restaurant is like having
          a peeing section in a swimming pool." -- Edward Burr
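
For readers following the thread, the effect of or'ing STS_RECL into the mask can be sketched in user space. This is only an illustration under stated assumptions, not the driver code. The bit positions are assumed to follow the EHCI USBSTS register layout and should be checked against include/linux/usb/ehci_def.h. The helper claims_irq() is a hypothetical name that mirrors the "Shared IRQ?" test quoted above: a zero masked status corresponds to returning IRQ_NONE.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * USBSTS bits, assumed to match the EHCI register layout; compare with
 * include/linux/usb/ehci_def.h before relying on these values.
 */
#define STS_INT   (1u << 0)   /* transfer completion interrupt */
#define STS_ERR   (1u << 1)   /* transaction error interrupt */
#define STS_PCD   (1u << 2)   /* port change detect */
#define STS_FLR   (1u << 3)   /* frame list rollover */
#define STS_FATAL (1u << 4)   /* host system error */
#define STS_IAA   (1u << 5)   /* interrupt on async advance */
#define STS_RECL  (1u << 13)  /* reclamation, read-only status */

/* Assumed to mirror the interrupt mask used by ehci-hcd. */
#define INTR_MASK (STS_IAA | STS_FATAL | STS_PCD | STS_ERR | STS_INT)

/*
 * Hypothetical helper: true if a handler using this mask would treat the
 * interrupt as its own, false if it would return IRQ_NONE.
 */
bool claims_irq(uint32_t status, uint32_t mask)
{
    return (status & mask) != 0;
}
```

With the stock mask, claims_irq(STS_RECL, INTR_MASK | STS_FLR) is false, so a status word carrying only STS_RECL is reported as not handled. With the modified mask, claims_irq(STS_RECL, INTR_MASK | STS_FLR | STS_RECL) is true, so the same status word is claimed. Since STS_RECL is normally set, as noted in the quoted reply, the modified mask may claim nearly every interrupt on the shared line, which could hide the unhandled-interrupt count rather than explain it.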