On Tue, 12 Jun 2012, Austin Schuh wrote:

> On Tue, Jun 12, 2012 at 12:40 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> > On Tue, 12 Jun 2012, Austin Schuh wrote:
> >> $ cat /sys/kernel/debug/usb/ehci/0000\:00\:1a.7/async
> >> qh/ffff88047452a700 dev8 hs ep2 42002208 40000000 (80008c00 data1 nak4)
> >>       ffff880037801180 out len=0 80008c00 urb ffff880596b06600
> >
> > This means that the transfer was completed but the computer apparently
> > never got a completion interrupt.
> >
> >> $ cat /sys/kernel/debug/usb/ehci/0000\:00\:1a.7/lpm
> >> $ cat /sys/kernel/debug/usb/ehci/0000\:00\:1a.7/periodic
> >> size = 1024
> >> $ cat /sys/kernel/debug/usb/ehci/0000\:00\:1a.7/registers
> >> bus pci, device 0000:00:1a.7
> >> EHCI Host Controller
> >> EHCI 1.00, hcd state 1
> >> ownership 00000001
> >> SMI sts/enable 0xc0080000
> >> structural params 0x00103206
> >> capability params 0x00016871
> >> status 8008 Async FLR
> >> command 0010021 (park)=0 ithresh=1 Async period=1024 RUN
> >> intrenable 37 IAA FATAL PCD ERR INT
> >> uframe 3f17
> >> port:1 status 001000 0  ACK POWER sig=se0
> >> port:2 status 001000 0  ACK POWER sig=se0
> >> port:3 status 003000 0  ACK POWER OWNER sig=se0
> >> port:4 status 001005 0  ACK POWER sig=se0 PE CONNECT
> >> port:5 status 001000 0  ACK POWER sig=se0
> >> port:6 status 001000 0  ACK POWER sig=se0
> >> irq normal 43424 err 0 reclaim 6314 (lost 21)
> >> complete 43440 unlink 3819
> >
> > That's pretty normal, except that the count of lost IAA interrupts
> > ought to be 0.
>
> Does that typically mean anything?

It could indicate more missing interrupts.

> > If you get rid of the line saying:
> >
> >                ehci->need_io_watchdog = 0;
> >
> > under the PCI_VENDOR_ID_INTEL case in
> > drivers/usb/host/ehci-pci.c:ehci_pci_setup(), it may help.
>
> That seems to have fixed it.  Hmm...  I used to be able to run my code
> anywhere from 3-30 times before it would hang.  It is at 470 cycles
> right now and counting without an issue.
>
> Apart from building and deploying custom patched kernels everywhere
> that I need to run my code, is there something that I can do to help
> get this hack packaged up into something that could be submitted?  Or,
> is there something else that I should look around for that could be
> the root cause of the problem?

My first guess would be bad hardware.  On the other hand, the fact that
the same thing happened on two computers with different chipsets argues
that it's not a hardware problem.

A known software bug could also cause interrupts to get lost, although
I have never observed this.  A patch (for the 3.4 kernel) to fix this
bug is below; you could try it with ehci->need_io_watchdog set back to
0.

Are the two computers you tried both SMP systems?

Alan Stern



Index: v/drivers/usb/host/ehci.h
===================================================================
--- v.orig/drivers/usb/host/ehci.h
+++ v/drivers/usb/host/ehci.h
@@ -83,7 +83,8 @@ struct ehci_hcd {			/* one per controlle
 	struct ehci_qh		*dummy;		/* For AMD quirk use */
 	struct ehci_qh		*reclaim;
 	struct ehci_qh		*qh_scan_next;
-	unsigned		scanning : 1;
+	bool			scanning:1;
+	bool			need_rescan:1;
 
 	/* periodic schedule support */
 #define	DEFAULT_I_TDPS		1024		/* some HCs can do less */
Index: v/drivers/usb/host/ehci-hcd.c
===================================================================
--- v.orig/drivers/usb/host/ehci-hcd.c
+++ v/drivers/usb/host/ehci-hcd.c
@@ -535,13 +535,20 @@ static void ehci_work (struct ehci_hcd *
 	 * it reports urb completions.  this flag guards against bogus
 	 * attempts at re-entrant schedule scanning.
 	 */
-	if (ehci->scanning)
+	if (ehci->scanning) {
+		ehci->need_rescan = true;
 		return;
-	ehci->scanning = 1;
+	}
+	ehci->scanning = true;
+
+ rescan:
+	ehci->need_rescan = false;
 	scan_async (ehci);
 	if (ehci->next_uframe != -1)
 		scan_periodic (ehci);
-	ehci->scanning = 0;
+	if (ehci->need_rescan)
+		goto rescan;
+	ehci->scanning = false;
 
 	/* the IO watchdog guards against hardware or driver bugs that
 	 * misplace IRQs, and should let us run completely without IRQs.
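
For illustration only, here is a user-space sketch of the guard-and-rescan
pattern the patch introduces.  It is not kernel code: struct fake_hcd,
fake_scan() and fake_work() are made-up names, the nested call merely
imitates a completion path re-entering the scan, and a do/while loop
stands in for the patch's "goto rescan".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct ehci_hcd: only the two patch flags
 * plus counters used to drive and observe the demo. */
struct fake_hcd {
	bool scanning;       /* a scan is in progress */
	bool need_rescan;    /* a nested call asked for another pass */
	int passes;          /* scan passes performed so far */
	int inject_reentry;  /* passes that should trigger a nested call */
};

static void fake_work(struct fake_hcd *hcd);

/* Stand-in for scan_async()/scan_periodic(): it may re-enter
 * fake_work(), the way a completion callback might. */
static void fake_scan(struct fake_hcd *hcd)
{
	assert(hcd->scanning);  /* scans only run under the guard */
	hcd->passes++;
	if (hcd->inject_reentry > 0) {
		hcd->inject_reentry--;
		fake_work(hcd);     /* nested call: only sets need_rescan */
	}
}

static void fake_work(struct fake_hcd *hcd)
{
	if (hcd->scanning) {
		/* Re-entrant call: record the request instead of dropping it. */
		hcd->need_rescan = true;
		return;
	}
	hcd->scanning = true;

	do {
		hcd->need_rescan = false;
		fake_scan(hcd);
	} while (hcd->need_rescan);  /* repeat until no request arrived */

	hcd->scanning = false;
}
```

With inject_reentry set to 2, a single outer fake_work() call performs
three passes: two are interrupted by a nested call and so are followed by
a rescan, and the third runs undisturbed.  Without the need_rescan flag
the nested requests would be silently discarded after the first pass.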