On Fri, 24 Jul 2009, Matthijs Kooijman wrote:

> Hi Alan,
>
> firstly, I tried disabling the USB_TT_NEWSCHED. That didn't solve the lockup
> problem. The rest of my tests have been done without NEWSCHED enabled.

USB_TT_NEWSCHED shouldn't make any difference.

> > http://marc.info/?t=124807676700001&r=1&w=2
> > You can try repeating some of the tests described there.
>
> I've applied the patch suggested there. This obviously doesn't show the
> warning anymore, but it also prevents my system from locking up. Instead of
> the warning and lockup, I now get messages telling me ehci can't reset the
> (Keyboard receiver) device, with error codes -71 or -108. Most of the time I
> get both codes, but sometimes only -108 and once neither.

That's normal. -71 means there was a communications error, which you would
expect since the receiver wasn't plugged in and hence didn't respond to the
messages sent by the kernel. -108 means the device has been unplugged, which
again is to be expected.

> Apart from the error messages, my system seems usable, though I've had some
> problems with my tablet (see below) and found that my keyboard stopped
> responding at times, which might or might not be related.
>
> All of the following tests are with the patch applied.
>
> Just like in the thread you refer to, I get some dma_pool_destroy errors when
> removing the ehci_hcd module:
> ehci_hcd 0000:00:13.2: dma_pool_destroy ehci_qtd, ffff88004880f000 busy
> ehci_hcd 0000:00:13.2: dma_pool_destroy ehci_qh, ffff88004884c000 busy

Those are a real problem. I still need to figure out what's causing them.

> Additionally, I get a lot of these (I think they started only after the first
> disconnect or rmmod, not 100% sure):
> ehci_hcd 0000:00:13.2: detected XactErr len 0/8 retry 25
> with incrementing retry numbers (I also saw a retry -189 once), or

Those XactErr messages also indicate communications errors.
But you have far too many of them in your log; they suggest there is
something wrong with your hardware, probably a bad cable or a bad hub.

> combinations of these two lines:
> ehci_hcd 0000:00:13.2: detected XactErr len 0/9 retry 1
> usb 1-2.1.3: unlink qh4-0601/ffff88004884c6c0 start 1 [1/2 us]
> with retry 1 every time. The latter seem to be printed only when I have my
> Wacom tablet plugged in, and seem to cause it to be jumpy. This seems an
> unrelated issue, though, since these messages appear at boot already, before
> unplugging any device.

This funny behavior is caused by that hub-or-cable problem. You might want to
try plugging the Wacom tablet directly into the other hub (the one the audio
device is attached to).

> Not sure how relevant this is, but as soon as I rmmod'ed the ehci_hcd module,
> the USB devices registered with the ohci_hcd module (I think) and continued
> working, with no more errors when unplugging anything.

Again, that's expected. The bug I'm trying to track down is in ehci-hcd, not
in ohci-hcd.

> I originally thought that the dma_pool_destroy errors disappeared when
> ohci_hcd is not loaded, though on a second try, they were still there.
>
> I did a more extended test (where I started out with too many USB devices
> plugged in, sorry for that noise). I did find that the "can't reset" errors
> didn't occur when unplugging my tablet, only with the keyboard receiver.
>
> I've also enabled usbmon and got a few traces. Please find a full kernel log,
> from boot, at http://www.stdout.nl/ehci-debug/kernel.log.txt together with
> http://www.stdout.nl/ehci-debug/during-disconnect.usbmon.txt and
> http://www.stdout.nl/ehci-debug/during-rmmod.usbmon.txt
>
> I hope this helps.

There are still those dma_pool_destroy problems to track down...

Alan Stern