Hi

On 18.01.2015 22:55, Devin Heitmueller wrote:
> Hello,
>
> I'm debugging some issues on a couple of different USB TV tuners which
> boil down to the following error:
>
> xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command.
>
> This is followed by the XHCI driver disconnecting *all* USB devices
> from the controller.
>
> I've done a bit of debugging, and the root of the issue appears to be
> an intermittent control message timing out, and then the call to
> usb_kill_urb() that occurs inside of usb_control_msg() when the
> timeout expires is what causes the disconnect. Specifically, it would
> appear that xhci_urb_dequeue tries to stop the endpoint using
> xhci_queue_stop_endpoint(), the command gets queued but the IRQ never
> fires to perform the TRB_STOP_RING completion code. The function
> xhci_stop_endpoint_command_watchdog() fires after five seconds, which
> tears down the entire driver.
>
> Below is the dmesg output with the xhci_hcd debugging enabled. The
> dump_stack() call is something I added (i.e. it's not an OOPS) so I
> could see which code path was making the usb_kill_urb() call that was
> failing. Note that the caller is using usb_control_msg() with 1000ms
> timeout, and we can see from the timestamps that the timer expires
> which is what causes the call to usb_kill_urb().
>
> I would imagine that explicitly killing URBs is a pretty uncommon task
> for control endpoint messages (compared to ISOC or BULK endpoints
> where it's done regularly). Is it possible a exception case has been
> missed?
>
> Independent of the usb_kill_urb() killing the entire stack, I still
> don't really understand yet why the control message failed in the
> first place. This is a well-exercised code path in the au0828 driver
> (related to I2C transfers) and I've never seen this when using the
> EHCI driver.
> My assumption is that either the HCD is getting sick
> which is causing both the control message to fail as well as putting
> it into an inconsistent state such that we never get the TRB_STOP_RING
> IRQ, or we've got two separate bugs - the control message failing for
> some "legitimate" reason (i.e. I screwed something up in my au0828
> driver), followed by the usb_kill_urb() error simply not handling
> killing of URBs on a control endpoint (which causes the entire stack
> to go down).
>
> Thoughts/suggestions/recommendations are welcome.
>

There are a couple of xhci bugs triggered by DVB devices:

https://bugzilla.kernel.org/show_bug.cgi?id=75521
https://bugzilla.kernel.org/show_bug.cgi?id=65021

The first one (75521) I believe is mostly fixed by patches in 3.18 and
early 3.19-rc, so please work on a 3.19-rc kernel to rule those issues out.

The second bug (65021) looks more like your case: it queues two stop
endpoint commands almost simultaneously, and they never complete, so
the watchdog times out and tears down xhci.

That bug has a debug patch for command ring status. You could try it out
to check whether the command queue is running, among other details.

A verbose xhci dmesg log, enabled with:

  echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control

could give some insight into what's happening.

-Mathias
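
If it helps, the dynamic debug step can be wrapped in a small helper so that logging is easy to switch on before reproducing the timeout and off again afterwards. This is only a rough sketch: the function name xhci_debug and the optional second argument (an alternate control file, there purely so the helper can be tried without root) are made up for illustration. The '=p' query is the one quoted above, and '-p' is the usual dynamic debug way of clearing the print flag again.

```shell
#!/bin/sh
# Sketch: switch xhci_hcd dynamic debug output on or off.
# xhci_debug is a hypothetical helper name, not an existing tool.
# The second argument is an assumption of this sketch: it overrides the
# control file so the function can be exercised without root or debugfs.

xhci_debug() {
    action="$1"
    control="${2:-/sys/kernel/debug/dynamic_debug/control}"

    case "$action" in
        on)  query='module xhci_hcd =p' ;;   # enable printing of debug messages
        off) query='module xhci_hcd -p' ;;   # clear the print flag again
        *)
            echo "usage: xhci_debug on|off [control-file]" >&2
            return 2
            ;;
    esac

    # No trailing newline, matching the 'echo -n' form of the command.
    printf '%s' "$query" > "$control"
}
```

Typical use (as root, with debugfs mounted) would be `xhci_debug on`, reproduce the control message timeout, save the output of `dmesg`, then `xhci_debug off`.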