Re: Control message failures kill entire XHCI stack

Hi

On 18.01.2015 22:55, Devin Heitmueller wrote:
> Hello,
> 
> I'm debugging some issues on a couple of different USB TV tuners which
> boil down to the following error:
> 
> xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command.
> 
> This is followed by the XHCI driver disconnecting *all* USB devices
> from the controller.
> 
> I've done a bit of debugging, and the root of the issue appears to be
> an intermittent control message timing out, and then the call to
> usb_kill_urb() that occurs inside of usb_control_msg() when the
> timeout expires is what causes the disconnect.  Specifically, it would
> appear that xhci_urb_dequeue tries to stop the endpoint using
> xhci_queue_stop_endpoint(), the command gets queued but the IRQ never
> fires to perform the TRB_STOP_RING completion code. The function
> xhci_stop_endpoint_command_watchdog() fires after five seconds, which
> tears down the entire driver.
> 
> Below is the dmesg output with the xhci_hcd debugging enabled.  The
> dump_stack() call is something I added (i.e. it's not an OOPS) so I
> could see which code path was making the usb_kill_urb() call that was
> failing.  Note that the caller is using usb_control_msg() with 1000ms
> timeout, and we can see from the timestamps that the timer expires
> which is what causes the call to usb_kill_urb().
> 
> I would imagine that explicitly killing URBs is a pretty uncommon task
> for control endpoint messages (compared to ISOC or BULK endpoints
> where it's done regularly).  Is it possible an exception case has been
> missed?
> 
> Independent of the usb_kill_urb() killing the entire stack, I still
> don't really understand yet why the control message failed in the
> first place.  This is a well-exercised code path in the au0828 driver
> (related to I2C transfers) and I've never seen this when using the
> EHCI driver.  My assumption is that either the HCD is getting sick
> which is causing both the control message to fail as well as putting
> it into an inconsistent state such that we never get the TRB_STOP_RING
> IRQ, or we've got two separate bugs - the control message failing for
> some "legitimate" reason (i.e. I screwed something up in my au0828
> driver), followed by the usb_kill_urb() error simply not handling
> killing of URBs on a control endpoint (which causes the entire stack
> to go down).
> 
> Thoughts/suggestions/recommendations are welcome.
> 

There are a couple of xhci bugs triggered by dvb devices:
https://bugzilla.kernel.org/show_bug.cgi?id=75521
https://bugzilla.kernel.org/show_bug.cgi?id=65021

I believe the first one (75521) is mostly fixed by patches in 3.18 and early
3.19-rc, so please test on a 3.19-rc kernel to rule out those issues.

The second bug (65021) looks more like your case: it queues two stop endpoint
commands almost simultaneously, and neither ever completes, so the command
times out and xhci is torn down.
That bug has a debug patch for command ring status. You could try it out to
check whether the command queue is running, among other details.

A verbose xhci dmesg log, enabled with:
echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
could give some insight into what's happening.
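
As a rough sketch of the step above, assuming debugfs is mounted at
/sys/kernel/debug and the kernel was built with CONFIG_DYNAMIC_DEBUG, something
like the following could be used to switch the xhci_hcd debug messages on
around a reproduction run and off again afterwards. The function name
xhci_debug and the CONTROL variable are made up for illustration; only the
control file path and the 'module xhci_hcd =p' query come from the command
above.

```shell
# Dynamic debug control file. The default assumes debugfs is mounted at
# /sys/kernel/debug; override CONTROL if it lives elsewhere.
CONTROL="${CONTROL:-/sys/kernel/debug/dynamic_debug/control}"

# xhci_debug FLAGS
# Hypothetical helper: writes a dynamic debug query for the xhci_hcd module.
#   FLAGS "=p" sets the print flag on every xhci_hcd debug callsite,
#   FLAGS "-p" clears it again.
xhci_debug() {
    if [ ! -w "$CONTROL" ]; then
        echo "cannot write $CONTROL (not root, or debugfs not mounted?)" >&2
        return 1
    fi
    printf '%s' "module xhci_hcd $1" > "$CONTROL"
}

# Typical use, as root:
#   xhci_debug =p              # enable verbose xhci messages
#   ... reproduce the control message timeout ...
#   dmesg > xhci-debug.log     # save the log to attach to the bug
#   xhci_debug -p              # disable them again
```

Turning the messages off after the run keeps the log from being flooded once
the failure has been captured.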

-Mathias 






