Hi

On 24.02.2017 17:01, shal@xxxxxxx wrote:
Hello,

I have a BUG on USB xhci. The trace here :

[11518.982950] xhci_hcd 0000:07:00.0: Stopped the command ring failed, maybe the host is dead
[11519.027106] xhci_hcd 0000:07:00.0: Host halt failed, -110
[11519.027108] xhci_hcd 0000:07:00.0: Abort command ring failed
[11519.027215] xhci_hcd 0000:07:00.0: HC died; cleaning up
[11519.027230] xhci_hcd 0000:07:00.0: Timeout while waiting for setup device command
[11519.442303] usb 3-1: device not accepting address 15, error -108
[11519.442324] usb usb3-port1: couldn't allocate usb_device

After this error happens, I have to reboot Linux. Without reboot the USB port doesn't work for any devices.
We're waiting for the device to respond to a setup device command. It doesn't respond, so we have to cancel the command (stop the command ring, skip the command, and restart the command ring). We first fail to stop the command ring, then we fail to halt the entire host controller.
The situation.

uname -a :
Linux shal 4.10.0-8-generic #10-Ubuntu SMP Mon Feb 13 14:04:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
4.10 contains changes in exactly this area, to prevent a race that might restart the command ring before we check whether it stopped. Do you have an older kernel available to check whether it's a regression in 4.10?
Part of lspci:

00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
07:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
Do you have a host controller from another vendor to try this on? The log shows that the host controller becomes really unresponsive after we try to abort the command ring.
# lsusb
Bus 002 Device 004: ID 0582:0044 Roland Corp. EDIROL UA-1000
Bus 002 Device 003: ID 046d:c52e Logitech, Inc. MK260 Wireless Combo Receiver
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Note that I have booted with the GRUB option:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash usbcore.old_scheme_first=1"

I work with an old Android smartphone in fastboot mode. The smartphone is connected with a long USB cable (5 m). In fastboot mode (and only in this mode), the device is not reachable. There is an error like this:
usb 3-1: device not accepting address 12, error -71

So I added "usbcore.old_scheme_first=1" to the kernel command line, and then I can reach the device in fastboot mode. But when I perform some operations on the smartphone, the device sometimes hangs.
Does the host always hang after a command times out? I.e. is there ever a timeout message:
"xhci_hcd 0000:07:00.0: Timeout while ..."
without the host dying messages:
xhci_hcd 0000:07:00.0: Stopped the command ring failed, maybe the host is dead
xhci_hcd 0000:07:00.0: Host halt failed, -110
xhci_hcd 0000:07:00.0: Abort command ring failed
xhci_hcd 0000:07:00.0: HC died; cleaning up
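
One way to answer that question from a saved log is to count the two kinds of messages and compare. The following is only a rough sketch: the function name xhci_timeout_summary is made up for illustration, and the patterns are taken from the message texts quoted in this thread, so they may need adjusting for other kernel versions.

```shell
# Hypothetical helper (name is illustrative, not an existing tool).
# Reads kernel log text on stdin and prints how many xhci command
# timeouts and how many "HC died" events it contains.
xhci_timeout_summary() {
    awk '
        /xhci_hcd .*Timeout while waiting for/ { timeouts++ }
        /xhci_hcd .*HC died/                   { died++ }
        END { printf "timeouts=%d died=%d\n", timeouts, died }
    '
}

# Example use (may need root, depending on dmesg restrictions):
#   dmesg | xhci_timeout_summary
```

If the timeout count is higher than the died count, at least some command timeouts were recovered without the host dying.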
In this case, my USB port hangs too and it is impossible to connect any device to it (smartphone or USB key, for example). I have to reboot Linux in order to have the USB port working again. Note that during the operation the entire system freezes for a few seconds.

My question:
- Is there a method to avoid the USB port hanging?
You could try whether the EHCI USB controller works:
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller
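
To find out which physical port ends up on the EHCI controller, it can help to see which PCI device each USB bus hangs off. A small sketch, assuming the usual sysfs layout where each root hub (usb1, usb2, ...) is a symlink into the host controller's device directory; the function name usb_bus_controllers is hypothetical, and the optional argument only exists so the function can be pointed at a test directory instead of /sys.

```shell
# Hypothetical helper: print "<root hub> <parent device>" for every USB bus.
# $1 = sysfs root (defaults to /sys)
usb_bus_controllers() {
    ubc_root="${1:-/sys}"
    for ubc_hub in "$ubc_root"/bus/usb/devices/usb*; do
        [ -e "$ubc_hub" ] || continue
        # Resolve the symlink; the parent directory is the host controller device.
        ubc_real=$(readlink -f "$ubc_hub")
        printf '%s %s\n' "$(basename "$ubc_hub")" "$(basename "$(dirname "$ubc_real")")"
    done
}

# Example use:
#   usb_bus_controllers
```

A bus whose parent is 0000:00:1d.0 would be on the Intel EHCI controller from the lspci output, while one under 0000:07:00.0 would be on the ASMedia xHCI controller.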
- If not, is there a method to get a working USB port without rebooting?
Try reloading xhci; that might do the trick, unless the controller is really stuck.
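
If xhci is built into the kernel rather than loaded as a module, unbinding and rebinding the PCI device from the driver through sysfs has a similar effect to a reload. The following is a sketch only: it assumes the PCI driver directory is /sys/bus/pci/drivers/xhci_hcd and uses the controller address 0000:07:00.0 from the log above. The function name xhci_rebind and its optional sysfs-root argument are made up for illustration. It needs root, and there is no guarantee it revives a controller that has stopped responding.

```shell
# Hypothetical helper: unbind and rebind one PCI device from the xhci_hcd driver.
# $1 = PCI address, e.g. 0000:07:00.0
# $2 = sysfs root (defaults to /sys; can be overridden for a dry run)
xhci_rebind() {
    xr_dev="$1"
    xr_drv="${2:-/sys}/bus/pci/drivers/xhci_hcd"
    if [ -z "$xr_dev" ] || [ ! -d "$xr_drv" ]; then
        echo "usage: xhci_rebind <pci-address> [sysfs-root]" >&2
        return 1
    fi
    # Detach the driver from the device, wait briefly, then attach it again.
    printf '%s' "$xr_dev" > "$xr_drv/unbind" || return 1
    sleep 1
    printf '%s' "$xr_dev" > "$xr_drv/bind"
}

# Example use (as root):
#   xhci_rebind 0000:07:00.0
```

Where the driver is built as a module, removing and re-inserting the module with modprobe is the other common way to do the same thing.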
Thanks

More traces:

[11466.611552] usb 3-1: USB disconnect, device number 11
sudden disconnect
[11468.957608] usb 3-1: new high-speed USB device number 12 using xhci_hcd
[11470.878811] usb 3-1: Device not responding to setup address.
[11486.881738] usb 3-1: Device not responding to setup address.
So there are already a couple of transaction errors when trying to address the device.
[11487.088447] usb 3-1: device not accepting address 12, error -71
[11487.532378] usb 3-1: new high-speed USB device number 14 using xhci_hcd
[11487.559735] usb 3-1: unable to get BOS descriptor
[11487.564929] usb 3-1: New USB device found, idVendor=18d1, idProduct=d00d
[11487.564932] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[11487.564934] usb 3-1: Product: Android
[11487.564935] usb 3-1: Manufacturer: Google
[11489.585534] usb 3-1: USB disconnect, device number 14
sudden disconnect
[11491.748090] usb 3-1: new high-speed USB device number 15 using xhci_hcd
[11518.982950] xhci_hcd 0000:07:00.0: Stopped the command ring failed, maybe the host is dead
[11519.027106] xhci_hcd 0000:07:00.0: Host halt failed, -110
[11519.027108] xhci_hcd 0000:07:00.0: Abort command ring failed
[11519.027215] xhci_hcd 0000:07:00.0: HC died; cleaning up
[11519.027230] xhci_hcd 0000:07:00.0: Timeout while waiting for setup device command
[11519.442303] usb 3-1: device not accepting address 15, error -108
[11519.442324] usb usb3-port1: couldn't allocate usb_device
Connection looks really unreliable. Enabling xhci debugging might reveal something:

echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control

-Mathias