HC died

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello All,

We're experiencing a problem with our devices in the field where our
customers attach problematic USB devices that are causing the xhci
host controller to shut down with the "HC died; cleaning up" message.

I've narrowed this down to a timeout of the address device TRB on the
command ring (currently 5 seconds).  It sometimes takes our hardware
9.6 to complete this TRB.  When the driver is trying to stop the cmd
ring, the controller is busy for an additional 4.6 seconds.  This
results in the "HC died" message and shutdown of the host controller.

If I bump the command ring timeout beyond the max TRB completion time,
the host controller continues to be responsive and doesn't need to be
shut down.

My knowledge of how the usb protocol should handle this problem isn't
strong enough to know if there is a better solution than simply
increasing the command ring default timeout.

Is there a better way to handle this problem?

I've reproduced this on kernel 6.0, and 6.1.

Here is a log of the issue, along with some extra tracing I added to
track the duration of the command ring TRB completions.  FYI, I
recreate this problem by connecting/disconnecting the 5V pin quickly.
The theory is that the device is no longer present on the bus when the
hardware is trying to address the device.  I understand this is
somewhat degenerate, but it's happening frequently to our customers in
the field and it would be nice if the HC didn't die requiring them to
reboot to recover.

[F00:P06]  Feb 16 16:40:14  kernel: usb 3-2.1: new high-speed USB
device number 13 using xhci_hcd
[F00:P07]  Feb 16 16:40:14  kernel: xhci_hcd 0002:01:00.0: Set root
hub portnum to 4
[F00:P07]  Feb 16 16:40:14  kernel: xhci_hcd 0002:01:00.0: Set fake
root hub portnum to 2
[F00:P07]  Feb 16 16:40:14  kernel: xhci_hcd 0002:01:00.0: udev->tt =
0000000000000000
[F00:P07]  Feb 16 16:40:14  kernel: xhci_hcd 0002:01:00.0: udev->ttport = 0x0
[F00:P06]  Feb 16 16:40:14  kernel: xhci_hcd 0002:01:00.0:
xhci_setup_device: xhci_queue_address_device
[F00:P07]  Feb 16 16:40:14  kernel: xhci_hcd 0002:01:00.0: // Ding dong!
[F00:P06]  Feb 16 16:40:23  kernel: xhci_hcd 0002:01:00.0: completion
cmd 11, time: 163776215, duration: 9596181 us
[F00:P04]  Feb 16 16:40:23  kernel: usb 3-2.1: Device not responding
to setup address.
[F00:P07]  Feb 16 16:40:23  kernel: xhci_hcd 0002:01:00.0: // Ding dong!
[F00:P06]  Feb 16 16:40:23  kernel: xhci_hcd 0002:01:00.0: completion
cmd 10, time: 163778807, duration: 2530 us
[F00:P06]  Feb 16 16:40:23  kernel: xhci_hcd 0002:01:00.0: completion
cmd 9, time: 163778906, duration: 90 us
[F00:P07]  Feb 16 16:40:23  kernel: xhci_hcd 0002:01:00.0: Slot 15
output ctx = 0x8422f000 (dma)
[F00:P07]  Feb 16 16:40:23  kernel: xhci_hcd 0002:01:00.0: Slot 15
input ctx = 0x87ea9000 (dma)
[F00:P07]  Feb 16 16:40:23  kernel: xhci_hcd 0002:01:00.0: Set slot id
15 dcbaa entry 000000001958e382 to 0x8422f000
[F00:P07]  Feb 16 16:40:24  kernel: xhci_hcd 0002:01:00.0: Set root
hub portnum to 4
[F00:P07]  Feb 16 16:40:24  kernel: xhci_hcd 0002:01:00.0: Set fake
root hub portnum to 2
[F00:P07]  Feb 16 16:40:24  kernel: xhci_hcd 0002:01:00.0: udev->tt =
0000000000000000
[F00:P07]  Feb 16 16:40:24  kernel: xhci_hcd 0002:01:00.0: udev->ttport = 0x0
[F00:P06]  Feb 16 16:40:24  kernel: xhci_hcd 0002:01:00.0:
xhci_setup_device: xhci_queue_address_device
[F00:P07]  Feb 16 16:40:24  kernel: xhci_hcd 0002:01:00.0: // Ding dong!
[F00:P06]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: completion
cmd 11, time: 173580444, duration: 9596416 us
[F00:P04]  Feb 16 16:40:33  kernel: usb 3-2.1: Device not responding
to setup address.
[F00:P07]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: // Ding dong!
[F00:P06]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: completion
cmd 10, time: 173583035, duration: 2529 us
[F00:P06]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: completion
cmd 9, time: 173583129, duration: 83 us
[F00:P07]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: Slot 16
output ctx = 0x8422f000 (dma)
[F00:P07]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: Slot 16
input ctx = 0x87ea9000 (dma)
[F00:P07]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: Set slot id
16 dcbaa entry 00000000ae4b7693 to 0x8422f000
[F00:P03]  Feb 16 16:40:33  kernel: usb 3-2.1: device not accepting
address 13, error -71
[F00:P07]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: // Ding dong!
[F00:P06]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: completion
cmd 10, time: 173790654, duration: 2529 us
[F00:P07]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: Bad real port.
[F00:P06]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: completion
cmd 9, time: 173790749, duration: 86 us
[F00:P07]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: Slot 17
output ctx = 0x8422f000 (dma)
[F00:P07]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: Slot 17
input ctx = 0x87ea9000 (dma)
[F00:P07]  Feb 16 16:40:33  kernel: xhci_hcd 0002:01:00.0: Set slot id
17 dcbaa entry 00000000e45e3c7b to 0x8422f000
[F00:P07]  Feb 16 16:40:34  kernel: xhci_hcd 0002:01:00.0: // Ding dong!
[F00:P06]  Feb 16 16:40:34  kernel: xhci_hcd 0002:01:00.0: completion
cmd 10, time: 174478971, duration: 2528 us
[F00:P07]  Feb 16 16:40:34  kernel: xhci_hcd 0002:01:00.0: Bad real port.

Thanks for your assistance,

Seth



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux