Hi Mathias,

On 2/27/2019 1:01 PM, Mathias Nyman wrote:
> Hi
>
> On 26.2.2019 19.55, Shah, Nehal-bakulchandra wrote:
>> Hi
>>
>> On one of our customer platforms, we are getting the following errors:
>>
>> [65136.606651] xhci_hcd 0000:00:10.0: Command timeout
>> [65136.606690] xhci_hcd 0000:00:10.0: Abort command ring
>> [65150.739738] xhci_hcd 0000:00:10.0: Abort failed to stop command ring: -110
>> [65150.740115] xhci_hcd 0000:00:10.0: // Halt the HC
>> [65150.785382] xhci_hcd 0000:00:10.0: Host halt failed, -110
>> [65150.785419] xhci_hcd 0000:00:10.0: xHCI host controller not responding, assume dead
>> [65150.785874] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 1, ep index 0
>> [65150.785882] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 1, ep index 2
>> [65150.785911] xhci_hcd 0000:00:10.0: xHCI dying, ignoring interrupt. Shouldn't IRQs be disabled?
>> [65150.785921] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 2, ep index 0
>> [65150.785927] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 2, ep index 2
>> [65150.785937] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 3, ep index 0
>> [65150.785943] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 3, ep index 2
>> [65150.785971] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 3, ep index 3
>> [65150.785978] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 3, ep index 6
>> [65150.785987] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 4, ep index 0
>> [65150.785993] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 4, ep index 2
>> [65150.786003] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 4, ep index 4
>> [65150.786012] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 5, ep index 0
>> [65150.786018] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 5, ep index 2
>> [65150.786027] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 6, ep index 0
>> [65150.786033] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 6, ep index 2
>> [65150.786039] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 6, ep index 3
>> [65150.786046] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 7, ep index 0
>> [65150.786052] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 8, ep index 0
>> [65150.786059] xhci_hcd 0000:00:10.0: HC died; cleaning up
>> [65150.786597] xhci_hcd 0000:00:10.0: Timeout while waiting for setup device command
>>
>>
>> As per my understanding, the abort command times out because CRR is not
>> getting negated, and the driver then assumes the controller is dead. After
>> this, the host ends up in a completely weird state. So what can be the
>> recovery mechanism? The comment in the xhci_abort_cmd_ring function says:
>> "In the future we should distinguish between -ENODEV and -ETIMEDOUT
>> and try to recover a -ETIMEDOUT with a host controller reset."
>
> What kernel version is this issue seen on?
> I recall there being some race issue in this area some time ago.
>
>>
>> Would it be a good idea to reset the controller, or is there any other
>> suggestion for recovery? The current situation demands rebooting the system.
>
> Yes, I think it would be a good idea to try to reset the host in the -ETIMEDOUT case.
> So far the most common case, when first a command times out and then aborting
> the command ring times out, was that the host controller had actually been
> removed (PCI hotplug), so just tearing down the host has so far been enough.
> Now we just need to implement this :)

Thanks for your input. I will have a look at implementing the xHCI host
controller reset.

> -Mathias
>

Regards
Nehal Shah
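
For reference, the recovery policy discussed in this thread could be sketched
roughly as below. This is a standalone userspace C illustration under stated
assumptions, not code from the xhci driver: the enum and both function names
are hypothetical, and the only things taken from the thread are the two error
codes (-ETIMEDOUT, which is the -110 seen in the log on Linux, and -ENODEV) and
the idea of trying a host controller reset before giving up.

```c
#include <errno.h>

/* Hypothetical recovery actions; names are illustrative only. */
enum recovery_action {
	RECOVERY_NONE,       /* nothing to recover, keep running */
	RECOVERY_RESET_HOST, /* controller present but stuck: try a reset */
	RECOVERY_TEAR_DOWN   /* controller gone or unrecoverable: treat as dead */
};

/*
 * Decide what to do after an attempt to abort the command ring.
 * abort_ret is 0 on success or a negative errno value.
 */
static enum recovery_action choose_recovery(int abort_ret)
{
	if (abort_ret == 0)
		return RECOVERY_NONE;
	if (abort_ret == -ETIMEDOUT)
		return RECOVERY_RESET_HOST; /* hardware may still be there */
	/* -ENODEV (e.g. PCI hot-unplug) and anything unexpected */
	return RECOVERY_TEAR_DOWN;
}

/*
 * Decide what to do once a reset attempt has finished.
 * reset_ret is 0 on success or a negative errno value.
 */
static enum recovery_action after_reset(int reset_ret)
{
	return reset_ret == 0 ? RECOVERY_NONE : RECOVERY_TEAR_DOWN;
}
```

With this split, the existing "assume dead" teardown would remain the fallback
for -ENODEV and for a reset that itself fails, so the hot-unplug case mentioned
above would behave as it does today.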