Re: XHCI abort CMD failure

"Shah, Nehal-bakulchandra" <Nehal-bakulchandra.Shah@xxxxxxx> · Thu, 28 Feb 2019 09:54:33 +0000

Hi Mathias,

On 2/27/2019 1:01 PM, Mathias Nyman wrote:
> Hi
> 
> On 26.2.2019 19.55, Shah, Nehal-bakulchandra wrote:
>> Hi
>>
>> On one of our customer platforms, we are seeing the following errors:
>>
>> [65136.606651] xhci_hcd 0000:00:10.0: Command timeout
>> [65136.606690] xhci_hcd 0000:00:10.0: Abort command ring
>> [65150.739738] xhci_hcd 0000:00:10.0: Abort failed to stop command ring: -110
>> [65150.740115] xhci_hcd 0000:00:10.0: // Halt the HC
>> [65150.785382] xhci_hcd 0000:00:10.0: Host halt failed, -110
>> [65150.785419] xhci_hcd 0000:00:10.0: xHCI host controller not responding, assume dead
>> [65150.785874] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 1, ep index 0
>> [65150.785882] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 1, ep index 2
>> [65150.785911] xhci_hcd 0000:00:10.0: xHCI dying, ignoring interrupt. Shouldn't IRQs be disabled?
>> [65150.785921] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 2, ep index 0
>> [65150.785927] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 2, ep index 2
>> [65150.785937] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 3, ep index 0
>> [65150.785943] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 3, ep index 2
>> [65150.785971] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 3, ep index 3
>> [65150.785978] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 3, ep index 6
>> [65150.785987] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 4, ep index 0
>> [65150.785993] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 4, ep index 2
>> [65150.786003] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 4, ep index 4
>> [65150.786012] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 5, ep index 0
>> [65150.786018] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 5, ep index 2
>> [65150.786027] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 6, ep index 0
>> [65150.786033] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 6, ep index 2
>> [65150.786039] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 6, ep index 3
>> [65150.786046] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 7, ep index 0
>> [65150.786052] xhci_hcd 0000:00:10.0: Killing URBs for slot ID 8, ep index 0
>> [65150.786059] xhci_hcd 0000:00:10.0: HC died; cleaning up
>> [65150.786597] xhci_hcd 0000:00:10.0: Timeout while waiting for setup device command
>>
>>
>> As I understand it, the abort command times out because CRR is never cleared, so the driver assumes the controller is dead.
>> After this, the host is left in an unusable state. What recovery mechanism is possible? The comment in the xhci_abort_cmd_ring
>> function says: "In the future we should distinguish between -ENODEV and -ETIMEDOUT and try to recover a -ETIMEDOUT with a host controller reset."
> 
> What kernel version is this issue seen on?
> I recall there being some race issue in this area some time ago.
> 
>>
>> Would it be a good idea to reset the controller, or is there another suggestion for recovery? The current situation requires rebooting the system.
> 
> Yes, I think it would be a good idea to try to reset the host in the -ETIMEDOUT case.
> So far, when a command timed out and the command ring abort then timed out as well,
> the most common cause was that the host controller had actually been removed (PCI hotplug),
> so just tearing down the host has been enough.
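
To make the two cases concrete, here is a minimal userspace sketch of how a register poll could tell "host removed" apart from "host present but stuck". It is not the driver's code: poll_until(), reg_read_fn and the three fake readers are hypothetical names used only for illustration, and the all-ones check assumes that reads from a removed PCI device return 0xffffffff. CMD_RING_RUNNING is meant to mirror the CRR bit (bit 3) of the command ring control register.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_RING_RUNNING (1u << 3) /* CRR bit in the command ring control register */

/* Hypothetical register accessor, so the sketch runs without hardware. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Poll a register until (value & mask) == done.
 * Returns 0 on success, -ENODEV if the register reads all ones
 * (assumed to mean the device is gone), -ETIMEDOUT if the condition
 * is not met within max_polls reads.
 */
static int poll_until(reg_read_fn read_reg, void *ctx,
                      uint32_t mask, uint32_t done, unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        uint32_t val = read_reg(ctx);

        if (val == UINT32_MAX)
            return -ENODEV;
        if ((val & mask) == done)
            return 0;
    }
    return -ETIMEDOUT;
}

/* Fake reader: device removed, every read returns all ones. */
static uint32_t read_removed(void *ctx)
{
    (void)ctx;
    return UINT32_MAX;
}

/* Fake reader: host present, but CRR never clears. */
static uint32_t read_stuck(void *ctx)
{
    (void)ctx;
    return CMD_RING_RUNNING;
}

/* Fake reader: CRR stays set for *ctx more reads, then clears. */
static uint32_t read_stops(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return CMD_RING_RUNNING;
    }
    return 0;
}
```

With these fake readers, read_removed yields -ENODEV, read_stuck yields -ETIMEDOUT, and read_stops yields 0 once its counter runs out. The log above corresponds to the read_stuck case.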

> Now we just need to implement this :)
Thanks for your input. I will look into implementing an xHCI host controller reset for this case.
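
As a starting point, the policy could look roughly like the sketch below. This is only an illustration of the decision logic, assuming the abort path reports 0, -ETIMEDOUT or -ENODEV. The names handle_abort_result(), struct host_ops, struct fake_host and the fake_* hooks are hypothetical and do not exist in the driver, and the sketch ignores locking and URB cleanup entirely.

```c
#include <errno.h>
#include <stdbool.h>

enum abort_recovery {
    RECOVERY_NONE,       /* abort succeeded, nothing to do */
    RECOVERY_RESET_HOST, /* host was stuck, reset brought it back */
    RECOVERY_TEARDOWN,   /* host gone or reset failed, treat as dead */
};

/* Hypothetical hooks standing in for the real reset and teardown paths. */
struct host_ops {
    int (*reset)(void *host);     /* returns 0 on success */
    void (*teardown)(void *host);
};

/*
 * Decide what to do after a command ring abort attempt.
 * -ETIMEDOUT: host is assumed present but unresponsive, so try a reset first.
 * -ENODEV or any other error: host is assumed gone, tear it down.
 * A failed reset also falls back to teardown.
 */
static enum abort_recovery handle_abort_result(int abort_ret,
                                               const struct host_ops *ops,
                                               void *host)
{
    if (abort_ret == 0)
        return RECOVERY_NONE;

    if (abort_ret == -ETIMEDOUT && ops->reset(host) == 0)
        return RECOVERY_RESET_HOST;

    ops->teardown(host);
    return RECOVERY_TEARDOWN;
}

/* Fake host used to exercise the policy without hardware. */
struct fake_host {
    bool reset_works;
    int resets;
    int teardowns;
};

static int fake_reset(void *host)
{
    struct fake_host *h = host;

    h->resets++;
    return h->reset_works ? 0 : -ETIMEDOUT;
}

static void fake_teardown(void *host)
{
    struct fake_host *h = host;

    h->teardowns++;
}

static const struct host_ops fake_ops = {
    .reset = fake_reset,
    .teardown = fake_teardown,
};
```

The point of the fallback is that the current behaviour (tear down the host) is kept for the hotplug case and for a reset that also fails, so only the stuck-but-present case changes.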

> -Mathias
> 

Regards
Nehal Shah