Re: [PATCH] xhci: print warning when HCE was set

On 2022/10/14 15:56, Mathias Nyman Wrote:
> On 14.10.2022 6.12, liulongfang wrote:
>> On 2022/9/26 15:58, Mathias Nyman wrote:
>>> On 24.9.2022 5.35, liulongfang wrote:
>>>> On 2022/9/22 21:01, Mathias Nyman Wrote:
>>>>> Hi
>>>>>
>>>>> On 15.9.2022 4.11, Longfang Liu wrote:
>>>>>> When HCE(Host Controller Error) is set, it means that the xhci hardware
>>>>>> controller has an error at this time, but the current xhci driver
>>>>>> software does not log this event.
>>>>>>
>>>>>> By adding an HCE event detection in the xhci interrupt processing
>>>>>> interface, a warning log is output to the system, which is convenient
>>>>>> for system device status tracking.
>>>>>>
>>>>>
>>>>> xHC should cease all activity when it sets HCE, and is probably not
>>>>> generating interrupts anymore.
>>>>>
>>>>> Would probably be more useful to check for HCE at timeouts than in the
>>>>> interrupt handler.
>>>>>
>>>>
>>>> Which function of the driver code is this timeout in?
>>>
>>> xhci_handle_command_timeout() will usually trigger at some point,
>>>
>>
>> Because this HCE error is reported in the form of an interrupt signal, it is more
>> concise to put it in xhci_irq() than in xhci_handle_command_timeout().
>>
> 
> Patch was added to queue after you reported your xHC hardware triggers interrupts when HCE is set.
> I'll send it forward after 6.1-rc1
> 

In our test build, we added a log message to xhci_irq(). When the test case triggers HCE,
the controller raises an interrupt and the event is recorded in the log:

{53}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
{53}[Hardware Error]: event severity: recoverable
{53}[Hardware Error]:  Error 0, type: recoverable
{53}[Hardware Error]:   section type: unknown, c8b328a8-9917-4af6-9a13-2e08ab2e7586
{53}[Hardware Error]:   section length: 0x48
{53}[Hardware Error]:   00000000: 0000186b 00000201 001a0001 00000000  k...............
{53}[Hardware Error]:   00000010: 00000000 00000000 00000000 00000028  ............(...
{53}[Hardware Error]:   00000020: 00000000 00000000 00000000 00000000  ................
{53}[Hardware Error]:   00000030: 00000000 00000000 00000000 00000000  ................
{53}[Hardware Error]:   00000040: 00000001 00000000                    ........
 xhci_hcd 0000:30:01.0: xHCI host not responding to stop endpoint command.
 xhci_hcd 0000:30:01.0: USBSTS: PCD HCE
 xhci_hcd 0000:30:01.0: xHCI host controller not responding, assume dead
 xhci_hcd 0000:30:01.0: HC died; cleaning up
 usb usb1-port1: couldn't allocate usb_device
rmmod xhci-pci
 xhci_hcd 0000:30:01.0: remove, state 4
 usb usb2: USB disconnect, device number 1
 xhci_hcd 0000:30:01.0: USB bus 2 deregistered
 xhci_hcd 0000:30:01.0: remove, state 1
 usb usb1: USB disconnect, device number 1
 xhci_hcd 0000:30:01.0: USB bus 1 deregistered
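
For illustration only, here is a minimal user-space sketch of the kind of check
discussed above: test the HCE bit in a USBSTS value and print the set flags in
the same style as the "USBSTS: PCD HCE" line in the log. The bit positions
follow the xHCI specification and the STS_* names mirror the driver's macros;
the helpers usbsts_hce_set() and usbsts_decode() are hypothetical names for
this sketch, and this is not the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* USBSTS bits per the xHCI specification (only a subset is listed). */
#define STS_HALT  (1u << 0)   /* HCHalted */
#define STS_FATAL (1u << 2)   /* HSE: Host System Error */
#define STS_EINT  (1u << 3)   /* EINT: Event Interrupt */
#define STS_PORT  (1u << 4)   /* PCD: Port Change Detect */
#define STS_HCE   (1u << 12)  /* HCE: Host Controller Error */

/* Hypothetical helper: true when the controller reports HCE. */
static bool usbsts_hce_set(uint32_t status)
{
	return (status & STS_HCE) != 0;
}

/*
 * Hypothetical helper: write the names of the set flags into buf,
 * separated by spaces. Returns the number of characters written.
 */
static size_t usbsts_decode(uint32_t status, char *buf, size_t len)
{
	static const struct {
		uint32_t bit;
		const char *name;
	} flags[] = {
		{ STS_HALT,  "HCHalted" },
		{ STS_FATAL, "HSE" },
		{ STS_EINT,  "EINT" },
		{ STS_PORT,  "PCD" },
		{ STS_HCE,   "HCE" },
	};
	size_t used = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		int n;

		if (!(status & flags[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}
	return used;
}
```

An interrupt-path check in this style would call usbsts_hce_set() on the value
read from USBSTS and emit a warning when it returns true.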

Thanks,
Longfang.

> The xHCI specification still indicates that HCE might not trigger interrupts:
>
> Section 4.24.1 - Internal Errors
> ...
> "Software should implement an algorithm for checking the HCE flag if the xHC is
> not responding."
> 
> Thanks
> -Mathias
> .
> 
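
A rough sketch of what such a check could look like on a timeout path, assuming
the caller has already read USBSTS. The enum and classify_timeout() are
hypothetical names for illustration, not existing driver code. The all-ones
test is an added assumption: a register read that returns 0xffffffff usually
means the device is no longer reachable, so it is handled before looking at HCE.

```c
#include <stdint.h>

#define STS_HCE (1u << 12)	/* USBSTS Host Controller Error bit */

/* Hypothetical classification of why a command timed out. */
enum timeout_cause {
	TIMEOUT_PLAIN,		/* no error flag seen, ordinary timeout */
	TIMEOUT_HCE,		/* controller reported an internal error */
	TIMEOUT_DEVICE_GONE,	/* register read returned all ones */
};

/* Hypothetical helper: classify a timeout from the USBSTS value. */
static enum timeout_cause classify_timeout(uint32_t usbsts)
{
	if (usbsts == UINT32_MAX)
		return TIMEOUT_DEVICE_GONE;
	if (usbsts & STS_HCE)
		return TIMEOUT_HCE;
	return TIMEOUT_PLAIN;
}
```

This covers the case where HCE is set but no interrupt arrives, since the check
runs whenever the xHC stops responding rather than only in xhci_irq().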


