On 2022/10/14 15:56, Mathias Nyman Wrote:
> On 14.10.2022 6.12, liulongfang wrote:
>> On 2022/9/26 15:58, Mathias Nyman wrote:
>>> On 24.9.2022 5.35, liulongfang wrote:
>>>> On 2022/9/22 21:01, Mathias Nyman Wrote:
>>>>> Hi
>>>>>
>>>>> On 15.9.2022 4.11, Longfang Liu wrote:
>>>>>> When HCE(Host Controller Error) is set, it means that the xhci hardware
>>>>>> controller has an error at this time, but the current xhci driver
>>>>>> software does not log this event.
>>>>>>
>>>>>> By adding an HCE event detection in the xhci interrupt processing
>>>>>> interface, a warning log is output to the system, which is convenient
>>>>>> for system device status tracking.
>>>>>>
>>>>>
>>>>> xHC should cease all activity when it sets HCE, and is probably not
>>>>> generating interrupts anymore.
>>>>>
>>>>> Would probably be more useful to check for HCE at timeouts than in the
>>>>> interrupt handler.
>>>>>
>>>>
>>>> Which function of the driver code is this timeout in?
>>>
>>> xhci_handle_command_timeout() will usually trigger at some point,
>>>
>>
>> Because this HCE error is reported in the form of an interrupt signal, it is more
>> concise to put it in xhci_irq() than in xhci_handle_command_timeout().
>>
>
> Patch was added to queue after you reported your xHC hardware triggers interrupts when HCE is set.
> I'll send it forward after 6.1-rc1
>

In our test version, a test log is added to xhci_irq(). In the test case
that triggers HCE, the HCE interrupt is reported and recorded through the log:

{53}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
{53}[Hardware Error]: event severity: recoverable
{53}[Hardware Error]:  Error 0, type: recoverable
{53}[Hardware Error]:   section type: unknown, c8b328a8-9917-4af6-9a13-2e08ab2e7586
{53}[Hardware Error]:   section length: 0x48
{53}[Hardware Error]:   00000000: 0000186b 00000201 001a0001 00000000  k...............
{53}[Hardware Error]:   00000010: 00000000 00000000 00000000 00000028  ............(...
{53}[Hardware Error]:   00000020: 00000000 00000000 00000000 00000000  ................
{53}[Hardware Error]:   00000030: 00000000 00000000 00000000 00000000  ................
{53}[Hardware Error]:   00000040: 00000001 00000000                    ........
xhci_hcd 0000:30:01.0: xHCI host not responding to stop endpoint command.
xhci_hcd 0000:30:01.0: USBSTS: PCD HCE
xhci_hcd 0000:30:01.0: xHCI host controller not responding, assume dead
xhci_hcd 0000:30:01.0: HC died; cleaning up
usb usb1-port1: couldn't allocate usb_device
rmmod xhci-pci
xhci_hcd 0000:30:01.0: remove, state 4
usb usb2: USB disconnect, device number 1
xhci_hcd 0000:30:01.0: USB bus 2 deregistered
xhci_hcd 0000:30:01.0: remove, state 1
usb usb1: USB disconnect, device number 1
xhci_hcd 0000:30:01.0: USB bus 1 deregistered

Thanks,
Longfang.

> xHCI specification still indicate HCE might not trigger interrupts:
>
> Section 4.24.1 -Internal Errors
> ...
> "Software should implement an algorithm for checking the HCE flag if the xHC is
> not responding."
>
> Thanks
> -Mathias
> .
>
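
For illustration only, the HCE check discussed in this thread can be sketched
as a small user-space C fragment. This is not the queued patch. The bit
positions follow the USBSTS register layout in the xHCI specification, but the
helper names usbsts_has_hce() and warn_on_hce() are hypothetical, and the
"site" string is only an assumed way to tell the interrupt path from the
command-timeout path.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* USBSTS bit positions from the xHCI specification. */
#define STS_HALT  (1u << 0)   /* HCH: controller halted */
#define STS_FATAL (1u << 2)   /* HSE: host system error */
#define STS_EINT  (1u << 3)   /* EINT: event interrupt pending */
#define STS_PORT  (1u << 4)   /* PCD: port change detect */
#define STS_HCE   (1u << 12)  /* HCE: host controller error */

/* Hypothetical helper: report whether a USBSTS snapshot has HCE set. */
static bool usbsts_has_hce(uint32_t usbsts)
{
	return (usbsts & STS_HCE) != 0;
}

/*
 * Hypothetical helper: log a warning when HCE is set. "site" names the
 * caller (for example the interrupt path or the command-timeout path),
 * so the same check can be shared by both.
 */
static bool warn_on_hce(uint32_t usbsts, const char *site)
{
	if (!usbsts_has_hce(usbsts))
		return false;

	fprintf(stderr, "%s: host controller error (USBSTS 0x%08x)\n",
		site, (unsigned int)usbsts);
	return true;
}
```

A status with only PCD and HCE set (0x1010), which would decode to the
"USBSTS: PCD HCE" line in the log, makes warn_on_hce() return true. Calling
the same helper from a timeout path would also cover controllers that stop
raising interrupts once HCE is set, as the quoted Section 4.24.1 text suggests.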