On 5.2.2020 2.55, Joel Stanley wrote:
I'm supporting a system that uses Linux-as-a-bootloader to load a distro kernel via kexec. The systems have a TI TUSB73x0 PCIe controller which goes out to lunch after a kexec.

This is the distro (post-kexec) kernel:

[ 0.235411] pci 0003:01:00.0: xHCI HW did not halt within 16000 usec status = 0x0
[ 1.037298] xhci_hcd 0003:01:00.0: xHCI Host Controller
[ 1.037367] xhci_hcd 0003:01:00.0: new USB bus registered, assigned bus number 1
[ 1.053481] xhci_hcd 0003:01:00.0: Host halt failed, -110
[ 1.053523] xhci_hcd 0003:01:00.0: can't setup: -110
[ 1.053565] xhci_hcd 0003:01:00.0: USB bus 1 deregistered
[ 1.053629] xhci_hcd 0003:01:00.0: init 0003:01:00.0 fail, -110
[ 1.053703] xhci_hcd: probe of 0003:01:00.0 failed with error -110

There were some fixes made a few years back to improve the situation, but we've still had to carry some form of the patch below in the bootloader kernel. I would like to rework it so it can be merged.

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index dbac0fa9748d..eaa94456dd9d 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -789,6 +789,9 @@ void xhci_shutdown(struct usb_hcd *hcd)
 	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
 			"xhci_shutdown completed - status = %x",
 			readl(&xhci->op_regs->status));
+
+	/* TI XHCI controllers do not come back after kexec without this hack */
+	pci_reset_function_locked(to_pci_dev(hcd->self.sysdev));
 }
 EXPORT_SYMBOL_GPL(xhci_shutdown);

I would like some advice on how to implement it in a way that is acceptable. Would a quirk on the pci id in xhci_shutdown be ok?
Yes, but as this is a PCI-specific workaround, the quirk should go in xhci-pci.c: xhci_pci_shutdown(), which was added in v5.5.

Is the root cause known? Is resetting the PCI function the only possible solution? Have you tried, or seen this issue on, any other controller than this TUSB73x0?
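To make the idea of a quirk keyed on the PCI ID concrete, here is a minimal sketch of the decision logic, written as standalone C rather than as kernel code. Everything in it is an assumption for illustration: QUIRK_RESET_ON_SHUTDOWN, quirks_for() and needs_reset_on_shutdown() are hypothetical names, not existing xhci symbols, and the 104c:8241 vendor/device pair is what lspci -nn typically reports for a TUSB73x0 and should be confirmed on the affected machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical quirk flag; not an existing XHCI_* quirk bit. */
#define QUIRK_RESET_ON_SHUTDOWN	(1u << 0)

/* Assumed IDs for the TUSB73x0; verify with `lspci -nn`. */
#define VENDOR_ID_TI		0x104c
#define DEVICE_ID_TUSB73X0	0x8241

struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* Return the quirk flags that apply to a given controller. */
unsigned int quirks_for(struct pci_id id)
{
	unsigned int quirks = 0;

	if (id.vendor == VENDOR_ID_TI && id.device == DEVICE_ID_TUSB73X0)
		quirks |= QUIRK_RESET_ON_SHUTDOWN;

	return quirks;
}

/* Decide at shutdown time whether the function-level reset is wanted. */
bool needs_reset_on_shutdown(struct pci_id id)
{
	return (quirks_for(id) & QUIRK_RESET_ON_SHUTDOWN) != 0;
}
```

In the driver, the equivalent check would gate the pci_reset_function_locked() call from the patch above, so that controllers without the quirk keep their current shutdown behaviour.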
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)

The full debug log of the distro kernel booting is below.

[ 1.037833] xhci_hcd 0003:01:00.0: USBCMD 0x0:
[ 1.037835] xhci_hcd 0003:01:00.0: HC is being stopped
[ 1.037837] xhci_hcd 0003:01:00.0: HC has finished hard reset
[ 1.037839] xhci_hcd 0003:01:00.0: Event Interrupts disabled
[ 1.037841] xhci_hcd 0003:01:00.0: Host System Error Interrupts disabled
[ 1.037843] xhci_hcd 0003:01:00.0: HC has finished light reset
[ 1.037846] xhci_hcd 0003:01:00.0: USBSTS 0x0:
[ 1.037847] xhci_hcd 0003:01:00.0: Event ring is empty
[ 1.037849] xhci_hcd 0003:01:00.0: No Host System Error
[ 1.037851] xhci_hcd 0003:01:00.0: HC is running
Hmm, all bits in both USBCMD and USBSTS are 0. This is a bit suspicious. Normally at least the USBCMD Run/Stop bit and the USBSTS HCHalted bit have opposite values.

-Mathias
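The Run/Stop versus HCHalted observation can be expressed as a small check. This is only an illustrative sketch in standalone C: the bit positions (bit 0 of USBCMD for Run/Stop, bit 0 of USBSTS for HCHalted) follow the xHCI register layout, while the macro names and run_state_consistent() are made up for this example and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define USBCMD_RUN_STOP	(1u << 0)	/* USBCMD bit 0: Run/Stop */
#define USBSTS_HCHALTED	(1u << 0)	/* USBSTS bit 0: HCHalted */

/*
 * In a settled state exactly one of the two bits is set:
 * running (R/S = 1, HCH = 0) or halted (R/S = 0, HCH = 1).
 */
bool run_state_consistent(uint32_t usbcmd, uint32_t usbsts)
{
	bool run = (usbcmd & USBCMD_RUN_STOP) != 0;
	bool halted = (usbsts & USBSTS_HCHALTED) != 0;

	return run != halted;
}
```

With the values from the log, run_state_consistent(0x0, 0x0) is false: the controller has been told to stop but never reports itself halted, which matches the "did not halt within 16000 usec" message.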