On Thu, 6 Feb 2020 at 14:23, Mathias Nyman
<mathias.nyman@xxxxxxxxxxxxxxx> wrote:
>
> On 6.2.2020 5.37, Joel Stanley wrote:
> > On Wed, 5 Feb 2020 at 09:35, Mathias Nyman
> > <mathias.nyman@xxxxxxxxxxxxxxx> wrote:
> >>
> >> On 5.2.2020 2.55, Joel Stanley wrote:
> >>> I'm supporting a system that uses Linux-as-a-bootloader to load a
> >>> distro kernel via kexec, The systems have a TI TUSB73x0 PCIe
> >>> controller which goes out to lunch after a kexec. This is the distro
> >>> (post-kexec) kernel:
> >>>
> >>> [ 0.235411] pci 0003:01:00.0: xHCI HW did not halt within 16000
> >>> usec status = 0x0
> >>> [ 1.037298] xhci_hcd 0003:01:00.0: xHCI Host Controller
> >>> [ 1.037367] xhci_hcd 0003:01:00.0: new USB bus registered, assigned
> >>> bus number 1
> >>> [ 1.053481] xhci_hcd 0003:01:00.0: Host halt failed, -110
> >>> [ 1.053523] xhci_hcd 0003:01:00.0: can't setup: -110
> >>> [ 1.053565] xhci_hcd 0003:01:00.0: USB bus 1 deregistered
> >>> [ 1.053629] xhci_hcd 0003:01:00.0: init 0003:01:00.0 fail, -110
> >>> [ 1.053703] xhci_hcd: probe of 0003:01:00.0 failed with error -110
> >>>
> >>> 0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB
> >>> 3.0 xHCI Host Controller (rev 02)
> >>>
> >>> The full debug log of the distro kernel booting is below.
> >>>
> >>> [ 1.037833] xhci_hcd 0003:01:00.0: USBCMD 0x0:
> >>> [ 1.037835] xhci_hcd 0003:01:00.0: HC is being stopped
> >>> [ 1.037837] xhci_hcd 0003:01:00.0: HC has finished hard reset
> >>> [ 1.037839] xhci_hcd 0003:01:00.0: Event Interrupts disabled
> >>> [ 1.037841] xhci_hcd 0003:01:00.0: Host System Error Interrupts disabled
> >>> [ 1.037843] xhci_hcd 0003:01:00.0: HC has finished light reset
> >>> [ 1.037846] xhci_hcd 0003:01:00.0: USBSTS 0x0:
> >>> [ 1.037847] xhci_hcd 0003:01:00.0: Event ring is empty
> >>> [ 1.037849] xhci_hcd 0003:01:00.0: No Host System Error
> >>> [ 1.037851] xhci_hcd 0003:01:00.0: HC is running
> >>
> >> Hmm, all bits in both USBCMD and USBSTS are 0. This is a bit suspicious.
> >> Normally at least USBCMD Run/Stop bit, and USBSTS HCHalted bit have
> >> opposite values.
> >
> > Does this suggest the controller is not responding at all?
> >
>
> The Capability registers looks fine, so does port status registers.
> It's just the operational USBSTS and USBCMD registers that return 0.
>
> Current xhci implementation assumes host failed to halt because USBSTS
> HCHalted bit is still 0, and bails out before reset.
> Host is probably not running, register just returns all zero.
>
> Can you try if the below code works, it checks if host is running from
> an additional place, and continues with the host reset.

Here's the patch applied to 5.6-rc1, and then kexec'd twice (once so
we're running a kernel without any workarounds on shutdown, and the
second time to test the recovery code).

It appears to have made it a bit further:

[ 1.532920] pci 0003:01:00.0: enabling device (0140 -> 0142)
[ 1.549081] pci 0003:01:00.0: xHCI HW did not halt within 16000 usec status = 0x10
[ 1.549119] pci 0003:01:00.0: quirk_usb_early_handoff+0x0/0x7c4 took 15820 usecs
[ 5.494595] xhci_hcd 0003:01:00.0: xHCI Host Controller
[ 5.494670] xhci_hcd 0003:01:00.0: new USB bus registered, assigned bus number 1
[ 5.510774] xhci_hcd 0003:01:00.0: Host halt failed, -110
[ 5.510791] xhci_hcd 0003:01:00.0: Continue with reset even if host appears running
[ 5.511271] xhci_hcd 0003:01:00.0: hcc params 0x0270f06d hci version 0x96 quirks 0x0000000004000000
[ 5.522063] xhci_hcd 0003:01:00.0: xHCI Host Controller
[ 5.522115] xhci_hcd 0003:01:00.0: new USB bus registered, assigned bus number 2
[ 5.522186] xhci_hcd 0003:01:00.0: Host supports USB 3.0 SuperSpeed
[ 19.003160] xhci_hcd 0003:01:00.0: Abort failed to stop command ring: -110
[ 19.019167] xhci_hcd 0003:01:00.0: Host halt failed, -110
[ 19.019168] xhci_hcd 0003:01:00.0: xHCI host controller not responding, assume dead
[ 19.019172] xhci_hcd 0003:01:00.0: HC died; cleaning up
[ 19.019299] xhci_hcd 0003:01:00.0: Error while assigning device slot ID
[ 19.019302] xhci_hcd 0003:01:00.0: Max number of devices this xHCI host supports is 64.

>
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index fe38275363e0..2dbfeaf88574 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -177,8 +177,16 @@ int xhci_reset(struct xhci_hcd *xhci)
>         }
>
>         if ((state & STS_HALT) == 0) {
> -               xhci_warn(xhci, "Host controller not halted, aborting reset.\n");
> -               return 0;
> +               /*
> +                * After a kexec TI TUSB73x0 might appear running as its USBSTS
> +                * and USBCMD registers return all zeroes. Doublecheck if host
> +                * is running from USBCMD RUN bit before bailing out.
> +                */
> +               command = readl(&xhci->op_regs->command);
> +               if (command & CMD_RUN) {
> +                       xhci_warn(xhci, "Host controller not halted, aborting reset.\n");
> +                       return 0;
> +               }
>         }
>
>         xhci_dbg_trace(xhci, trace_xhci_dbg_init, "// Reset the HC");
> @@ -5217,7 +5225,7 @@ int xhci_gen_setup(struct usb_hcd *hcd, xhci_get_quirks_t get_quirks)
>         /* Make sure the HC is halted. */
>         retval = xhci_halt(xhci);
>         if (retval)
> -               return retval;
> +               xhci_warn(xhci, "Continue with reset even if host appears running\n");
>
>         xhci_zero_64b_regs(xhci);
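
For anyone following along, the decision the xhci_reset() hunk makes can
be written as a small standalone predicate. This is only a sketch of my
reading of the patch, not kernel code: reset_may_proceed() is a
hypothetical name, and the two bit definitions are restated locally
(bit 0 of USBCMD is Run/Stop, bit 0 of USBSTS is HCHalted) rather than
taken from xhci.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Restated locally for this sketch: USBCMD bit 0 is Run/Stop,
 * USBSTS bit 0 is HCHalted. */
#define CMD_RUN  (1u << 0)
#define STS_HALT (1u << 0)

/*
 * Hypothetical helper (not a kernel function): decide whether a host
 * reset may go ahead, given raw USBSTS and USBCMD register values.
 */
static bool reset_may_proceed(uint32_t usbsts, uint32_t usbcmd)
{
    /* HCHalted set: the host reports it is halted, reset is fine. */
    if (usbsts & STS_HALT)
        return true;

    /*
     * HCHalted clear: only treat the host as really running if the
     * Run/Stop bit agrees. Otherwise assume the status register is
     * unreliable (e.g. reads back all zeroes) and reset anyway.
     */
    if (usbcmd & CMD_RUN)
        return false;

    return true;
}
```

With the all-zero values from the first log (USBSTS 0x0, USBCMD 0x0)
this returns true, so the reset would be attempted instead of aborted.
With HCHalted clear and Run/Stop set it still returns false, which
matches the "not halted, aborting reset" path the patch keeps.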