On Wed, 5 Feb 2020 at 09:35, Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> wrote:
>
> On 5.2.2020 2.55, Joel Stanley wrote:
> > I'm supporting a system that uses Linux-as-a-bootloader to load a
> > distro kernel via kexec, The systems have a TI TUSB73x0 PCIe
> > controller which goes out to lunch after a kexec. This is the distro
> > (post-kexec) kernel:
> >
> > [ 0.235411] pci 0003:01:00.0: xHCI HW did not halt within 16000
> > usec status = 0x0
> > [ 1.037298] xhci_hcd 0003:01:00.0: xHCI Host Controller
> > [ 1.037367] xhci_hcd 0003:01:00.0: new USB bus registered, assigned
> > bus number 1
> > [ 1.053481] xhci_hcd 0003:01:00.0: Host halt failed, -110
> > [ 1.053523] xhci_hcd 0003:01:00.0: can't setup: -110
> > [ 1.053565] xhci_hcd 0003:01:00.0: USB bus 1 deregistered
> > [ 1.053629] xhci_hcd 0003:01:00.0: init 0003:01:00.0 fail, -110
> > [ 1.053703] xhci_hcd: probe of 0003:01:00.0 failed with error -110
> >
> > There were some fixes made a few years back to improve the situation,
> > but we've still had to carry some form of the patch below in the
> > bootloader kernel. I would like to rework it so it can be merged.
> >
> > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> > index dbac0fa9748d..eaa94456dd9d 100644
> > --- a/drivers/usb/host/xhci.c
> > +++ b/drivers/usb/host/xhci.c
> > @@ -789,6 +789,9 @@ void xhci_shutdown(struct usb_hcd *hcd)
> >         xhci_dbg_trace(xhci, trace_xhci_dbg_init,
> >                         "xhci_shutdown completed - status = %x",
> >                         readl(&xhci->op_regs->status));
> > +
> > +       /* TI XHCI controllers do not come back after kexec without this hack */
> > +       pci_reset_function_locked(to_pci_dev(hcd->self.sysdev));
> >  }
> >  EXPORT_SYMBOL_GPL(xhci_shutdown);
> >
> > I would like some advice on how to implement it in a way that is
> > acceptable. Would a quirk on the pci id in xhci_shutdown be ok?
>
> Yes, but as this is a pci specific workaround the quirk should go to
> xhci-pci.c: xhci_pci_shutdown(), which was added in v5.5
>
> Is the rootcause known?
> Is the only possible solution to reset the pci function?.

I don't know the root cause. The people that helped debug it in the
first place have moved on.

> Have you tried, or seen this issue on any other controller than this TUSB73x0?

We don't have any systems with a different USB controller. In general,
the other PCIe devices in the system are well (enough) behaved to
survive kexec. We don't have any other out of tree workarounds.

>
> >
> > 0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB
> > 3.0 xHCI Host Controller (rev 02)
> >
> > The full debug log of the distro kernel booting is below.
> >
> > [ 1.037833] xhci_hcd 0003:01:00.0: USBCMD 0x0:
> > [ 1.037835] xhci_hcd 0003:01:00.0: HC is being stopped
> > [ 1.037837] xhci_hcd 0003:01:00.0: HC has finished hard reset
> > [ 1.037839] xhci_hcd 0003:01:00.0: Event Interrupts disabled
> > [ 1.037841] xhci_hcd 0003:01:00.0: Host System Error Interrupts disabled
> > [ 1.037843] xhci_hcd 0003:01:00.0: HC has finished light reset
> > [ 1.037846] xhci_hcd 0003:01:00.0: USBSTS 0x0:
> > [ 1.037847] xhci_hcd 0003:01:00.0: Event ring is empty
> > [ 1.037849] xhci_hcd 0003:01:00.0: No Host System Error
> > [ 1.037851] xhci_hcd 0003:01:00.0: HC is running
>
> Hmm, all bits in both USBCMD and USBSTS are 0. This is a bit suspicious.
> Normally at least USBCMD Run/Stop bit, and USBSTS HCHalted bit have
> opposite values.

Does this suggest the controller is not responding at all?
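
To make the register check being discussed concrete, here is a rough
standalone sketch. It is not kernel code and not part of any patch. It
assumes the xHCI operational register layout where USBCMD bit 0 is
Run/Stop and USBSTS bit 0 is HCHalted, and that in steady state the two
bits are opposite. The helper name xhci_regs_plausible() is made up for
illustration, and the all-ones check reflects the common behaviour of
reads from an unreachable PCI device rather than anything seen in this
log.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions in the xHCI operational registers. */
#define CMD_RUN   (1u << 0)   /* USBCMD: Run/Stop */
#define STS_HALT  (1u << 0)   /* USBSTS: HCHalted */

/*
 * Hypothetical helper (illustration only): returns 1 if the pair of
 * register values looks like a controller in a steady state, else 0.
 */
int xhci_regs_plausible(uint32_t usbcmd, uint32_t usbsts)
{
	/* All ones usually means the read never reached the device. */
	if (usbcmd == 0xffffffffu || usbsts == 0xffffffffu)
		return 0;

	int run    = (usbcmd & CMD_RUN)  != 0;
	int halted = (usbsts & STS_HALT) != 0;

	/* Running implies not halted, and stopped implies halted. */
	return run != halted;
}

/* The values from the debug log above: USBCMD 0x0, USBSTS 0x0. */
void check_log_values(void)
{
	/* Run/Stop = 0 but HCHalted = 0: the inconsistent combination. */
	assert(!xhci_regs_plausible(0x0, 0x0));
}
```

Note that Run/Stop = 0 with HCHalted = 0 is legitimate for a short time
while the controller is in the middle of halting, so a check like this
only flags the state as suspicious if it persists, as it does here past
the 16000 usec halt timeout.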