On 25 March 2018 at 11:37, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
> On Fri, 02 Mar 2018 17:38:26 +0000,
> Bockholdt Arne wrote:
>
> Hi Arne,
>
>>
>> On Thu, 2018-03-01 at 17:37 +0000, Marc Zyngier wrote:
>> > On 01/03/18 08:01, Bockholdt Arne wrote:
>> > >
>> > > On Thu, 2018-02-15 at 19:29 +0000, Marc Zyngier wrote:
>> > > > [+ Ard, who helped me chasing the initial issue]
>> > > >
>> > > > On 15/02/18 06:43, Bockholdt Arne wrote:
>> > > > > Hi all,
>> > > > >
>> > > > > on our Intel Atom C2578 server with a SuperMicro A1SAi board and a
>> > > > > Renesas uPD720201 USB 3.0 host controller the controller has stopped
>> > > > > working since kernel 4.13.x. Before that kernel the dmesg messages
>> > > > > from XHCI were:
>> > > > >
>> > > > > dmesg-4.12.1-serverv4.log:xhci_hcd 0000:03:00.0: xHCI Host Controller
>> > > > > dmesg-4.12.1-serverv4.log:xhci_hcd 0000:03:00.0: new USB bus registered, assigned bus number 2
>> > > > > dmesg-4.12.1-serverv4.log:xhci_hcd 0000:03:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010
>> > > > > dmesg-4.12.1-serverv4.log:usb usb2: Manufacturer: Linux 4.12.1-serverv4 xhci-hcd
>> > > > > dmesg-4.12.1-serverv4.log:xhci_hcd 0000:03:00.0: xHCI Host Controller
>> > > > > dmesg-4.12.1-serverv4.log:xhci_hcd 0000:03:00.0: new USB bus registered, assigned bus number 3
>> > > > > dmesg-4.12.1-serverv4.log:usb usb3: Manufacturer: Linux 4.12.1-serverv4 xhci-hcd
>> > > > >
>> > > > > After that the messages look like this:
>> > > > >
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd 0000:03:00.0: Resetting
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd 0000:03:00.0: Refused to change power state, currently in D3
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd 0000:03:00.0: xHCI Host Controller
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd 0000:03:00.0: new USB bus registered, assigned bus number 2
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd 0000:03:00.0: Host halt failed, -19
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd 0000:03:00.0: can't setup: -19
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd 0000:03:00.0: USB bus 2 deregistered
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd 0000:03:00.0: init 0000:03:00.0 fail, -19
>> > > > >
>> > > > > I've tested it with 4.15.3 too, it's still the same. I've narrowed
>> > > > > it down to the following patch:
>> > > > >
>> > > > > commit 8466489ef5ba48272ba4fa4ea9f8f403306de4c7
>> > > > > Author: Marc Zyngier <marc.zyngier@xxxxxxx>
>> > > > > Date: Tue Aug 1 20:11:08 2017 -0500
>> > > > >
>> > > > > xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue
>> > > > >
>> > > > > Reverting the patch on top of 4.15.3 restores the USB3 functionality
>> > > > > on our server. Please let me know if there is anything I can do to
>> > > > > fix the problem. Thank you.
>> > > >
>> > > > Hi Arne,
>> > > >
>> > > > This looks pretty bad. I suspect that once reset, the firmware is
>> > > > lost. I'll try to write a patch dumping some information about it.
>> > > >
>> > > > Ard: Do you know if the Cello board has a SPI flash connected to the
>> > > > Renesas chip, from which it would load the firmware?
>> > > >
>> > > > Another possibility is that power management kicks in, and that the
>> > > > endpoint is stuck there. Could also be firmware related, but not
>> > > > only. I'd welcome any idea on the subject, as I cannot reproduce this
>> > > > issue on the HW I have.
>> > > >
>> > > > If we cannot work out what exactly is causing this, we may have to
>> > > > default to not resetting the part and rely on a command-line option
>> > > > to do it... I can't say I'm a fan.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > M.
>> > > >
>> > >
>> > > Hi Marc,
>> > >
>> > > I've tested it with 4.15.7 and it's still there. Is there anything
>> > > that I can do to help fixing this problem?
>> >
>> > Would you mind trying the following patch and let me know if it helps?
>> > It is not exactly pretty, but we can polish it if that solves your
>> > issue.
>
> [...]
>
>> I've applied your patch on top of 4.15.7 and tried it on the server,
>> unfortunately the problem is still there. Here's the output from dmesg:
>>
>> [    1.570115] xhci_hcd 0000:03:00.0: Found a 64bit address in ERSTBA 4
>> [    1.570120] xhci_hcd 0000:03:00.0: Resetting
>> [    2.668066] xhci_hcd 0000:03:00.0: Refused to change power state, currently in D3
>> [    2.668215] xhci_hcd 0000:03:00.0: xHCI Host Controller
>> [    2.668225] xhci_hcd 0000:03:00.0: new USB bus registered, assigned bus number 2
>> [    2.668240] xhci_hcd 0000:03:00.0: Host halt failed, -19
>> [    2.668242] xhci_hcd 0000:03:00.0: can't setup: -19
>> [    2.668299] xhci_hcd 0000:03:00.0: USB bus 2 deregistered
>> [    2.668354] xhci_hcd 0000:03:00.0: init 0000:03:00.0 fail, -19
>>
>> If you need more information to find the cause, I will gladly provide
>> it.
>
> I finally found some time to work on this, and came up with an
> alternative approach (it turns out that this chip is even more
> braindead than I thought).
>
> It is slightly scary, in the sense that the USB controller seems to
> perform memory accesses even when halted, and can generate faults,
> but it works just fine on my system. And with this, we can drop the
> hard reset at boot time. I'm still on the fence to limit it to systems
> with an iommu though.
>

Hi Marc,

I take it you tested this on Cello?
There, it might make sense to limit this to systems with an IOMMU, but
not in the general case, I think. The reason is that it is not
guaranteed that the firmware will use 32-bit addressable allocations
for these data structures, even if the kernel is able to without an
IOMMU. (UEFI on arm64 will not prefer 32-bit addressable memory for PCI
DMA if it is available, and usually serves heap allocations [such as
the ones used for these data structures] starting at the top of DRAM)
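
To make the "32-bit addressable" condition concrete, here is a minimal
userspace sketch (not kernel code, and not taken from the patch under
discussion) of the kind of check implied by the "Found a 64bit address
in ERSTBA" message. The function names are hypothetical and chosen only
for illustration; the assumption is that the address is read as a low
and a high 32-bit half.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: join the low and high 32-bit halves of a
 * 64-bit address, as when it is read from two 32-bit registers. */
static inline uint64_t combine_halves(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: true if the address lies below 4 GiB, i.e. its
 * upper 32 bits are all zero. */
static inline bool is_32bit_addressable(uint64_t addr)
{
    return (addr >> 32) == 0;
}

/* Hypothetical helper: true if the whole buffer [base, base + len)
 * lies below 4 GiB. Written so that base + len cannot overflow. */
static inline bool range_is_32bit_addressable(uint64_t base, uint64_t len)
{
    if (len == 0)
        return is_32bit_addressable(base);
    return base <= UINT32_MAX && len - 1 <= UINT32_MAX - base;
}
```

A heap allocation served from the top of DRAM on a machine with more
than 4 GiB of memory would fail this check, which is the case described
above for firmware-allocated data structures.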