On Monday, 07.03.2022 at 10:49 +0200, Mathias Nyman wrote:
> On 4.3.2022 16.17, Greg KH wrote:
> > On Fri, Mar 04, 2022 at 12:30:57PM +0100, Martin Kepplinger wrote:
> > > On the Librem 5 imx8mq system we've seen the stop endpoint command
> > > time out regularly which results in the hub dying.
> > >
> > > While on the one hand we see "Port resume timed out, port 1-1: 0xfe3"
> > > before this and on the other hand driver-comments suggest that the
> > > driver might be able to recover instead of dying here, Sarah seemed
> > > to have a workaround for this particular problem in mind already:
> > >
> > > Make it a module parameter. So while it might not be the root cause
> > > for the problem, do this to give users a workaround.
> >
> > This is not the 1990's, sorry, please do not add new module parameters.
> > They modify code, when you want to modify an individual device.
> >
>
> Agree, I think we really need to find the root cause here.
>
> There's a known problem with this stop endpoint timeout timer.
>
> For all other commands we start the timer when the controller starts
> processing the command, but the stop endpoint timer is started
> immediately when the command is queued.
> So it might time out if some other command before it failed.
>
> I have a patch series for this. It's still work in progress but should
> be testable.
> Pushed to a branch named stop_endpoint_fixes
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git stop_endpoint_fixes
> https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=stop_endpoint_fixes
>
> Can you try it out and see if it helps?
>

Thanks a lot Mathias, I'm running these now. The timeout has not been
easy to reproduce (or I'm just lazy) but in a few days I should be able
to tell whether that helps.

So this thread has been about

[14145.960512] xhci-hcd xhci-hcd.4.auto: Port resume timed out, port 1-1: 0xfe3
[14156.308511] xhci-hcd xhci-hcd.4.auto: xHCI host not responding to stop endpoint command.

that I previously tried to work around by increasing
XHCI_MAX_REXIT_TIMEOUT_MS and XHCI_STOP_EP_CMD_TIMEOUT.

These patches can't help with the following, right?
readl_poll_timeout_atomic() with a fixed timeout is called in this
case:

xhci-hcd xhci-hcd.4.auto: Abort failed to stop command ring: -110

I see that too from time to time. It results in the HC dying as well.

thanks,

                               martin