Re: [RFT PATCH] xhci: make xhci_handshake timeout for xhci_reset() adjustable

Pavan Kondeti <quic_pkondeti@xxxxxxxxxxx> · Wed, 2 Mar 2022 13:27:31 +0530

Hi Greg,

On Wed, Mar 02, 2022 at 08:49:08AM +0100, Greg Kroah-Hartman wrote:
> On Wed, Mar 02, 2022 at 08:53:06AM +0530, Pavan Kondeti wrote:
> > On Tue, Mar 01, 2022 at 10:47:36AM +0200, Mathias Nyman wrote:
> > > On 1.3.2022 6.03, Pavan Kondeti wrote:
> > > > On Mon, Feb 28, 2022 at 05:49:49PM -0800, Jack Pham wrote:
> > > >> Hi Mathias,
> > > >>
> > > >> On Thu, Feb 17, 2022 at 03:56:43PM +0200, Mathias Nyman wrote:
> > > >>> xhci_reset() timeout was increased from 250ms to 10 seconds in order to
> > > >>> give Renesas 720201 xHC enough time to get ready in probe.
> > > >>
> > > >> This suggests that the only place we really need the long timeout is
> > > >> in xhci_gen_setup().
> > > >>
> > > >>> @@ -1163,7 +1161,7 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
> > > >>>  		xhci_dbg(xhci, "Stop HCD\n");
> > > >>>  		xhci_halt(xhci);
> > > >>>  		xhci_zero_64b_regs(xhci);
> > > >>> -		retval = xhci_reset(xhci);
> > > >>> +		retval = xhci_reset(xhci, XHCI_RESET_LONG_USEC);
> > > >>
> > > >> Since preemption is disabled here (spin_lock_irq() is called near the
> > > >> start of this function), shouldn't we also limit this to 250ms here in
> > > >> xhci_resume() as well?
> > > > >>
> > > > > The rationale of decreasing the timeout to 250 in certain places is based
> > > > on the criticality of the operation but not on the preemption/irq state.
> > > > Since xHC reset is critical in startup and resume, the 10 seconds timeout
> > > > is enforced so that we don't break Renesas 720201 xHC.
> > > > 
> > > > > Since all of our internal test reports indicate that the timeout is happening
> > > > from stop hcd, this patch is helping.
> > > > 
> > > 
> > > This was pretty much my reasoning as well.
> > > I could add a comment about this to the commit message.
> > > 
> > > In addition we want a targeted fix for a real world issue that we can send to
> > > stable without changing too much, risking regressions.
> > > 
> > Makes complete sense. 
> > 
> > Greg,
> > 
> > Do you plan to include this patch?
> 
> I need it to be forwarded to me by the maintainer with their
> signed-off-by, like any other xhci core patch :)

Got it. I will wait for Mathias's final patch.

> 
> What is the rush here?  Your hardware is obviously broken somehow if you
> are hitting this problem just now.  Why not fix the root cause here,
> this patch should only affect rare situations, right?  Are you having
> this problem a lot?
> 
Sure, we will work with the hardware designers to understand the root cause, but
it is very unlikely that this can be fixed with a firmware/control setting
change, so this workaround would at least solve the stability issues. It
happens rarely, but we test with lots of devices in our farm and it shows up
eventually on one device or another within a day or so. Including a harmless
patch that improves the user experience is what is driving us to ask
Mathias and you for help.

Thanks,
Pavan
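
For readers following the thread, here is a minimal userspace sketch of the
idea being discussed: a reset helper that takes its handshake timeout as a
parameter, so that critical paths can wait long and other paths give up
early. It is not the kernel code. The simulated controller (struct fake_hc),
fake_read_cmd(), handshake() and fake_reset() are hypothetical names made up
for illustration, and the two constants only mirror the 250ms and 10 second
values mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, mirroring the 250ms and 10 second timeouts in the thread. */
#define RESET_SHORT_USEC (250ULL * 1000)
#define RESET_LONG_USEC  (10ULL * 1000 * 1000)

/* Hypothetical bit meaning "reset still in progress". */
#define CMD_RESET 0x2u

/* Simulated controller: the reset bit clears once ready_at_usec is reached. */
struct fake_hc {
	uint64_t now_usec;      /* simulated clock */
	uint64_t ready_at_usec; /* time at which the reset completes */
};

/* Stand-in for a register read. */
static uint32_t fake_read_cmd(const struct fake_hc *hc)
{
	return hc->now_usec >= hc->ready_at_usec ? 0 : CMD_RESET;
}

/*
 * Poll until (reg & mask) == done or timeout_usec elapses.
 * Returns 0 on success, -ETIMEDOUT otherwise.
 */
static int handshake(struct fake_hc *hc, uint32_t mask, uint32_t done,
		     uint64_t timeout_usec)
{
	uint64_t deadline;

	assert(hc != NULL);
	deadline = hc->now_usec + timeout_usec;

	for (;;) {
		if ((fake_read_cmd(hc) & mask) == done)
			return 0;
		if (hc->now_usec >= deadline)
			return -ETIMEDOUT;
		hc->now_usec += 10; /* stand-in for a 10 usec delay */
	}
}

/* Reset helper with a caller-chosen timeout. */
static int fake_reset(struct fake_hc *hc, uint64_t timeout_usec)
{
	return handshake(hc, CMD_RESET, 0, timeout_usec);
}
```

With a simulated controller that needs one second to come out of reset,
calling fake_reset() with RESET_SHORT_USEC gives up with -ETIMEDOUT, while
the same call with RESET_LONG_USEC succeeds. That is the trade-off in the
thread: the long value protects slow-but-working controllers on critical
paths, and the short value bounds the stall on the remaining paths.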