Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 18, 2014 at 09:54:59PM +0000, Paul Zimmerman wrote:
> > From: Felipe Balbi [mailto:balbi@xxxxxx]
> > Sent: Tuesday, February 18, 2014 1:05 PM
> > 
> > Hi,
> > 
> > + Paul as he might have better details about the Synopsys core host-side
> > implementation
> > 
> > On Tue, Feb 18, 2014 at 12:42:53PM -0800, Sarah Sharp wrote:
> > > On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote:
> > > > >> I believe I am seeing a "polling livelock" scenario as described by Julius.
> > > > >
> > > > > Julius was talking about what happens when the host controller itself
> > > > > gets reset (and therefore remembers nothing about any device) whereas
> > > > > the device still thinks it is in U3.  Is that the scenario you're
> > > > > encountering?  I thought you were working on normal runtime PM.
> > > >
> > > > When you turn the power back on for a port, it should start out in
> > > > RxDetect and switch to Polling as it detects Rx terminations. If the
> > > > downstream device is unhappy for any reason (e.g. in SS.Inactive or
> > > > still in U3) and sends no or wrong responses to the LFPS Polling, the
> > > > hub's port will either move to ComplianceMode or keep cycling back and
> > > > forth between RxDetect and Polling.
> > >
> > > > The latter is especially dangerous
> > > > because it's hard to detect (if you just sample the port status you
> > > > might see RxDetect, which would also be expected if there is nothing
> > > > connected at all), so I'm thinking an unconditional warm reset might
> > > > be unavoidable.
> > >
> > > > That is why we proposed to go that route for the
> > > > Synopsys controller, and I think the same will apply to this situation
> > > > (since I think turning off a PortPower bit in XHCI will make the
> > > > controller "forget" a previous U3 state and return to RxDetect when
> > > > you turn it back on again, even though the actual VBUS line to the
> > > > device may not have been disabled after all).
> > >
> > > Julius, are you sure the Synopsys host will actually power off the
> > > ports?  The Intel hosts need some special ACPI methods, so I'm not sure
> > > if Dan's issue with ports after power on could even be seen on the
> > > Synopsys host.
> > >
> > > The Synopsys issue (as I remember it, please correct me if I'm wrong)
> > > would only be triggered if the host lost power during S3, and was halted
> > > and reset after a register restore failure.  I think the solution we
> > > agreed to was to implement a Synopsys host quirk that would warm reset
> > > all ports unconditionally on register restore failure.  Was that right?
> > >
> > > Then there's Dan's issue.  Dan, does the port go into SS.Inactive before
> > > the host starts to cycle between RxDetect and Polling and U0 for this
> > > case?
> > >
> > > Hans also ran into this issue (or at least a variation of it), and
> > > proposed a patch to fix it.
> > >
> > > https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streams&id=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747
> > >
> > > Can you take a look at it, and see if it would address your issue?  I
> > > think it will catch the case where we transition from SS.Inactive ->
> > > RxDetect -> Polling.
> > >
> > > > >> > One other thought (I don't know if it is the right thing to do) is that
> > > > >> > we might _always_ perform a warm reset after powering-on a SuperSpeed
> > > > >> > port, without bothering to call hub_port_debounce_be_connected.
> > > > >>
> > > > >> I'm leaning in that direction.  However, the decision comes down to
> > > > >> the relative occurrence frequency of devices that fall into this trap
> > > > >> vs those that successfully recover and would suffer the additional
> > > > >> latency of a warm reset.
> > > > >
> > > > > Is a warm reset significantly longer than an ordinary reset?  We have
> > > > > to do some kind of reset in any case.  After all, the power session
> > > > > _has_ been interrupted.  (Assuming the power switching worked...)
> > > >
> > > > USB 3.0 ports don't need to be reset on connect as a matter of course.
> > > > The should usually just start training themselves and eventually
> > > > become ready as soon as the wires touch. An extra warm reset would add
> > > > 80-120ms delay to the port resume. (In comparison, a hot reset should
> > > > not take more than 12ms, probably even much less.)
> > >
> > > I would rather avoid warm reset unconditionally on connect whenever
> > > possible, because 80-120ms is too long of a delay for some
> > > embedded/tablet systems that come into and out of S3 very often.
> > >
> > > > >> >> With this in place we may want to consider reducing the timeout and
> > > > >> >> relying on warm reset for recovery.
> > > > >> >
> > > > >> > Why?  I'm not familiar with the intricacies of USB-3 link state
> > > > >> > changes, but there seem to be only two possibilities:
> > > > >> >
> > > > >> >         Either PORT_LS_POLLING is a valid state to be in while
> > > > >> >         trying to establish a SuperSpeed connection, in which case
> > > > >> >         we don't want to reduce the timeout,
> > > > >> >
> > > > >> >         Or it isn't a valid state, in which case we should abort
> > > > >> >         the debounce immediately.
> > > >
> > > > It is a valid transitional state, unfortunately, but in a working case
> > > > it should resolve itself within a few milliseconds (probably less than
> > > > 10). Maybe we should try to differentiate between USB 2.0 and 3.0
> > > > devices in hub_port_debounce()? I think due to the built-in link
> > > > training in USB 3.0, the classic debouncing doesn't really make sense
> > > > anymore (and wastes a lot of time since SuperSpeed links can train
> > > > really fast when they work).
> > > >
> > > > As for this patch, I think the best approach would be to wait for the
> > > > device to come back in usb_port_runtime_resume() (through
> > > > hub_port_debounce() or something else), and if it doesn't show up
> > > > always set the bit to warm reset the port (regardless of LTSSM state,
> > > > since even if it says RxDetect I wouldn't be sure that there is really
> > > > nothing connected). We could then also use those bits in the "lost
> > > > power" case of xhci_resume() to try and work around the problems with
> > > > that Synopsys controller.
> > >
> > > That's a lot of changes to the hub core.  Would an xHCI quirk be
> > > simpler?  Is there some scenario I'm missing that the S3 resume quirk
> > > wouldn't handle?
> 
> Can someone give me a recipe for reproducing the suspected issue? I am
> unable to follow all the twists and turns of these email threads.
> 
> I see someone mentioned the ax88179_178a net adapter. I have one of
> those, so I should be able to reproduce the issue here if I know
> exactly how.
> 
> If I am able to reproduce the issue here, I can discuss it with the RTL
> designers and see what they have to say.

Thanks Paul, I have one of such devices here too and can try to
reproduce on OMAP5 and AM437x given instructions.

cheers

-- 
balbi

Attachment: signature.asc
Description: Digital signature


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux