RE: [PATCH v4 13/14] usb: force warm reset to break resume livelock

Paul Zimmerman <Paul.Zimmerman@xxxxxxxxxxxx> · Tue, 18 Feb 2014 21:54:59 +0000

> From: Felipe Balbi [mailto:balbi@xxxxxx]
> Sent: Tuesday, February 18, 2014 1:05 PM
> 
> Hi,
> 
> + Paul as he might have better details about the Synopsys core host-side
> implementation
> 
> On Tue, Feb 18, 2014 at 12:42:53PM -0800, Sarah Sharp wrote:
> > On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote:
> > > >> I believe I am seeing a "polling livelock" scenario as described by Julius.
> > > >
> > > > Julius was talking about what happens when the host controller itself
> > > > gets reset (and therefore remembers nothing about any device) whereas
> > > > the device still thinks it is in U3.  Is that the scenario you're
> > > > encountering?  I thought you were working on normal runtime PM.
> > >
> > > When you turn the power back on for a port, it should start out in
> > > RxDetect and switch to Polling as it detects Rx terminations. If the
> > > downstream device is unhappy for any reason (e.g. in SS.Inactive or
> > > still in U3) and sends no or wrong responses to the LFPS Polling, the
> > > hub's port will either move to ComplianceMode or keep cycling back and
> > > forth between RxDetect and Polling.
> >
> > > The latter is especially dangerous
> > > because it's hard to detect (if you just sample the port status you
> > > might see RxDetect, which would also be expected if there is nothing
> > > connected at all), so I'm thinking an unconditional warm reset might
> > > be unavoidable.
> >
> > > That is why we proposed to go that route for the
> > > Synopsys controller, and I think the same will apply to this situation
> > > (since I think turning off a PortPower bit in XHCI will make the
> > > controller "forget" a previous U3 state and return to RxDetect when
> > > you turn it back on again, even though the actual VBUS line to the
> > > device may not have been disabled after all).
> >
> > Julius, are you sure the Synopsys host will actually power off the
> > ports?  The Intel hosts need some special ACPI methods, so I'm not sure
> > if Dan's issue with ports after power on could even be seen on the
> > Synopsys host.
> >
> > The Synopsys issue (as I remember it, please correct me if I'm wrong)
> > would only be triggered if the host lost power during S3, and was halted
> > and reset after a register restore failure.  I think the solution we
> > agreed to was to implement a Synopsys host quirk that would warm reset
> > all ports unconditionally on register restore failure.  Was that right?
> >
> > Then there's Dan's issue.  Dan, does the port go into SS.Inactive before
> > the host starts to cycle between RxDetect and Polling and U0 for this
> > case?
> >
> > Hans also ran into this issue (or at least a variation of it), and
> > proposed a patch to fix it.
> >
> > https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streams&id=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747
> >
> > Can you take a look at it, and see if it would address your issue?  I
> > think it will catch the case where we transition from SS.Inactive ->
> > RxDetect -> Polling.
> >
> > > >> > One other thought (I don't know if it is the right thing to do) is that
> > > >> > we might _always_ perform a warm reset after powering-on a SuperSpeed
> > > >> > port, without bothering to call hub_port_debounce_be_connected.
> > > >>
> > > >> I'm leaning in that direction.  However, the decision comes down to
> > > >> the relative occurrence frequency of devices that fall into this trap
> > > >> vs those that successfully recover and would suffer the additional
> > > >> latency of a warm reset.
> > > >
> > > > Is a warm reset significantly longer than an ordinary reset?  We have
> > > > to do some kind of reset in any case.  After all, the power session
> > > > _has_ been interrupted.  (Assuming the power switching worked...)
> > >
> > > USB 3.0 ports don't need to be reset on connect as a matter of course.
> > > They should usually just start training themselves and eventually
> > > become ready as soon as the wires touch. An extra warm reset would add
> > > 80-120ms delay to the port resume. (In comparison, a hot reset should
> > > not take more than 12ms, probably even much less.)
> >
> > I would rather avoid warm reset unconditionally on connect whenever
> > possible, because 80-120ms is too long of a delay for some
> > embedded/tablet systems that come into and out of S3 very often.
> >
> > > >> >> With this in place we may want to consider reducing the timeout and
> > > >> >> relying on warm reset for recovery.
> > > >> >
> > > >> > Why?  I'm not familiar with the intricacies of USB-3 link state
> > > >> > changes, but there seem to be only two possibilities:
> > > >> >
> > > >> >         Either PORT_LS_POLLING is a valid state to be in while
> > > >> >         trying to establish a SuperSpeed connection, in which case
> > > >> >         we don't want to reduce the timeout,
> > > >> >
> > > >> >         Or it isn't a valid state, in which case we should abort
> > > >> >         the debounce immediately.
> > >
> > > It is a valid transitional state, unfortunately, but in a working case
> > > it should resolve itself within a few milliseconds (probably less than
> > > 10). Maybe we should try to differentiate between USB 2.0 and 3.0
> > > devices in hub_port_debounce()? I think due to the built-in link
> > > training in USB 3.0, the classic debouncing doesn't really make sense
> > > anymore (and wastes a lot of time since SuperSpeed links can train
> > > really fast when they work).
> > >
> > > As for this patch, I think the best approach would be to wait for the
> > > device to come back in usb_port_runtime_resume() (through
> > > hub_port_debounce() or something else), and if it doesn't show up
> > > always set the bit to warm reset the port (regardless of LTSSM state,
> > > since even if it says RxDetect I wouldn't be sure that there is really
> > > nothing connected). We could then also use those bits in the "lost
> > > power" case of xhci_resume() to try and work around the problems with
> > > that Synopsys controller.
> >
> > That's a lot of changes to the hub core.  Would an xHCI quirk be
> > simpler?  Is there some scenario I'm missing that the S3 resume quirk
> > wouldn't handle?
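
As a rough illustration of the policy Julius describes in the quoted text
above (wait for the device to come back after port power-on, and flag a
warm reset if it never shows up, regardless of the last link state seen),
here is a small standalone sketch. It is a toy model, not kernel code: the
enum, the function name and the idea of passing in an array of sampled
link states are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link states, named after the LTSSM states mentioned in
 * this thread. These are NOT the kernel's USB_SS_PORT_LS_* definitions. */
enum link_state {
	LS_RXDETECT,
	LS_POLLING,
	LS_U0,
	LS_SS_INACTIVE,
	LS_COMPLIANCE,
};

/*
 * Hypothetical helper: given the link states sampled while waiting for
 * the device to come back, decide whether to request a warm reset.
 *
 * The port is considered recovered only if it was seen in U0. Any other
 * outcome requests a warm reset, including a final sample of RxDetect,
 * because a port cycling between RxDetect and Polling can look exactly
 * like an empty port when sampled at the wrong moment.
 */
bool needs_warm_reset(const enum link_state *samples, size_t n_samples)
{
	assert(samples != NULL || n_samples == 0);

	for (size_t i = 0; i < n_samples; i++) {
		if (samples[i] == LS_U0)
			return false;	/* link trained, no reset needed */
	}
	return true;			/* never reached U0: warm reset */
}
```

Note that with this policy a port with nothing attached would also be
warm reset, which is the latency cost Sarah objects to above. A real
implementation would have to decide how long to sample before giving up.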

Can someone give me a recipe for reproducing the suspected issue? I am
unable to follow all the twists and turns of these email threads.

I see someone mentioned the ax88179_178a net adapter. I have one of
those, so I should be able to reproduce the issue here if I know
exactly how.

If I am able to reproduce the issue here, I can discuss it with the RTL
designers and see what they have to say.

-- 
Paul
