Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

Felipe Balbi <balbi@xxxxxx> · Tue, 18 Feb 2014 15:05:17 -0600



Hi,

+ Paul, as he might have more details about the Synopsys core's host-side
implementation.

On Tue, Feb 18, 2014 at 12:42:53PM -0800, Sarah Sharp wrote:
> On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote:
> > >> I believe I am seeing a "polling livelock" scenario as described by Julius.
> > >
> > > Julius was talking about what happens when the host controller itself
> > > gets reset (and therefore remembers nothing about any device) whereas
> > > the device still thinks it is in U3.  Is that the scenario you're
> > > encountering?  I thought you were working on normal runtime PM.
> > 
> > When you turn the power back on for a port, it should start out in
> > RxDetect and switch to Polling as it detects Rx terminations. If the
> > downstream device is unhappy for any reason (e.g. in SS.Inactive or
> > still in U3) and sends no or wrong responses to the LFPS Polling, the
> > hub's port will either move to ComplianceMode or keep cycling back and
> > forth between RxDetect and Polling.
> > 
> > The latter is especially dangerous
> > because it's hard to detect (if you just sample the port status you
> > might see RxDetect, which would also be expected if there is nothing
> > connected at all), so I'm thinking an unconditional warm reset might
> > be unavoidable.
> > 
> > That is why we proposed to go that route for the
> > Synopsys controller, and I think the same will apply to this situation
> > (since I think turning off a PortPower bit in xHCI will make the
> > controller "forget" a previous U3 state and return to RxDetect when
> > you turn it back on again, even though the actual VBUS line to the
> > device may not have been disabled after all).
> 
> Julius, are you sure the Synopsys host will actually power off the
> ports?  The Intel hosts need some special ACPI methods, so I'm not sure
> if Dan's issue with ports after power on could even be seen on the
> Synopsys host.
> 
> The Synopsys issue (as I remember it, please correct me if I'm wrong)
> would only be triggered if the host lost power during S3, and was halted
> and reset after a register restore failure.  I think the solution we
> agreed to was to implement a Synopsys host quirk that would warm reset
> all ports unconditionally on register restore failure.  Was that right?
> 
> Then there's Dan's issue.  Dan, does the port go into SS.Inactive before
> the host starts to cycle between RxDetect and Polling and U0 for this
> case?
> 
> Hans also ran into this issue (or at least a variation of it), and
> proposed a patch to fix it.
> 
> https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streams&id=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747
> 
> Can you take a look at it, and see if it would address your issue?  I
> think it will catch the case where we transition from SS.Inactive ->
> RxDetect -> Polling.
> 
> > >> > One other thought (I don't know if it is the right thing to do) is that
> > >> > we might _always_ perform a warm reset after powering-on a SuperSpeed
> > >> > port, without bothering to call hub_port_debounce_be_connected.
> > >>
> > >> I'm leaning in that direction.  However, the decision comes down to
> > >> the relative occurrence frequency of devices that fall into this trap
> > >> vs those that successfully recover and would suffer the additional
> > >> latency of a warm reset.
> > >
> > > Is a warm reset significantly longer than an ordinary reset?  We have
> > > to do some kind of reset in any case.  After all, the power session
> > > _has_ been interrupted.  (Assuming the power switching worked...)
> > 
> > USB 3.0 ports don't need to be reset on connect as a matter of course.
> > They should usually just start training themselves and eventually
> > become ready as soon as the wires touch. An extra warm reset would add
> > 80-120ms delay to the port resume. (In comparison, a hot reset should
> > not take more than 12ms, probably even much less.)
> 
> I would rather avoid warm reset unconditionally on connect whenever
> possible, because 80-120ms is too long a delay for some
> embedded/tablet systems that come into and out of S3 very often.
> 
> > >> >> With this in place we may want to consider reducing the timeout and
> > >> >> relying on warm reset for recovery.
> > >> >
> > >> > Why?  I'm not familiar with the intricacies of USB-3 link state
> > >> > changes, but there seem to be only two possibilities:
> > >> >
> > >> >         Either PORT_LS_POLLING is a valid state to be in while
> > >> >         trying to establish a SuperSpeed connection, in which case
> > >> >         we don't want to reduce the timeout,
> > >> >
> > >> >         Or it isn't a valid state, in which case we should abort
> > >> >         the debounce immediately.
> > 
> > It is a valid transitional state, unfortunately, but in a working case
> > it should resolve itself within a few milliseconds (probably less than
> > 10). Maybe we should try to differentiate between USB 2.0 and 3.0
> > devices in hub_port_debounce()? I think due to the built-in link
> > training in USB 3.0, the classic debouncing doesn't really make sense
> > anymore (and wastes a lot of time since SuperSpeed links can train
> > really fast when they work).
> > 
> > As for this patch, I think the best approach would be to wait for the
> > device to come back in usb_port_runtime_resume() (through
> > hub_port_debounce() or something else), and, if it doesn't show up,
> > always set the bit to warm reset the port (regardless of LTSSM state,
> > since even if it says RxDetect I wouldn't be sure that there is really
> > nothing connected). We could then also use those bits in the "lost
> > power" case of xhci_resume() to try and work around the problems with
> > that Synopsys controller.
> 
> That's a lot of changes to the hub core.  Would an xHCI quirk be
> simpler?  Is there some scenario I'm missing that the S3 resume quirk
> wouldn't handle?
> 
> Sarah Sharp
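
To make the policy Julius describes above easier to discuss, here is a
minimal standalone sketch of the decision, not kernel code. It assumes the
resume path samples the link state a fixed number of times within the
timeout, and it requests a warm reset whenever U0 was never observed,
whatever the last sampled LTSSM state was. The enum values and the
decide_resume_action() helper are hypothetical names invented for
illustration; they are not the kernel's USB_SS_PORT_LS_* definitions or any
existing hub API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link states for illustration only. */
enum link_state {
	LS_U0,
	LS_U3,
	LS_RX_DETECT,
	LS_POLLING,
	LS_SS_INACTIVE,
	LS_COMPLIANCE
};

/* Hypothetical outcome of the resume check. */
enum resume_action {
	RESUME_OK,		/* link trained, nothing more to do */
	RESUME_WARM_RESET	/* link never reached U0, force a warm reset */
};

/*
 * samples[i] is the link state seen at poll i; the n polls together cover
 * the whole timeout window. The last sampled state is deliberately not
 * inspected: a port cycling RxDetect <-> Polling can be caught in RxDetect,
 * which looks the same as an empty port.
 */
static enum resume_action decide_resume_action(const enum link_state *samples,
						size_t n)
{
	assert(samples != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (samples[i] == LS_U0)
			return RESUME_OK;
	}

	return RESUME_WARM_RESET;
}
```

With this rule a port that trains normally (RxDetect, Polling, U0) is left
alone, while both the RxDetect/Polling livelock and a port that only ever
reports RxDetect end up with a warm reset. The second case is the cost of
the approach: a port with nothing connected also gets reset.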

-- 
balbi