On Tue, Feb 18, 2014 at 09:54:59PM +0000, Paul Zimmerman wrote: > > From: Felipe Balbi [mailto:balbi@xxxxxx] > > Sent: Tuesday, February 18, 2014 1:05 PM > > > > Hi, > > > > + Paul as he might have better details about the Synopsys core host-side > > implementation > > > > On Tue, Feb 18, 2014 at 12:42:53PM -0800, Sarah Sharp wrote: > > > On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote: > > > > >> I believe I am seeing a "polling livelock" scenario as described by Julius. > > > > > > > > > > Julius was talking about what happens when the host controller itself > > > > > gets reset (and therefore remembers nothing about any device) whereas > > > > > the device still thinks it is in U3. Is that the scenario you're > > > > > encountering? I thought you were working on normal runtime PM. > > > > > > > > When you turn the power back on for a port, it should start out in > > > > RxDetect and switch to Polling as it detects Rx terminations. If the > > > > downstream device is unhappy for any reason (e.g. in SS.Inactive or > > > > still in U3) and sends no or wrong responses to the LFPS Polling, the > > > > hub's port will either move to ComplianceMode or keep cycling back and > > > > forth between RxDetect and Polling. > > > > > > > The latter is especially dangerous > > > > because it's hard to detect (if you just sample the port status you > > > > might see RxDetect, which would also be expected if there is nothing > > > > connected at all), so I'm thinking an unconditional warm reset might > > > > be unavoidable. > > > > > > > That is why we proposed to go that route for the > > > > Synopsys controller, and I think the same will apply to this situation > > > > (since I think turning off a PortPower bit in XHCI will make the > > > > controller "forget" a previous U3 state and return to RxDetect when > > > > you turn it back on again, even though the actual VBUS line to the > > > > device may not have been disabled after all). > > > > > > Julius, are you sure the Synopsys host will actually power off the > > > ports? The Intel hosts need some special ACPI methods, so I'm not sure > > > if Dan's issue with ports after power on could even be seen on the > > > Synopsys host. > > > > > > The Synopsys issue (as I remember it, please correct me if I'm wrong) > > > would only be triggered if the host lost power during S3, and was halted > > > and reset after a register restore failure. I think the solution we > > > agreed to was to implement a Synopsys host quirk that would warm reset > > > all ports unconditionally on register restore failure. Was that right? > > > > > > Then there's Dan's issue. Dan, does the port go into SS.Inactive before > > > the host starts to cycle between RxDetect and Polling and U0 for this > > > case? > > > > > > Hans also ran into this issue (or at least a variation of it), and > > > proposed a patch to fix it. > > > > > > https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streams&id=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747 > > > > > > Can you take a look at it, and see if it would address your issue? I > > > think it will catch the case where we transition from SS.Inactive -> > > > RxDetect -> Polling. > > > > > > > >> > One other thought (I don't know if it is the right thing to do) is that > > > > >> > we might _always_ perform a warm reset after powering-on a SuperSpeed > > > > >> > port, without bothering to call hub_port_debounce_be_connected. > > > > >> > > > > >> I'm leaning in that direction. However, the decision comes down to > > > > >> the relative occurrence frequency of devices that fall into this trap > > > > >> vs those that successfully recover and would suffer the additional > > > > >> latency of a warm reset. > > > > > > > > > > Is a warm reset significantly longer than an ordinary reset? We have > > > > > to do some kind of reset in any case. After all, the power session > > > > > _has_ been interrupted. (Assuming the power switching worked...) > > > > > > > > USB 3.0 ports don't need to be reset on connect as a matter of course. > > > > The should usually just start training themselves and eventually > > > > become ready as soon as the wires touch. An extra warm reset would add > > > > 80-120ms delay to the port resume. (In comparison, a hot reset should > > > > not take more than 12ms, probably even much less.) > > > > > > I would rather avoid warm reset unconditionally on connect whenever > > > possible, because 80-120ms is too long of a delay for some > > > embedded/tablet systems that come into and out of S3 very often. > > > > > > > >> >> With this in place we may want to consider reducing the timeout and > > > > >> >> relying on warm reset for recovery. > > > > >> > > > > > >> > Why? I'm not familiar with the intricacies of USB-3 link state > > > > >> > changes, but there seem to be only two possibilities: > > > > >> > > > > > >> > Either PORT_LS_POLLING is a valid state to be in while > > > > >> > trying to establish a SuperSpeed connection, in which case > > > > >> > we don't want to reduce the timeout, > > > > >> > > > > > >> > Or it isn't a valid state, in which case we should abort > > > > >> > the debounce immediately. > > > > > > > > It is a valid transitional state, unfortunately, but in a working case > > > > it should resolve itself within a few milliseconds (probably less than > > > > 10). Maybe we should try to differentiate between USB 2.0 and 3.0 > > > > devices in hub_port_debounce()? I think due to the built-in link > > > > training in USB 3.0, the classic debouncing doesn't really make sense > > > > anymore (and wastes a lot of time since SuperSpeed links can train > > > > really fast when they work). > > > > > > > > As for this patch, I think the best approach would be to wait for the > > > > device to come back in usb_port_runtime_resume() (through > > > > hub_port_debounce() or something else), and if it doesn't show up > > > > always set the bit to warm reset the port (regardless of LTSSM state, > > > > since even if it says RxDetect I wouldn't be sure that there is really > > > > nothing connected). We could then also use those bits in the "lost > > > > power" case of xhci_resume() to try and work around the problems with > > > > that Synopsys controller. > > > > > > That's a lot of changes to the hub core. Would an xHCI quirk be > > > simpler? Is there some scenario I'm missing that the S3 resume quirk > > > wouldn't handle? > > Can someone give me a recipe for reproducing the suspected issue? I am > unable to follow all the twists and turns of these email threads. > > I see someone mentioned the ax88179_178a net adapter. I have one of > those, so I should be able to reproduce the issue here if I know > exactly how. > > If I am able to reproduce the issue here, I can discuss it with the RTL > designers and see what they have to say. Thanks Paul, I have one of such devices here too and can try to reproduce on OMAP5 and AM437x given instructions. cheers -- balbi
Attachment:
signature.asc
Description: Digital signature