Hi, + Paul as he might have better details about the Synopsys core host-side implementation On Tue, Feb 18, 2014 at 12:42:53PM -0800, Sarah Sharp wrote: > On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote: > > >> I believe I am seeing a "polling livelock" scenario as described by Julius. > > > > > > Julius was talking about what happens when the host controller itself > > > gets reset (and therefore remembers nothing about any device) whereas > > > the device still thinks it is in U3. Is that the scenario you're > > > encountering? I thought you were working on normal runtime PM. > > > > When you turn the power back on for a port, it should start out in > > RxDetect and switch to Polling as it detects Rx terminations. If the > > downstream device is unhappy for any reason (e.g. in SS.Inactive or > > still in U3) and sends no or wrong responses to the LFPS Polling, the > > hub's port will either move to ComplianceMode or keep cycling back and > > forth between RxDetect and Polling. > > > The latter is especially dangerous > > because it's hard to detect (if you just sample the port status you > > might see RxDetect, which would also be expected if there is nothing > > connected at all), so I'm thinking an unconditional warm reset might > > be unavoidable. > > > That is why we proposed to go that route for the > > Synopsys controller, and I think the same will apply to this situation > > (since I think turning off a PortPower bit in XHCI will make the > > controller "forget" a previous U3 state and return to RxDetect when > > you turn it back on again, even though the actual VBUS line to the > > device may not have been disabled after all). > > Julius, are you sure the Synopsys host will actually power off the > ports? The Intel hosts need some special ACPI methods, so I'm not sure > if Dan's issue with ports after power on could even be seen on the > Synopsys host. > > The Synopsys issue (as I remember it, please correct me if I'm wrong) > would only be triggered if the host lost power during S3, and was halted > and reset after a register restore failure. I think the solution we > agreed to was to implement a Synopsys host quirk that would warm reset > all ports unconditionally on register restore failure. Was that right? > > Then there's Dan's issue. Dan, does the port go into SS.Inactive before > the host starts to cycle between RxDetect and Polling and U0 for this > case? > > Hans also ran into this issue (or at least a variation of it), and > proposed a patch to fix it. > > https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streams&id=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747 > > Can you take a look at it, and see if it would address your issue? I > think it will catch the case where we transition from SS.Inactive -> > RxDetect -> Polling. > > > >> > One other thought (I don't know if it is the right thing to do) is that > > >> > we might _always_ perform a warm reset after powering-on a SuperSpeed > > >> > port, without bothering to call hub_port_debounce_be_connected. > > >> > > >> I'm leaning in that direction. However, the decision comes down to > > >> the relative occurrence frequency of devices that fall into this trap > > >> vs those that successfully recover and would suffer the additional > > >> latency of a warm reset. > > > > > > Is a warm reset significantly longer than an ordinary reset? We have > > > to do some kind of reset in any case. After all, the power session > > > _has_ been interrupted. (Assuming the power switching worked...) > > > > USB 3.0 ports don't need to be reset on connect as a matter of course. > > The should usually just start training themselves and eventually > > become ready as soon as the wires touch. An extra warm reset would add > > 80-120ms delay to the port resume. (In comparison, a hot reset should > > not take more than 12ms, probably even much less.) > > I would rather avoid warm reset unconditionally on connect whenever > possible, because 80-120ms is too long of a delay for some > embedded/tablet systems that come into and out of S3 very often. > > > >> >> With this in place we may want to consider reducing the timeout and > > >> >> relying on warm reset for recovery. > > >> > > > >> > Why? I'm not familiar with the intricacies of USB-3 link state > > >> > changes, but there seem to be only two possibilities: > > >> > > > >> > Either PORT_LS_POLLING is a valid state to be in while > > >> > trying to establish a SuperSpeed connection, in which case > > >> > we don't want to reduce the timeout, > > >> > > > >> > Or it isn't a valid state, in which case we should abort > > >> > the debounce immediately. > > > > It is a valid transitional state, unfortunately, but in a working case > > it should resolve itself within a few milliseconds (probably less than > > 10). Maybe we should try to differentiate between USB 2.0 and 3.0 > > devices in hub_port_debounce()? I think due to the built-in link > > training in USB 3.0, the classic debouncing doesn't really make sense > > anymore (and wastes a lot of time since SuperSpeed links can train > > really fast when they work). > > > > As for this patch, I think the best approach would be to wait for the > > device to come back in usb_port_runtime_resume() (through > > hub_port_debounce() or something else), and if it doesn't show up > > always set the bit to warm reset the port (regardless of LTSSM state, > > since even if it says RxDetect I wouldn't be sure that there is really > > nothing connected). We could then also use those bits in the "lost > > power" case of xhci_resume() to try and work around the problems with > > that Synopsys controller. > > That's a lot of changes to the hub core. Would an xHCI quirk be > simpler? Is there some scenario I'm missing that the S3 resume quirk > wouldn't handle? > > Sarah Sharp -- balbi
Attachment:
signature.asc
Description: Digital signature