On Tue, Feb 11, 2014 at 07:29:40PM -0800, Julius Werner wrote:
> >> I believe I am seeing a "polling livelock" scenario as described by Julius.
> >
> > Julius was talking about what happens when the host controller itself
> > gets reset (and therefore remembers nothing about any device) whereas
> > the device still thinks it is in U3. Is that the scenario you're
> > encountering? I thought you were working on normal runtime PM.
>
> When you turn the power back on for a port, it should start out in
> RxDetect and switch to Polling as it detects Rx terminations. If the
> downstream device is unhappy for any reason (e.g. in SS.Inactive or
> still in U3) and sends no or wrong responses to the LFPS Polling, the
> hub's port will either move to ComplianceMode or keep cycling back and
> forth between RxDetect and Polling. The latter is especially dangerous
> because it's hard to detect (if you just sample the port status you
> might see RxDetect, which would also be expected if there is nothing
> connected at all), so I'm thinking an unconditional warm reset might
> be unavoidable. That is why we proposed to go that route for the
> Synopsys controller, and I think the same will apply to this situation
> (since I think turning off a PortPower bit in XHCI will make the
> controller "forget" a previous U3 state and return to RxDetect when
> you turn it back on again, even though the actual VBUS line to the
> device may not have been disabled after all).

Julius, are you sure the Synopsys host will actually power off the
ports? The Intel hosts need some special ACPI methods, so I'm not sure
whether Dan's issue with ports after power-on could even be seen on the
Synopsys host.

The Synopsys issue (as I remember it; please correct me if I'm wrong)
would only be triggered if the host lost power during S3, and was halted
and reset after a register restore failure.
I think the solution we agreed to was to implement a Synopsys host quirk
that would warm reset all ports unconditionally on register restore
failure. Was that right?

Then there's Dan's issue. Dan, does the port go into SS.Inactive before
the host starts to cycle between RxDetect, Polling, and U0 in this case?
Hans also ran into this issue (or at least a variation of it), and
proposed a patch to fix it:

https://git.kernel.org/cgit/linux/kernel/git/sarah/xhci.git/commit/?h=for-usb-next-streams&id=3fd14185404e3a298a3f6b6c6f21ff8d41bb2747

Can you take a look at it and see whether it would address your issue?
I think it will catch the case where we transition from SS.Inactive ->
RxDetect -> Polling.

> >> > One other thought (I don't know if it is the right thing to do) is that
> >> > we might _always_ perform a warm reset after powering-on a SuperSpeed
> >> > port, without bothering to call hub_port_debounce_be_connected.
> >>
> >> I'm leaning in that direction. However, the decision comes down to
> >> the relative occurrence frequency of devices that fall into this trap
> >> vs those that successfully recover and would suffer the additional
> >> latency of a warm reset.
> >
> > Is a warm reset significantly longer than an ordinary reset? We have
> > to do some kind of reset in any case. After all, the power session
> > _has_ been interrupted. (Assuming the power switching worked...)
>
> USB 3.0 ports don't need to be reset on connect as a matter of course.
> They should usually just start training themselves and eventually
> become ready as soon as the wires touch. An extra warm reset would add
> 80-120ms delay to the port resume. (In comparison, a hot reset should
> not take more than 12ms, probably even much less.)

I would rather avoid an unconditional warm reset on connect whenever
possible, because 80-120ms is too long a delay for some embedded/tablet
systems that go into and out of S3 very often.
> >> >> With this in place we may want to consider reducing the timeout and
> >> >> relying on warm reset for recovery.
> >> >
> >> > Why? I'm not familiar with the intricacies of USB-3 link state
> >> > changes, but there seem to be only two possibilities:
> >> >
> >> > Either PORT_LS_POLLING is a valid state to be in while
> >> > trying to establish a SuperSpeed connection, in which case
> >> > we don't want to reduce the timeout,
> >> >
> >> > Or it isn't a valid state, in which case we should abort
> >> > the debounce immediately.
>
> It is a valid transitional state, unfortunately, but in a working case
> it should resolve itself within a few milliseconds (probably less than
> 10). Maybe we should try to differentiate between USB 2.0 and 3.0
> devices in hub_port_debounce()? I think due to the built-in link
> training in USB 3.0, the classic debouncing doesn't really make sense
> anymore (and wastes a lot of time since SuperSpeed links can train
> really fast when they work).
>
> As for this patch, I think the best approach would be to wait for the
> device to come back in usb_port_runtime_resume() (through
> hub_port_debounce() or something else), and if it doesn't show up
> always set the bit to warm reset the port (regardless of LTSSM state,
> since even if it says RxDetect I wouldn't be sure that there is really
> nothing connected). We could then also use those bits in the "lost
> power" case of xhci_resume() to try and work around the problems with
> that Synopsys controller.

That's a lot of changes to the hub core. Would an xHCI quirk be simpler?
Is there some scenario I'm missing that the S3 resume quirk wouldn't
handle?

Sarah Sharp