Re: [PATCH v4 13/14] usb: force warm reset to break resume livelock

On Mon, Feb 10, 2014 at 1:26 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, 31 Jan 2014, Dan Williams wrote:
>
>> Resuming a powered down port sometimes results in the port state being
>> stuck in USB_SS_PORT_LS_POLLING:
>>
>>  hub 3-0:1.0: debounce: port 1: total 2000ms stable 0ms status 0x2e0
>>  port1: can't get reconnection after setting port  power on, status -110
>>  hub 3-0:1.0: port 1 status 0000.02e0 after resume, -19
>>  usb 3-1: can't resume, status -19
>>  hub 3-0:1.0: logical disconnect on port 1
>
> It's not obvious that this illustrates your point.  Are we supposed to
> know offhand that 0x2e0 means USB_SS_PORT_LS_POLLING?

Hmm, no, I'll make that clearer in the revised change log.
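
For reference, here is a rough sketch of how 0x2e0 decodes. The bit
values are the ones I believe are defined in
include/uapi/linux/usb/ch11.h; the helper names are made up for
illustration and are not existing kernel functions. (The -110 and -19
in the log are -ETIMEDOUT and -ENODEV.)

```c
#include <stdint.h>

/* Port status bits, as defined in include/uapi/linux/usb/ch11.h */
#define USB_PORT_STAT_CONNECTION  0x0001  /* device present */
#define USB_PORT_STAT_LINK_STATE  0x01e0  /* SuperSpeed link state field */
#define USB_SS_PORT_STAT_POWER    0x0200  /* SuperSpeed port power */
#define USB_SS_PORT_LS_POLLING    0x00e0  /* link state: Polling */

/* Hypothetical helper: is the link state field equal to Polling? */
static int ss_port_is_polling(uint16_t portstatus)
{
	return (portstatus & USB_PORT_STAT_LINK_STATE) ==
		USB_SS_PORT_LS_POLLING;
}

/* Hypothetical helper: is the SuperSpeed port powered? */
static int ss_port_is_powered(uint16_t portstatus)
{
	return !!(portstatus & USB_SS_PORT_STAT_POWER);
}

/* Hypothetical helper: does the port report a connection? */
static int port_is_connected(uint16_t portstatus)
{
	return !!(portstatus & USB_PORT_STAT_CONNECTION);
}
```

So 0x2e0 is 0x200 (powered) plus 0x0e0 (Polling), with the connection
bit clear: the port has power but never finishes link training.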

>
>> In the case above we wait 2 seconds for the port to reconnect and for
>> that entire time the port remained in the polling state.  A warm reset
>> triggers the device to reconnect and resume as normal.  With this patch
>> we get:
>>
>>  hub 3-0:1.0: debounce: port 1: total 2000ms stable 0ms status 0x2e0
>>  usb usb3: port1 usb_port_runtime_resume requires warm reset
>>  hub 3-0:1.0: port 1 not warm reset yet, waiting 50ms
>>  usb 3-1: reset SuperSpeed USB device number 2 using xhci_hcd
>
> Could this be improved?  We still spent 2 seconds waiting for a port
> that remained in the polling state.

This question, and at least one other, is why the cc list includes
the participants from the thread "[PATCH] USB: core: Add warm reset
while reset-resuming SuperSpeed HUBs":
http://marc.info/?l=linux-usb&m=138678842000703&w=2

I believe I am seeing a "polling livelock" scenario as described by Julius.

>
>> With this in place we may want to consider reducing the timeout and
>> relying on warm reset for recovery.
>
> Why?  I'm not familiar with the intricacies of USB-3 link state
> changes, but there seem to be only two possibilities:
>
>         Either PORT_LS_POLLING is a valid state to be in while
>         trying to establish a SuperSpeed connection, in which case
>         we don't want to reduce the timeout,
>
>         Or it isn't a valid state, in which case we should abort
>         the debounce immediately.
>
> One other thought (I don't know if it is the right thing to do) is that
> we might _always_ perform a warm reset after powering-on a SuperSpeed
> port, without bothering to call hub_port_debounce_be_connected.

I'm leaning in that direction.  However, the decision comes down to
how often devices fall into this trap compared with how often they
recover on their own and would then suffer the additional latency of
a warm reset.  There's just no way to know whether the device on the
other side is legitimately causing a polling condition or whether
this is a result of the aforementioned livelock.  So far I only have
one USB3 device that requires this, our favorite ax88179_178a net
adapter.

The spec says that the only way to reliably sync the state machines is
to remove power from the device, but the kernel has no real way to
force a port off and confirm that it is physically powered off.  I'll
look into how much latency it would add to always warm reset, but we
may want to just quirk temperamental devices and hosts as we find
them and use the timeout as a backstop.

>
>>  Other xHCs that fail to propagate
>> warm resets on hub resume may want to trigger this behavior via a quirk.
>
> What do you mean by "other xHCs"?  Other than what?
>

"Other xHCs" refers again to that warm reset thread and the
hypothesis that the Synopsys xHC is not propagating warm resets on
host resume.

> I don't want to go over this patch in detail, because it's pretty
> confusing and the code is messy.  Still, it seems odd to add all those
> port status manipulations in usb_port_runtime_resume, when
> hub_port_debounce_be_connected is already doing them.
>
> And why do we need another special flag to indicate that a warm reset
> is needed?  Can't check_port_resume_type figure that out from the port
> status?  That routine was meant for exactly this sort of thing.
>

check_port_resume_type() does not have the context to make the
determination.  LS_POLLING is a valid state; we only know that a warm
reset is required when the port has been in this state for "too long".
Unfortunately, the timeout needs to consider that the device is coming
from a physically powered-off condition (rather than just a logical
one), so it needs to be at least 2 seconds for a connection (per commit
ad493e5 "usb: add usb port auto power off mechanism").
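
To make the "too long" point concrete, here is a simplified sketch of
the decision, not the code in the patch.  All names are hypothetical,
the status reader is a callback so the logic can be tried outside the
kernel, and unlike hub_port_debounce_be_connected() it does not
require the connection to be stable before declaring success.

```c
#include <stddef.h>
#include <stdint.h>

#define USB_PORT_STAT_CONNECTION  0x0001
#define USB_PORT_STAT_LINK_STATE  0x01e0
#define USB_SS_PORT_LS_POLLING    0x00e0

/* Hypothetical outcome of waiting for a powered-on port to reconnect */
enum resume_action {
	RESUME_OK,               /* device reconnected within the timeout */
	RESUME_NEEDS_WARM_RESET, /* still in Polling when the timeout expired */
	RESUME_NO_DEVICE,        /* timed out in some other link state */
};

typedef uint16_t (*read_status_fn)(void *ctx);

/*
 * Poll the port status every step_ms until a connection shows up or
 * timeout_ms has elapsed.  Polling alone is not an error; only being
 * stuck there for the whole timeout asks for a warm reset.
 */
static enum resume_action wait_for_reconnect(read_status_fn read_status,
					     void *ctx, unsigned int timeout_ms,
					     unsigned int step_ms)
{
	uint16_t status = 0;
	unsigned int elapsed;

	if (step_ms == 0)
		step_ms = 1;	/* avoid looping forever */

	for (elapsed = 0; elapsed < timeout_ms; elapsed += step_ms) {
		status = read_status(ctx);
		if (status & USB_PORT_STAT_CONNECTION)
			return RESUME_OK;
		/* real code would sleep for step_ms here */
	}

	if ((status & USB_PORT_STAT_LINK_STATE) == USB_SS_PORT_LS_POLLING)
		return RESUME_NEEDS_WARM_RESET;
	return RESUME_NO_DEVICE;
}

/* Fake status readers for trying the sketch out */
static uint16_t fake_stuck_polling(void *ctx)
{
	(void)ctx;
	return 0x02e0;		/* powered, Polling, not connected */
}

static uint16_t fake_connects_after(void *ctx)
{
	int *remaining = ctx;	/* polls left before the device appears */

	if (*remaining > 0) {
		(*remaining)--;
		return 0x02e0;
	}
	return 0x0203;		/* powered, U0, connected, enabled */
}

static uint16_t fake_rx_detect(void *ctx)
{
	(void)ctx;
	return 0x02a0;		/* powered, Rx.Detect, nothing attached */
}
```

With a 2000ms timeout, fake_stuck_polling() ends in
RESUME_NEEDS_WARM_RESET, which is the case in the log above, while a
device that shows up after a few polls returns RESUME_OK without ever
being reset.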