Re: [PATCH] usb: xhci: add quirk flag for broken stop command on AMD platforms

Shyam Sundar S K <Shyam-sundar.S-k@xxxxxxx> · Mon, 22 May 2017 20:26:15 +0530

On 5/22/2017 6:49 PM, Mathias Nyman wrote:
> On 22.05.2017 11:56, Shyam Sundar S K wrote:
>> Hi Mathias,
>>
>>
>> On 5/19/2017 12:43 PM, Mathias Nyman wrote:
>>> On 18.05.2017 16:46, Alan Stern wrote:
>>>> On Thu, 18 May 2017, Shyam Sundar S K wrote:
>>>>
>>>>> on AMD platforms with SNPS 3.1 USB controller has an issue
>>>>> if the stop EP command is issued when the controller is not
>>>>> in running state. If issued, it is leading to a critical RTL bug
>>>>> because of which controller goes into irrecoverable state.
>>>>>
>>>>> This patch adds a appropriate checks to make sure that scenario
>>>>> does not happen.
>>>>>
>>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@xxxxxxx>
>>>>> Signed-off-by: Nehal Shah <Nehal-bakulchandra.Shah@xxxxxxx>
>>>>> ---
>>>>
>>>>> --- a/drivers/usb/host/xhci.h
>>>>> +++ b/drivers/usb/host/xhci.h
>>>>> @@ -1819,6 +1819,7 @@ struct xhci_hcd {
>>>>>    /* For controller with a broken Port Disable implementation */
>>>>>    #define XHCI_BROKEN_PORT_PED    (1 << 25)
>>>>>    #define XHCI_LIMIT_ENDPOINT_INTERVAL_7    (1 << 26)
>>>>> +#define XHCI_BROKEN_STOP    (1 << 27)
>>>>
>>>> Does there really need to be a quirk flag for this?  I should think
>>>> that you never want to issue a STOP EP command while the controller is
>>>> not running, no matter what kind of controller it is.
>>>>
>>>> Alan Stern
>>>>
>>>
>>> If it's about controller not running then there shouldn't be any problems.
>>> We shouldn't issue a stop endpoint command if controller is halted.
>>>
>>> If this is about issuing a stop endpoint command while endpoint isn't
>>> running, then fully working controllers should just respond with a command
>>> completion with "context state error" status.
>>
>> As per SNPS the controller is responding with "Context State Error", however the same is not getting
>> reflected when we check the cmd->status in the xhci driver.
>>
>>> Anyway, as Alan said the quirk is probably unnecessary here.
>>
>> OK. We will take care of this.
>>
>>> We shouldn't need to
>>> stop endpoints that are not running. Only problem I see here is that the
>>> endpoint state in the output endpoint context might not be up to date. If driver
>>> just restarted the endpoint by ringing the doorbell, the output context state
>>> might not be updated yet.
>>
>> Before issuing the stop end point command, we checked the state of the endpoint and it looks the state of
>> the end point is EP_STATE_STOPPED. If the output endpoint context is not updated is there a better way
>> to retrieve the EP state before issuing the stop end point command ?
>
> Not really, checking endpoint context and possible a software variable kept up to date
> by driver to keep track of doorbell. Perhaps checking endpoint ctx is enough for now
So, is it OK to guard the stop endpoint by checking the EP context before queuing it?

>
>>
>>>
>>> How does this SNPS 3.1 controller react if the endpoint just halted or moved to
>>> error state just before controller runs the stop endpoint command? Still triggers
>>> the RTL bug?
>>
>> As per SNPS analysis.
>>
>> 1) Driver issues STOP ENDPOINT command  and the EP is in Running state.
>> 2) HW executes the STOP ENDPOINT command successfully
>> 3) Driver again issues STOP ENDPOINT command.
>> 4) Since the EP is already halted/stopped, HW completes the command execution and reports “device context error” completion code. This is as per the spec.
>> 5) However HW on receiving the second command additionally marks EP to Flow control state in HW which is RTL bug
>> 6) The above bug causes the HW not to respond to any further doorbells that are rung by the driver. This causes the EP to not functional anymore and causes gross functional failures.
>>
>
> What happens if endpoint ctx shows endpoint is in the halted or error state when stop endpoint command is issued?
>  still RTL bug?

Yes. That's right. If EP context shows as halted/stopped/error and we issue a stop endpoint command it is triggering the RTL bug. Since the tapeout has already happened and there is no way to fix this
from SNPS side, they are asking for a SW workaround i.e.  "issuing the stop endpoint command only when the EP context state is running."

So, it is OK to have this patch submission which will check for EP context before queuing the stop endpoint command ?

>
>>
>>>
>>> I'm talking about the in xhci spec 4.6.9:
>>>
>>> " A Busy endpoint may asynchronously transition from the Running to the Halted or Error state due
>>> to error conditions detected while processing TRBs. A possible race condition may occur if
>>> software, thinking an endpoint is in the Running state, issues a Stop Endpoint Command however
>>> at the same time the xHC asynchronously transitions the endpoint to the Halted or Error state. In
>>> this case, a Context State Error may be generated for the command completion. Software may
>>> verify that this case occurred by inspecting the EP State for Halted or Error when a Stop Endpoint
>>> Command results in a Context State Error."
>>>
>>
>> Since we are novice, can you please help to understand what is the intuition behind sending two stop endpoint commands ?
>
> No need for two stop endpoint commands, that can be fixed in the driver.

How ? can you please help to understand.

> -Mathias

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html