Re: [PATCH] usb: xhci: add quirk flag for broken stop command on AMD platforms

Mathias Nyman <mathias.nyman@xxxxxxxxx> · Fri, 26 May 2017 15:21:45 +0300

On 25.05.2017 16:17, Shyam Sundar S K wrote:


On 5/22/2017 8:26 PM, Shyam Sundar S K wrote:

On 5/22/2017 6:49 PM, Mathias Nyman wrote:
On 22.05.2017 11:56, Shyam Sundar S K wrote:
Hi Mathias,


On 5/19/2017 12:43 PM, Mathias Nyman wrote:
On 18.05.2017 16:46, Alan Stern wrote:
On Thu, 18 May 2017, Shyam Sundar S K wrote:

On AMD platforms, the SNPS 3.1 USB controller has an issue
if the stop EP command is issued when the controller is not
in the running state. Issuing it triggers a critical RTL bug
that puts the controller into an unrecoverable state.

This patch adds appropriate checks to make sure that scenario
does not happen.

Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@xxxxxxx>
Signed-off-by: Nehal Shah <Nehal-bakulchandra.Shah@xxxxxxx>
---

--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1819,6 +1819,7 @@ struct xhci_hcd {
    /* For controller with a broken Port Disable implementation */
    #define XHCI_BROKEN_PORT_PED    (1 << 25)
    #define XHCI_LIMIT_ENDPOINT_INTERVAL_7    (1 << 26)
+#define XHCI_BROKEN_STOP    (1 << 27)
Does there really need to be a quirk flag for this?  I should think
that you never want to issue a STOP EP command while the controller is
not running, no matter what kind of controller it is.

Alan Stern

If it's about controller not running then there shouldn't be any problems.
We shouldn't issue a stop endpoint command if controller is halted.

If this is about issuing a stop endpoint command while endpoint isn't
running, then fully working controllers should just respond with a command
completion with "context state error" status.
As per SNPS the controller is responding with "Context State Error", however the same is not getting
reflected when we check the cmd->status in the xhci driver.

Anyway, as Alan said the quirk is probably unnecessary here.
OK. We will take care of this.

We shouldn't need to
stop endpoints that are not running. Only problem I see here is that the
endpoint state in the output endpoint context might not be up to date. If driver
just restarted the endpoint by ringing the doorbell, the output context state
might not be updated yet.
Before issuing the stop endpoint command, we checked the state of the endpoint, and it looks like the state of
the endpoint is EP_STATE_STOPPED. If the output endpoint context is not updated, is there a better way
to retrieve the EP state before issuing the stop endpoint command?
Not really. The options are checking the endpoint context and possibly a software variable that the
driver keeps up to date to track doorbell rings. Perhaps checking the endpoint ctx is enough for now.
So, is it OK to guard the stop endpoint by checking the EP context before queuing it?

How does this SNPS 3.1 controller react if the endpoint just halted or moved to
error state just before controller runs the stop endpoint command? Still triggers
the RTL bug?
As per SNPS analysis.

1) Driver issues STOP ENDPOINT command  and the EP is in Running state.
2) HW executes the STOP ENDPOINT command successfully
3) Driver again issues STOP ENDPOINT command.
4) Since the EP is already halted/stopped, HW completes the command execution and reports a "Context State Error" completion code. This is as per the spec.
5) However, on receiving the second command, HW additionally marks the EP as being in the Flow Control state, which is the RTL bug.
6) The above bug causes the HW not to respond to any further doorbells that are rung by the driver. The EP is then no longer functional, which causes gross functional failures.

What happens if endpoint ctx shows endpoint is in the halted or error state when stop endpoint command is issued?
  still RTL bug?
Yes. That's right. If EP context shows as halted/stopped/error and we issue a stop endpoint command it is triggering the RTL bug. Since the tapeout has already happened and there is no way to fix this
from SNPS side, they are asking for a SW workaround i.e.  "issuing the stop endpoint command only when the EP context state is running."

So, is it OK to submit a patch that checks the EP context before queuing the stop endpoint command?
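
The workaround requested above ("issue the stop endpoint command only when
the EP context state is running") can be sketched as a standalone model.
This is not driver code: struct model_ep_ctx and the model_* functions are
hypothetical names, and ep_info is kept host-endian here for simplicity.
The EP State values and their position (bits 2:0 of the first dword) follow
the xHCI Endpoint Context layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* EP State values defined for the xHCI Endpoint Context. */
enum model_ep_state {
    MODEL_EP_DISABLED = 0,
    MODEL_EP_RUNNING  = 1,
    MODEL_EP_HALTED   = 2,
    MODEL_EP_STOPPED  = 3,
    MODEL_EP_ERROR    = 4,
};

/* EP State occupies bits 2:0 of the first dword of the context. */
#define MODEL_EP_STATE_BITS 0x7u

/* Hypothetical, host-endian stand-in for the output endpoint context. */
struct model_ep_ctx {
    uint32_t ep_info;
};

/* Extract the EP State field from the modelled context. */
static enum model_ep_state model_get_ep_state(const struct model_ep_ctx *ctx)
{
    assert(ctx != NULL);
    return (enum model_ep_state)(ctx->ep_info & MODEL_EP_STATE_BITS);
}

/*
 * Guard for the workaround: a Stop Endpoint command may be queued only
 * when the context reports Running. Halted, Stopped, Error and Disabled
 * all return false, so the command is never issued in those states.
 */
static bool model_may_queue_stop(const struct model_ep_ctx *ctx)
{
    return model_get_ep_state(ctx) == MODEL_EP_RUNNING;
}
```

A caller would skip queuing the command whenever model_may_queue_stop()
returns false. Note that this guard alone trusts the context, which may be
stale right after a doorbell ring.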

I'm talking about the race described in xhci spec section 4.6.9:

" A Busy endpoint may asynchronously transition from the Running to the Halted or Error state due
to error conditions detected while processing TRBs. A possible race condition may occur if
software, thinking an endpoint is in the Running state, issues a Stop Endpoint Command however
at the same time the xHC asynchronously transitions the endpoint to the Halted or Error state. In
this case, a Context State Error may be generated for the command completion. Software may
verify that this case occurred by inspecting the EP State for Halted or Error when a Stop Endpoint
Command results in a Context State Error."
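
The check the spec describes here can be sketched as a standalone model.
The race_* names are hypothetical and only illustrate the decision; the
completion code values (1 for Success, 19 for Context State Error) and the
EP State encoding are taken from the xHCI specification.

```c
#include <assert.h>
#include <stdint.h>

/* Completion codes from the xHCI specification. */
#define RACE_COMP_SUCCESS             1u
#define RACE_COMP_CONTEXT_STATE_ERROR 19u

/* EP State field: bits 2:0 of the first endpoint context dword. */
#define RACE_EP_STATE_BITS 0x7u
#define RACE_EP_HALTED     2u
#define RACE_EP_ERROR      4u

/* Hypothetical verdicts for a completed Stop Endpoint command. */
enum race_verdict {
    RACE_STOPPED_OK,        /* command succeeded                     */
    RACE_EP_HALTED_FIRST,   /* xHC moved the EP to Halted first      */
    RACE_EP_ERRORED_FIRST,  /* xHC moved the EP to Error first       */
    RACE_UNEXPLAINED,       /* anything else needs separate handling */
};

/*
 * Classify a Stop Endpoint completion. On Context State Error, inspect
 * the EP State for Halted or Error, as the quoted paragraph suggests.
 */
static enum race_verdict race_classify_stop(uint32_t comp_code, uint32_t ep_info)
{
    uint32_t state = ep_info & RACE_EP_STATE_BITS;

    if (comp_code == RACE_COMP_SUCCESS)
        return RACE_STOPPED_OK;
    if (comp_code != RACE_COMP_CONTEXT_STATE_ERROR)
        return RACE_UNEXPLAINED;
    if (state == RACE_EP_HALTED)
        return RACE_EP_HALTED_FIRST;
    if (state == RACE_EP_ERROR)
        return RACE_EP_ERRORED_FIRST;
    return RACE_UNEXPLAINED;
}
```
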

Since we are novices, can you please help us understand the intuition behind sending two stop endpoint commands?
No need for two stop endpoint commands, that can be fixed in the driver.
How? Can you please help us understand?

-Mathias
Hi Mathias,

Any feedback on this patch ?


1. Drop the quirk,

2. Check the endpoint state both from the endpoint context (ep_ctx), which is updated by xHC hardware,
and from a software variable, as the xhci spec suggests; see the last note in section 4.8.3
starting with

"Note:
There are several cases where the EP State field in the Output Endpoint Context may not reflect
the current state of an endpoint.."

It contains good information on what needs to be done.

For this the ep_state in struct xhci_virt_ep can be used. It already keeps track of halt and pending stop
states. We just need to add a reliable running state (i.e. when the doorbell is rung or at the set tr deq pointer command).
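
The two suggestions above can be sketched as a standalone model that
combines a software-tracked running flag with the state read from the
endpoint context. Everything named sw_* or SW_* is hypothetical: the flag
bits are only modelled on the ep_state bitmask idea, and the policy for
combining the two views is an assumption, not the driver's behaviour. Only
the doorbell hook is modelled; the set tr deq pointer case is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software flags, modelled on an ep_state style bitmask. */
#define SW_EP_HALTED       (1u << 0)
#define SW_EP_STOP_PENDING (1u << 1)
#define SW_EP_RUNNING      (1u << 2)  /* the proposed addition */

/* EP State field in the output endpoint context (bits 2:0). */
#define SW_HW_STATE_BITS 0x7u
#define SW_HW_RUNNING    1u
#define SW_HW_HALTED     2u
#define SW_HW_ERROR      4u

struct sw_ep {
    unsigned int ep_state;  /* software view, updated by the driver */
    uint32_t ctx_ep_info;   /* hardware view, may lag behind        */
};

/* The driver rang the doorbell, so the endpoint is (about to be) running. */
static void sw_ep_doorbell_rung(struct sw_ep *ep)
{
    assert(ep != NULL);
    ep->ep_state |= SW_EP_RUNNING;
}

/* A Stop Endpoint command completed: no longer running or pending. */
static void sw_ep_stop_completed(struct sw_ep *ep)
{
    assert(ep != NULL);
    ep->ep_state &= ~(SW_EP_RUNNING | SW_EP_STOP_PENDING);
}

/*
 * Decide whether to queue a Stop Endpoint command. Assumed policy:
 * never stop a halted endpoint or one with a stop already pending,
 * never stop when the context reports Halted or Error, and otherwise
 * stop if either view says the endpoint is running.
 */
static bool sw_ep_try_queue_stop(struct sw_ep *ep)
{
    uint32_t hw_state;

    assert(ep != NULL);
    hw_state = ep->ctx_ep_info & SW_HW_STATE_BITS;

    if (ep->ep_state & (SW_EP_HALTED | SW_EP_STOP_PENDING))
        return false;
    if (hw_state == SW_HW_HALTED || hw_state == SW_HW_ERROR)
        return false;
    if (!(ep->ep_state & SW_EP_RUNNING) && hw_state != SW_HW_RUNNING)
        return false;

    ep->ep_state |= SW_EP_STOP_PENDING;  /* blocks a second stop command */
    return true;
}
```

The SW_EP_STOP_PENDING flag is what keeps a second stop endpoint command
from being queued while the first is outstanding, and SW_EP_RUNNING covers
the window in which the context still shows Stopped after a doorbell ring.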

-Mathias
 
