[PATCH 6/6] usb: xhci: Update comments about Stop Endpoint races

Michal Pecio <michal.pecio@xxxxxxxxx> · Mon, 10 Mar 2025 09:42:13 +0100

The switch statement has grown beyond its original EP_STATE_HALTED case,
so move related comments inside this case and shorten them somewhat.

Shorten EP_STATE_STOPPED comments, add some common context at the top.

Signed-off-by: Michal Pecio <michal.pecio@xxxxxxxxx>
---
 drivers/usb/host/xhci-ring.c | 70 ++++++++++++++++++++----------------
 1 file changed, 39 insertions(+), 31 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 241cd82672a6..759e8f612b4d 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1199,20 +1199,21 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 	trace_xhci_handle_cmd_stop_ep(ep_ctx);
 
 	if (comp_code == COMP_CONTEXT_STATE_ERROR) {
-	/*
-	 * If stop endpoint command raced with a halting endpoint we need to
-	 * reset the host side endpoint first.
-	 * If the TD we halted on isn't cancelled the TD should be given back
-	 * with a proper error code, and the ring dequeue moved past the TD.
-	 * If streams case we can't find hw_deq, or the TD we halted on so do a
-	 * soft reset.
-	 *
-	 * Proper error code is unknown here, it would be -EPIPE if device side
-	 * of enadpoit halted (aka STALL), and -EPROTO if not (transaction error)
-	 * We use -EPROTO, if device is stalled it should return a stall error on
-	 * next transfer, which then will return -EPIPE, and device side stall is
-	 * noted and cleared by class driver.
-	 */
+
+		/*
+		 * This happens if the endpoint was not in the Running state. Its state now may
+		 * be other than at the time the command failed. We need to look at the Endpoint
+		 * Context and driver state to figure out what went wrong and what to do next.
+		 *
+		 * Many HCs are kind enough to hide their internal state transitions from us and
+		 * never fail Stop Endpoint after a doorbell ring. Others, including vendors like
+		 * NEC, ASMedia or Intel, may fail the command and begin running microseconds or
+		 * even milliseconds later (up to an ESIT on NEC periodic endpoints).
+		 *
+		 * We may see Running or Halted state after the command really failed on Stopped.
+		 * We may see Stopped after it failed on Halted, if somebody resets the endpoint.
+		 */
+
 		switch (GET_EP_CTX_STATE(ep_ctx)) {
 		case EP_STATE_HALTED:
 			xhci_dbg(xhci, "Stop ep completion raced with stall\n");
@@ -1222,8 +1223,23 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 			 */
 			if (ep->ep_state & EP_HALTED)
 				goto reset_done;
-
+			/*
+			 * Maybe reset failed with -ENOMEM. We are paranoid that a buggy HC may
+			 * mishandle the race and produce no transfer event before this command
+			 * completion. Or the endpoint actually was restarting, rejected Stop
+			 * Endpoint, then started and halted. The transfer event may only come
+			 * after this completion and some ASMedia HCs don't report such events.
+			 *
+			 * Try to reset the host endpoint now. Locate the halted TD, update its
+			 * status and cancel it. Reset EP completion takes care of the rest.
+			 *
+			 * Proper status is unknown here, it would be -EPIPE if the device side
+			 * endpoint stalled and -EPROTO otherwise (Transaction Error, etc).
+			 * We use -EPROTO, if device is stalled it should keep returning STALL
+			 * on future transfers which will hopefully receive normal handling.
+			 */
 			if (ep->ep_state & EP_HAS_STREAMS) {
+				/* We don't know which stream failed, attempt Soft Retry */
 				reset_type = EP_SOFT_RESET;
 			} else {
 				reset_type = EP_HARD_RESET;
@@ -1231,39 +1247,31 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 				if (td)
 					td->status = -EPROTO;
 			}
-			/* reset ep, reset handler cleans up cancelled tds */
 			err = xhci_handle_halted_endpoint(xhci, ep, td, reset_type);
 			xhci_dbg(xhci, "Stop ep completion resetting ep, status %d\n", err);
 			if (err)
+				/* persistent problem or disconnection, we must give back TDs */
 				break;
 reset_done:
 			/* Reset EP handler will clean up cancelled TDs */
 			ep->ep_state &= ~EP_STOP_CMD_PENDING;
 			return;
+
 		case EP_STATE_STOPPED:
 			/*
-			 * Per xHCI 4.6.9, Stop Endpoint command on a Stopped
-			 * EP is a Context State Error, and EP stays Stopped.
-			 *
-			 * But maybe it failed on Halted, and somebody ran Reset
-			 * Endpoint later. EP state is now Stopped and EP_HALTED
-			 * still set because Reset EP handler will run after us.
+			 * Maybe the command failed in Halted state and the transfer event queued
+			 * Reset Endpoint. The endpoint is now Stopped and EP_HALTED is still set,
+			 * because Reset Endpoint completion handler will run after this one.
 			 */
 			if (ep->ep_state & EP_HALTED)
 				break;
 			/*
-			 * On some HCs EP state remains Stopped for some tens of
-			 * us to a few ms or more after a doorbell ring, and any
-			 * new Stop Endpoint fails without aborting the restart.
-			 * This handler may run quickly enough to still see this
-			 * Stopped state, but it will soon change to Running.
-			 *
-			 * Assume this bug on unexpected Stop Endpoint failures.
-			 * Keep retrying until the EP starts and stops again.
+			 * We don't queue Stop Endpoint on Stopped endpoints. Assume the pending
+			 * restart case and keep retrying until it starts and stops again.
 			 */
 			fallthrough;
+
 		case EP_STATE_RUNNING:
-			/* Race, HW handled stop ep cmd before ep was running */
 			xhci_dbg(xhci, "Stop ep completion ctx error, ctx_state %d\n",
 					GET_EP_CTX_STATE(ep_ctx));
 			/*
-- 
2.48.1