> Ensure that an endpoint halted due to device STALL is not > restarted before a Clear_Feature(ENDPOINT_HALT) request is sent to > the device. > > The host side of the endpoint may otherwise be started early by the > 'Set TR Deq' command completion handler which is called if dequeue > is moved past a cancelled or halted TD. > > Prevent this with a new flag set for bulk and interrupt endpoints > when a Stall Error is received. Clear it in hcd->endpoint_reset() > which is called after Clear_Feature(ENDPOINT_HALT) is sent. > > Also add a debug message if a class driver queues a new URB after > the STALL. Note that class driver might not be aware of the STALL > yet when it submits the URB as URBs are given back in BH. > > Signed-off-by: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> Sorry for coming this late, but I haven't looked closely at some of those xhci/for-next patches before. This one is unfortunately incomplete, as follows: > drivers/usb/host/xhci-ring.c | 7 +++++-- > drivers/usb/host/xhci.c | 6 ++++++ > drivers/usb/host/xhci.h | 3 ++- > 3 files changed, 13 insertions(+), 3 deletions(-) > >diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c >index c2e15a27338b..7643ab9ec3b4 100644 >--- a/drivers/usb/host/xhci-ring.c >+++ b/drivers/usb/host/xhci-ring.c >@@ -556,8 +556,8 @@ void xhci_ring_ep_doorbell(struct xhci_hcd *xhci, > * pointer command pending because the device can choose to start any > * stream once the endpoint is on the HW schedule. > */ >- if ((ep_state & EP_STOP_CMD_PENDING) || (ep_state & SET_DEQ_PENDING) || >- (ep_state & EP_HALTED) || (ep_state & EP_CLEARING_TT)) >+ if (ep_state & (EP_STOP_CMD_PENDING | SET_DEQ_PENDING | EP_HALTED | >+ EP_CLEARING_TT | EP_STALLED)) > return; Any flag added to this list needs to be added to xhci_urb_dequeue() too so it knowns that the endpoint is held in Stopped state and URBs can be unlinked without trying to stop it again. There really should be a helper function used both here and there, but those Stop EP patches were meant for stable and I strived to make them small and noninvasive. Then I forgot about this cleanup. NB: I also forgot about a bunch of low-impact halted EP handling bugs, I will try to rebase and send them out today or over the weekend. > trace_xhci_ring_ep_doorbell(slot_id, DB_VALUE(ep_index, stream_id)); > @@ -2555,6 +2555,9 @@ static void process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep, > > xhci_handle_halted_endpoint(xhci, ep, td, EP_SOFT_RESET); > return; > + case COMP_STALL_ERROR: > + ep->ep_state |= EP_STALLED; > + break; > default: > /* do nothing */ > break; > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c > index 3f2cd546a7a2..0c22b78358b9 100644 > --- a/drivers/usb/host/xhci.c > +++ b/drivers/usb/host/xhci.c > @@ -1604,6 +1604,11 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag > goto free_priv; > } > > + /* Class driver might not be aware ep halted due to async URB giveback */ > + if (*ep_state & EP_STALLED) > + dev_dbg(&urb->dev->dev, "URB %p queued before clearing halt\n", > + urb); > + > switch (usb_endpoint_type(&urb->ep->desc)) { > > case USB_ENDPOINT_XFER_CONTROL: > @@ -3202,6 +3207,7 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd, > return; > > ep = &vdev->eps[ep_index]; > + ep->ep_state &= ~EP_STALLED; ... and clearing any of those flags has always been followed by calling xhci_ring_ep_doorbell() again, to ensure that the endpoint is restarted if it has URBs on it but restart was held off due to the flag. xhci_urb_dequeue() relies on this too, because it looked lke sensible design: if you have reasons not to run the EP, you set a flag. Reasons are gone, you clear the flag and it's running again. > /* Bail out if toggle is already being cleared by a endpoint reset */ > spin_lock_irqsave(&xhci->lock, flags); >diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h >index cd96e0a8c593..4ee14f651d36 100644 >--- a/drivers/usb/host/xhci.h >+++ b/drivers/usb/host/xhci.h >@@ -664,7 +664,7 @@ struct xhci_virt_ep { > unsigned int err_count; > unsigned int ep_state; > #define SET_DEQ_PENDING (1 << 0) >-#define EP_HALTED (1 << 1) /* For stall handling */ >+#define EP_HALTED (1 << 1) /* Halted host ep handling */ > #define EP_STOP_CMD_PENDING (1 << 2) /* For URB cancellation */ > /* Transitioning the endpoint to using streams, don't enqueue URBs */ > #define EP_GETTING_STREAMS (1 << 3) >@@ -675,6 +675,7 @@ struct xhci_virt_ep { > #define EP_SOFT_CLEAR_TOGGLE (1 << 7) > /* usb_hub_clear_tt_buffer is in progress */ > #define EP_CLEARING_TT (1 << 8) >+#define EP_STALLED (1 << 9) /* For stall handling */ I guess usage rules of those flags should be documented somewhere here and helpers added such as: xhci_ep_cancel_pending() xhci_ep_held_stopped() to improve maintainability and prevent similar problems in the future. I could sit and write something, I still have this stuff quite fresh in memory after spending a few weeks debugging those crazy HW races. Regards, Michal