On 13.11.2014 00:28, Felipe Balbi wrote: > Hi, > > On Wed, Nov 12, 2014 at 04:54:06PM -0500, Alan Stern wrote: > > [...] > >>> and the doorbell will never rung. But even if I drop EP_HALTED from the >>> check below and let EP doorbell be rung, nothing will happen because, >>> according to XHCI spec 1.0, we *must* first issue a Reset Endpoint >>> command to get Endpoint to move to EP_STATE_STOPPED, in which case, >>> ringing that EP's doorbell will actually cause a transfer to happen. >>> >>> Right now, what happens is that second usb_submit_urb() does nothing and >>> the 10 second timer expires, causing the URB to be canceled and test >>> failing with -ETIMEDOUT. >> >> Okay, I see. What usbcore and usbtest expect to happen is this: >> >> (1) An URB fails because the endpoint sent a STALL. The completion >> routine is called with status -EPIPE. >> >> (2) When the completion routine returns, the next URB in the >> endpoint's queue should get handled by the hardware. If the >> endpoint is still halted, this URB should fail with -EPIPE >> just like the first. >> >> (3) Etc. Eventually the endpoint queue empties out or the class >> driver calls usb_clear_halt(). > > perfect :-) > >> So (1) works as desired, but (2) doesn't work because the doorbell >> never gets rung. > > right. > >> And the easiest way to make (2) work is to issue a Reset Endpoint >> command. > > There's one extra bit of information here, see below. > >> (There are other, more complicated ways to get the same result. For >> instance, you could loop through the remaining queued URBs, giving them >> back one by one with -EPIPE. And each time an URB is submitted, you >> could give it back right away. But Reset Endpoint is simpler.) > > IMO issuing a Reset Endpoint is the only correct way of implementing > this. See, there are two sides to usbtest + g_zero love relationship: > > (a) it will help you test your UDC driver. > (b) it will help you test your HCD driver. > > Currently, we have a bug with (b), but if iterate over the list of > submitted URBs we can create two problems: > > (i) There's no way to synchronize class driver's usb_submit_urb() with > the exact time when we iterate over the list of pending URBs in > xhci-hcd. Which means that we could very well iterate over an empty list > and think everything was fine. Heck, this would even happen with > usbtest, note that simple_io() always waits 10 seconds for a transfer > completion before returning control to the caller. > > (ii) We would fall into the possibility of not catching bugs with UDCs > running g_zero because HCD would just return early -EPIPE for us without > asking UDC to handle another transfer :-) > > Because of these two cases, I think the *only* way to solve this is by > issuing a Reset Endpoint cmd so another token can be shifted onto the > data lines. > >> In the patch, you talked about clearing the endpoint halt. But that's >> not what you want to do; you want to issue a Reset Endpoint command, >> which affects only the host controller. The endpoint's status in the > > this is exactly what xhci_cleanup_halted_endpoint() does. Sure, it's a > misnamer and should probably be renamed to either > xhci_reset_endpoint_cmd() or xhci_clear_endpoint_state_halted() to make > it clearer as to what's happening. But that can happen later, we don't > need to clean that up in order to fix the bug :-) > >> peripheral device will remain unchanged -- no halt will be cleared. >> That contributed to my confusion on reading the patch. > > yeah, that got me for a while too. I had to keep reminding myself what > xhci_cleanup_halted_endpoint() was doing ;-) > >> By the way, does the same sort of thing happen after a transfer >> error (such as a CRC mismatch)? Does the xHCI controller change the >> state to EP_STATE_HALTED? Or does it instead go directly to > > There are a few conditions in which XHCI will change EP state to > EP_STATE_HALTED, one of them is a STALL token from the peripheral and > the others would be really error conditions: Babble, CRC error, etc. > > The spec even has a note about it, which I quote: > > " > A Halt condition or USB Transaction error detected on a USB pipe > shall cause a Running Endpoint to transition to the Halted > state. A Reset Endpoint Command shall be used to clear the Halt > condition on the endpoint and transition the endpoint to the > Stopped state. A Stop Endpoint Command received while an > endpoint is in the Halted state shall have no effect and shall > generate a Command Completion Event with the Completion Code set > to Context State Error. > " > >> EP_STATE_STOPPED? You probably want to treat that case and the STALL >> case as similarly as possible. > > There's already code to deal with that, take a look at > xhci_requires_manual_halt_cleanup() and its callers: > > 1754 static int xhci_requires_manual_halt_cleanup(struct xhci_hcd *xhci, > 1755 struct xhci_ep_ctx *ep_ctx, > 1756 unsigned int trb_comp_code) > 1757 { > 1758 /* TRB completion codes that may require a manual halt cleanup */ > 1759 if (trb_comp_code == COMP_TX_ERR || > 1760 trb_comp_code == COMP_BABBLE || > 1761 trb_comp_code == COMP_SPLIT_ERR) > 1762 /* The 0.96 spec says a babbling control endpoint > 1763 * is not halted. The 0.96 spec says it is. Some HW > 1764 * claims to be 0.95 compliant, but it halts the control > 1765 * endpoint anyway. Check if a babble halted the > 1766 * endpoint. > 1767 */ > 1768 if ((ep_ctx->ep_info & cpu_to_le32(EP_STATE_MASK)) == > 1769 cpu_to_le32(EP_STATE_HALTED)) > 1770 return 1; > 1771 > 1772 return 0; > 1773 } > > In fact, there is a typo on that comment. It needs this patch: > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > index 4e8c3cf..a8bbacb 100644 > --- a/drivers/usb/host/xhci-ring.c > +++ b/drivers/usb/host/xhci-ring.c > @@ -1759,7 +1759,7 @@ static int xhci_requires_manual_halt_cleanup(struct xhci_hcd *xhci, > if (trb_comp_code == COMP_TX_ERR || > trb_comp_code == COMP_BABBLE || > trb_comp_code == COMP_SPLIT_ERR) > - /* The 0.96 spec says a babbling control endpoint > + /* The 0.95 spec says a babbling control endpoint > * is not halted. The 0.96 spec says it is. Some HW > * claims to be 0.95 compliant, but it halts the control > * endpoint anyway. Check if a babble halted the > >>>> For instance, if an endpoint is halted then there's no reason for the >>>> controller to shift any USB tokens for it onto the data lines. Doing >>>> so would just be a waste of bandwidth, since the response is bound to >>>> be another STALL. And it doesn't matter that the peripheral has no >>>> means to STALL any follow-up iens, since the host controller already >>>> knows the endpoint is halted. >>> >>> Now you're claiming that this is a bug on usbtest which has been in tree >>> for many, many years and is known to work with EHCI, MUSB and UHCI (at >>> least, probably dummy too), which is a different statement from previous >>> thread [1]. >> >> No, I simply failed to understood what you wanted to do. > > alright. > >>>> The comment in the patch talks about moving the dequeue pointer past >>>> the STALLed TD and then clearing the halt condition. Moving the >>>> dequeue pointer is fine -- there's no other way to take control of the >>>> TD back from the hardware -- but why would you want to clear the halt? >>>> The HCD isn't supposed to do that; the class driver is. >>> >>> See what usbtest does. It wants to make sure that, even if we issue >>> several URBs for that endpoint, the function will always STALL. Sure, >>> it's a waste of bandwidth, but what's the probability that any class >>> driver will actually do this outside of a test environment ? I think >>> it's not up to the HCD to device and it should, rather, let the function >>> respond with the expected STALL again which will, once more, move the >>> endpoint back into EP_STATE_HALT. >>> >>> The only thing we should be discussing here, is proper placement for >>> xhci_cleanup_halted_endpoint(). >> >> Right. In theory you could do it any time up until the completion >> routine returns. Doing it when you process the failed TD seems like a >> good choice -- advance the dequeue pointer and issue the command at the >> same time. > > My concern here is that this will happen when the first usb_submit_urb() > completes. I wonder if moving this to when the following > usb_submit_urb() is about to start would be better ? > > Something like below (it won't build, just trying to illustrate the > situation): > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > index 4e8c3cf..a1dac2c 100644 > --- a/drivers/usb/host/xhci-ring.c > +++ b/drivers/usb/host/xhci-ring.c > @@ -2814,6 +2814,12 @@ static int prepare_ring(struct xhci_hcd *xhci, struct xhci_ring *ep_ring, > return -EINVAL; > case EP_STATE_HALTED: > xhci_dbg(xhci, "WARN halted endpoint, queueing URB anyway.\n"); > + if (!usb_endpoint_xfer_control(&td->urb->ep->desc)) { > + xhci_dbg(xhci, "Resetting endpoint.\n"); > + xhci_cleanup_halted_endpoint(xhci, > + slot_id, ep_index, ep_ring->stream_id, > + td, event_trb); > + } > case EP_STATE_STOPPED: > case EP_STATE_RUNNING: > break; > > > I think this has a higher probability of being correct. Class driver > might not queue any URB to a particular after the first Stall, so why > would be move the endpoint away from EP_STATE_HALTED prematurely ? > > What do you think ? > Currently we queue a reset endpoint command from the .endpoint_reset callback in host, this is far too late and should be moved to when we get a STALL event. xhci needs to reset control endpoints on stall as well [1] I got a testpatch for this, but the more I look into how we handle reset endpoint for clearing halts, stop endpoint for urb dequeue, and reset device, the more I notice that there are several other cases that needs fixing. testpatch for the halted ep is here: https://git.kernel.org/cgit/linux/kernel/git/mnyman/xhci.git/commit/?h=ep_reset_halt_test&id=fe43d559e0816f65e5373e863a7da8062d311cd7 It's hard to see from patch diff itself what it does, but basically we call xhci_cleanup_halted_endpoint() in finish_td() if the transfer event status is STALL, or a TX error that requires resetting the endpoint. There are still issues with setting the dequeue pointer correctly after stop or reset endpoint, I think this is because we try to find the next TD based on a saved "stopped TD" value that might not valid anymore (i.e. a reset device in between reset endpoint and set dq pointer) this issue is seen with DVB tuners when changing channels: https://bugzilla.kernel.org/show_bug.cgi?id=75521 -Mathias (1) xhci 1.0 section 4.8.3: Note: A STALL detected on any stage (Setup, Data, or Status) of a Default Control Endpoint request shall transition the Endpoint Context to the Halted state. A Default Control Endpoint STALL condition is cleared by a Reset Endpoint Command which transitions the endpoint from the Halted to the Stopped state. The Default Control Endpoint shall return to the Running state when the Doorbell is rung for the next Setup Stage TD sent to the endpoint. Section 8.5.3.4 of the USB2 spec and section 8.12.2.3 of the USB3 spec state of Control pipes, “Unlike the case of a functional stall, protocol stall does not indicate an error with the device.” The xHC treats a functional stall and protocol stall identically, by Halting the endpoint and requiring software to clear the condition by issuing a Reset Endpoint Command -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html