Hi, On Wed, Nov 12, 2014 at 04:54:06PM -0500, Alan Stern wrote: [...] > > and the doorbell will never rung. But even if I drop EP_HALTED from the > > check below and let EP doorbell be rung, nothing will happen because, > > according to XHCI spec 1.0, we *must* first issue a Reset Endpoint > > command to get Endpoint to move to EP_STATE_STOPPED, in which case, > > ringing that EP's doorbell will actually cause a transfer to happen. > > > > Right now, what happens is that second usb_submit_urb() does nothing and > > the 10 second timer expires, causing the URB to be canceled and test > > failing with -ETIMEDOUT. > > Okay, I see. What usbcore and usbtest expect to happen is this: > > (1) An URB fails because the endpoint sent a STALL. The completion > routine is called with status -EPIPE. > > (2) When the completion routine returns, the next URB in the > endpoint's queue should get handled by the hardware. If the > endpoint is still halted, this URB should fail with -EPIPE > just like the first. > > (3) Etc. Eventually the endpoint queue empties out or the class > driver calls usb_clear_halt(). perfect :-) > So (1) works as desired, but (2) doesn't work because the doorbell > never gets rung. right. > And the easiest way to make (2) work is to issue a Reset Endpoint > command. There's one extra bit of information here, see below. > (There are other, more complicated ways to get the same result. For > instance, you could loop through the remaining queued URBs, giving them > back one by one with -EPIPE. And each time an URB is submitted, you > could give it back right away. But Reset Endpoint is simpler.) IMO issuing a Reset Endpoint is the only correct way of implementing this. See, there are two sides to usbtest + g_zero love relationship: (a) it will help you test your UDC driver. (b) it will help you test your HCD driver. Currently, we have a bug with (b), but if iterate over the list of submitted URBs we can create two problems: (i) There's no way to synchronize class driver's usb_submit_urb() with the exact time when we iterate over the list of pending URBs in xhci-hcd. Which means that we could very well iterate over an empty list and think everything was fine. Heck, this would even happen with usbtest, note that simple_io() always waits 10 seconds for a transfer completion before returning control to the caller. (ii) We would fall into the possibility of not catching bugs with UDCs running g_zero because HCD would just return early -EPIPE for us without asking UDC to handle another transfer :-) Because of these two cases, I think the *only* way to solve this is by issuing a Reset Endpoint cmd so another token can be shifted onto the data lines. > In the patch, you talked about clearing the endpoint halt. But that's > not what you want to do; you want to issue a Reset Endpoint command, > which affects only the host controller. The endpoint's status in the this is exactly what xhci_cleanup_halted_endpoint() does. Sure, it's a misnamer and should probably be renamed to either xhci_reset_endpoint_cmd() or xhci_clear_endpoint_state_halted() to make it clearer as to what's happening. But that can happen later, we don't need to clean that up in order to fix the bug :-) > peripheral device will remain unchanged -- no halt will be cleared. > That contributed to my confusion on reading the patch. yeah, that got me for a while too. I had to keep reminding myself what xhci_cleanup_halted_endpoint() was doing ;-) > By the way, does the same sort of thing happen after a transfer > error (such as a CRC mismatch)? Does the xHCI controller change the > state to EP_STATE_HALTED? Or does it instead go directly to There are a few conditions in which XHCI will change EP state to EP_STATE_HALTED, one of them is a STALL token from the peripheral and the others would be really error conditions: Babble, CRC error, etc. The spec even has a note about it, which I quote: " A Halt condition or USB Transaction error detected on a USB pipe shall cause a Running Endpoint to transition to the Halted state. A Reset Endpoint Command shall be used to clear the Halt condition on the endpoint and transition the endpoint to the Stopped state. A Stop Endpoint Command received while an endpoint is in the Halted state shall have no effect and shall generate a Command Completion Event with the Completion Code set to Context State Error. " > EP_STATE_STOPPED? You probably want to treat that case and the STALL > case as similarly as possible. There's already code to deal with that, take a look at xhci_requires_manual_halt_cleanup() and its callers: 1754 static int xhci_requires_manual_halt_cleanup(struct xhci_hcd *xhci, 1755 struct xhci_ep_ctx *ep_ctx, 1756 unsigned int trb_comp_code) 1757 { 1758 /* TRB completion codes that may require a manual halt cleanup */ 1759 if (trb_comp_code == COMP_TX_ERR || 1760 trb_comp_code == COMP_BABBLE || 1761 trb_comp_code == COMP_SPLIT_ERR) 1762 /* The 0.96 spec says a babbling control endpoint 1763 * is not halted. The 0.96 spec says it is. Some HW 1764 * claims to be 0.95 compliant, but it halts the control 1765 * endpoint anyway. Check if a babble halted the 1766 * endpoint. 1767 */ 1768 if ((ep_ctx->ep_info & cpu_to_le32(EP_STATE_MASK)) == 1769 cpu_to_le32(EP_STATE_HALTED)) 1770 return 1; 1771 1772 return 0; 1773 } In fact, there is a typo on that comment. It needs this patch: diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 4e8c3cf..a8bbacb 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -1759,7 +1759,7 @@ static int xhci_requires_manual_halt_cleanup(struct xhci_hcd *xhci, if (trb_comp_code == COMP_TX_ERR || trb_comp_code == COMP_BABBLE || trb_comp_code == COMP_SPLIT_ERR) - /* The 0.96 spec says a babbling control endpoint + /* The 0.95 spec says a babbling control endpoint * is not halted. The 0.96 spec says it is. Some HW * claims to be 0.95 compliant, but it halts the control * endpoint anyway. Check if a babble halted the > > > For instance, if an endpoint is halted then there's no reason for the > > > controller to shift any USB tokens for it onto the data lines. Doing > > > so would just be a waste of bandwidth, since the response is bound to > > > be another STALL. And it doesn't matter that the peripheral has no > > > means to STALL any follow-up iens, since the host controller already > > > knows the endpoint is halted. > > > > Now you're claiming that this is a bug on usbtest which has been in tree > > for many, many years and is known to work with EHCI, MUSB and UHCI (at > > least, probably dummy too), which is a different statement from previous > > thread [1]. > > No, I simply failed to understood what you wanted to do. alright. > > > The comment in the patch talks about moving the dequeue pointer past > > > the STALLed TD and then clearing the halt condition. Moving the > > > dequeue pointer is fine -- there's no other way to take control of the > > > TD back from the hardware -- but why would you want to clear the halt? > > > The HCD isn't supposed to do that; the class driver is. > > > > See what usbtest does. It wants to make sure that, even if we issue > > several URBs for that endpoint, the function will always STALL. Sure, > > it's a waste of bandwidth, but what's the probability that any class > > driver will actually do this outside of a test environment ? I think > > it's not up to the HCD to device and it should, rather, let the function > > respond with the expected STALL again which will, once more, move the > > endpoint back into EP_STATE_HALT. > > > > The only thing we should be discussing here, is proper placement for > > xhci_cleanup_halted_endpoint(). > > Right. In theory you could do it any time up until the completion > routine returns. Doing it when you process the failed TD seems like a > good choice -- advance the dequeue pointer and issue the command at the > same time. My concern here is that this will happen when the first usb_submit_urb() completes. I wonder if moving this to when the following usb_submit_urb() is about to start would be better ? Something like below (it won't build, just trying to illustrate the situation): diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 4e8c3cf..a1dac2c 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2814,6 +2814,12 @@ static int prepare_ring(struct xhci_hcd *xhci, struct xhci_ring *ep_ring, return -EINVAL; case EP_STATE_HALTED: xhci_dbg(xhci, "WARN halted endpoint, queueing URB anyway.\n"); + if (!usb_endpoint_xfer_control(&td->urb->ep->desc)) { + xhci_dbg(xhci, "Resetting endpoint.\n"); + xhci_cleanup_halted_endpoint(xhci, + slot_id, ep_index, ep_ring->stream_id, + td, event_trb); + } case EP_STATE_STOPPED: case EP_STATE_RUNNING: break; I think this has a higher probability of being correct. Class driver might not queue any URB to a particular after the first Stall, so why would be move the endpoint away from EP_STATE_HALTED prematurely ? What do you think ? -- balbi
Attachment:
signature.asc
Description: Digital signature