Hi On 13.11.2014 17:11, Felipe Balbi wrote: > Hi, > > On Thu, Nov 13, 2014 at 01:16:34PM +0200, Mathias Nyman wrote: >> On 13.11.2014 00:28, Felipe Balbi wrote: >>> Hi, >>> >>> On Wed, Nov 12, 2014 at 04:54:06PM -0500, Alan Stern wrote: >>> >>> [...] >>> >>>>> and the doorbell will never rung. But even if I drop EP_HALTED from the >>>>> check below and let EP doorbell be rung, nothing will happen because, >>>>> according to XHCI spec 1.0, we *must* first issue a Reset Endpoint >>>>> command to get Endpoint to move to EP_STATE_STOPPED, in which case, >>>>> ringing that EP's doorbell will actually cause a transfer to happen. >>>>> >>>>> Right now, what happens is that second usb_submit_urb() does nothing and >>>>> the 10 second timer expires, causing the URB to be canceled and test >>>>> failing with -ETIMEDOUT. >>>> >>>> Okay, I see. What usbcore and usbtest expect to happen is this: >>>> >>>> (1) An URB fails because the endpoint sent a STALL. The completion >>>> routine is called with status -EPIPE. >>>> >>>> (2) When the completion routine returns, the next URB in the >>>> endpoint's queue should get handled by the hardware. If the >>>> endpoint is still halted, this URB should fail with -EPIPE >>>> just like the first. >>>> >>>> (3) Etc. Eventually the endpoint queue empties out or the class >>>> driver calls usb_clear_halt(). >>> >>> perfect :-) >>> >>>> So (1) works as desired, but (2) doesn't work because the doorbell >>>> never gets rung. >>> >>> right. >>> >>>> And the easiest way to make (2) work is to issue a Reset Endpoint >>>> command. >>> >>> There's one extra bit of information here, see below. >>> >>>> (There are other, more complicated ways to get the same result. For >>>> instance, you could loop through the remaining queued URBs, giving them >>>> back one by one with -EPIPE. And each time an URB is submitted, you >>>> could give it back right away. But Reset Endpoint is simpler.) >>> >>> IMO issuing a Reset Endpoint is the only correct way of implementing >>> this. See, there are two sides to usbtest + g_zero love relationship: >>> >>> (a) it will help you test your UDC driver. >>> (b) it will help you test your HCD driver. >>> >>> Currently, we have a bug with (b), but if iterate over the list of >>> submitted URBs we can create two problems: >>> >>> (i) There's no way to synchronize class driver's usb_submit_urb() with >>> the exact time when we iterate over the list of pending URBs in >>> xhci-hcd. Which means that we could very well iterate over an empty list >>> and think everything was fine. Heck, this would even happen with >>> usbtest, note that simple_io() always waits 10 seconds for a transfer >>> completion before returning control to the caller. >>> >>> (ii) We would fall into the possibility of not catching bugs with UDCs >>> running g_zero because HCD would just return early -EPIPE for us without >>> asking UDC to handle another transfer :-) >>> >>> Because of these two cases, I think the *only* way to solve this is by >>> issuing a Reset Endpoint cmd so another token can be shifted onto the >>> data lines. >>> >>>> In the patch, you talked about clearing the endpoint halt. But that's >>>> not what you want to do; you want to issue a Reset Endpoint command, >>>> which affects only the host controller. The endpoint's status in the >>> >>> this is exactly what xhci_cleanup_halted_endpoint() does. Sure, it's a >>> misnamer and should probably be renamed to either >>> xhci_reset_endpoint_cmd() or xhci_clear_endpoint_state_halted() to make >>> it clearer as to what's happening. But that can happen later, we don't >>> need to clean that up in order to fix the bug :-) >>> >>>> peripheral device will remain unchanged -- no halt will be cleared. >>>> That contributed to my confusion on reading the patch. >>> >>> yeah, that got me for a while too. I had to keep reminding myself what >>> xhci_cleanup_halted_endpoint() was doing ;-) >>> >>>> By the way, does the same sort of thing happen after a transfer >>>> error (such as a CRC mismatch)? Does the xHCI controller change the >>>> state to EP_STATE_HALTED? Or does it instead go directly to >>> >>> There are a few conditions in which XHCI will change EP state to >>> EP_STATE_HALTED, one of them is a STALL token from the peripheral and >>> the others would be really error conditions: Babble, CRC error, etc. >>> >>> The spec even has a note about it, which I quote: >>> >>> " >>> A Halt condition or USB Transaction error detected on a USB pipe >>> shall cause a Running Endpoint to transition to the Halted >>> state. A Reset Endpoint Command shall be used to clear the Halt >>> condition on the endpoint and transition the endpoint to the >>> Stopped state. A Stop Endpoint Command received while an >>> endpoint is in the Halted state shall have no effect and shall >>> generate a Command Completion Event with the Completion Code set >>> to Context State Error. >>> " >>> >>>> EP_STATE_STOPPED? You probably want to treat that case and the STALL >>>> case as similarly as possible. >>> >>> There's already code to deal with that, take a look at >>> xhci_requires_manual_halt_cleanup() and its callers: >>> >>> 1754 static int xhci_requires_manual_halt_cleanup(struct xhci_hcd *xhci, >>> 1755 struct xhci_ep_ctx *ep_ctx, >>> 1756 unsigned int trb_comp_code) >>> 1757 { >>> 1758 /* TRB completion codes that may require a manual halt cleanup */ >>> 1759 if (trb_comp_code == COMP_TX_ERR || >>> 1760 trb_comp_code == COMP_BABBLE || >>> 1761 trb_comp_code == COMP_SPLIT_ERR) >>> 1762 /* The 0.96 spec says a babbling control endpoint >>> 1763 * is not halted. The 0.96 spec says it is. Some HW >>> 1764 * claims to be 0.95 compliant, but it halts the control >>> 1765 * endpoint anyway. Check if a babble halted the >>> 1766 * endpoint. >>> 1767 */ >>> 1768 if ((ep_ctx->ep_info & cpu_to_le32(EP_STATE_MASK)) == >>> 1769 cpu_to_le32(EP_STATE_HALTED)) >>> 1770 return 1; >>> 1771 >>> 1772 return 0; >>> 1773 } >>> >>> In fact, there is a typo on that comment. It needs this patch: >>> >>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c >>> index 4e8c3cf..a8bbacb 100644 >>> --- a/drivers/usb/host/xhci-ring.c >>> +++ b/drivers/usb/host/xhci-ring.c >>> @@ -1759,7 +1759,7 @@ static int xhci_requires_manual_halt_cleanup(struct xhci_hcd *xhci, >>> if (trb_comp_code == COMP_TX_ERR || >>> trb_comp_code == COMP_BABBLE || >>> trb_comp_code == COMP_SPLIT_ERR) >>> - /* The 0.96 spec says a babbling control endpoint >>> + /* The 0.95 spec says a babbling control endpoint >>> * is not halted. The 0.96 spec says it is. Some HW >>> * claims to be 0.95 compliant, but it halts the control >>> * endpoint anyway. Check if a babble halted the >>> >>>>>> For instance, if an endpoint is halted then there's no reason for the >>>>>> controller to shift any USB tokens for it onto the data lines. Doing >>>>>> so would just be a waste of bandwidth, since the response is bound to >>>>>> be another STALL. And it doesn't matter that the peripheral has no >>>>>> means to STALL any follow-up iens, since the host controller already >>>>>> knows the endpoint is halted. >>>>> >>>>> Now you're claiming that this is a bug on usbtest which has been in tree >>>>> for many, many years and is known to work with EHCI, MUSB and UHCI (at >>>>> least, probably dummy too), which is a different statement from previous >>>>> thread [1]. >>>> >>>> No, I simply failed to understood what you wanted to do. >>> >>> alright. >>> >>>>>> The comment in the patch talks about moving the dequeue pointer past >>>>>> the STALLed TD and then clearing the halt condition. Moving the >>>>>> dequeue pointer is fine -- there's no other way to take control of the >>>>>> TD back from the hardware -- but why would you want to clear the halt? >>>>>> The HCD isn't supposed to do that; the class driver is. >>>>> >>>>> See what usbtest does. It wants to make sure that, even if we issue >>>>> several URBs for that endpoint, the function will always STALL. Sure, >>>>> it's a waste of bandwidth, but what's the probability that any class >>>>> driver will actually do this outside of a test environment ? I think >>>>> it's not up to the HCD to device and it should, rather, let the function >>>>> respond with the expected STALL again which will, once more, move the >>>>> endpoint back into EP_STATE_HALT. >>>>> >>>>> The only thing we should be discussing here, is proper placement for >>>>> xhci_cleanup_halted_endpoint(). >>>> >>>> Right. In theory you could do it any time up until the completion >>>> routine returns. Doing it when you process the failed TD seems like a >>>> good choice -- advance the dequeue pointer and issue the command at the >>>> same time. >>> >>> My concern here is that this will happen when the first usb_submit_urb() >>> completes. I wonder if moving this to when the following >>> usb_submit_urb() is about to start would be better ? >>> >>> Something like below (it won't build, just trying to illustrate the >>> situation): >>> >>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c >>> index 4e8c3cf..a1dac2c 100644 >>> --- a/drivers/usb/host/xhci-ring.c >>> +++ b/drivers/usb/host/xhci-ring.c >>> @@ -2814,6 +2814,12 @@ static int prepare_ring(struct xhci_hcd *xhci, struct xhci_ring *ep_ring, >>> return -EINVAL; >>> case EP_STATE_HALTED: >>> xhci_dbg(xhci, "WARN halted endpoint, queueing URB anyway.\n"); >>> + if (!usb_endpoint_xfer_control(&td->urb->ep->desc)) { >>> + xhci_dbg(xhci, "Resetting endpoint.\n"); >>> + xhci_cleanup_halted_endpoint(xhci, >>> + slot_id, ep_index, ep_ring->stream_id, >>> + td, event_trb); >>> + } >>> case EP_STATE_STOPPED: >>> case EP_STATE_RUNNING: >>> break; >>> >>> >>> I think this has a higher probability of being correct. Class driver >>> might not queue any URB to a particular after the first Stall, so why >>> would be move the endpoint away from EP_STATE_HALTED prematurely ? >>> >>> What do you think ? >>> >> >> Currently we queue a reset endpoint command from the .endpoint_reset >> callback in host, this is far too late and should be moved to when we >> get a STALL event. >> >> xhci needs to reset control endpoints on stall as well [1] >> >> I got a testpatch for this, but the more I look into how we handle >> reset endpoint for clearing halts, stop endpoint for urb dequeue, and >> reset device, the more I notice that there are several other cases >> that needs fixing. testpatch for the halted ep is here: >> >> https://git.kernel.org/cgit/linux/kernel/git/mnyman/xhci.git/commit/?h=ep_reset_halt_test&id=fe43d559e0816f65e5373e863a7da8062d311cd7 > > this works too. Can you just make sure to Cc: stable for v3.14+ or > older, if you know which is the earliest kernel that needs this fix. > > You also want to update the comments on that function so they match > what's done now. Sure, will do. I'll remove the queueing of a reset endpoint command in the .endpoint_reset callback as well. The comment for xhci_endpoint_reset says its called in_interrupt context? Does the usb core really call the .endpoint_reset in interrupt context? I tried to follow the codepaths in the usb core but couln't figure it out -Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html