Hi, On Thu, Nov 13, 2014 at 01:16:34PM +0200, Mathias Nyman wrote: > On 13.11.2014 00:28, Felipe Balbi wrote: > > Hi, > > > > On Wed, Nov 12, 2014 at 04:54:06PM -0500, Alan Stern wrote: > > > > [...] > > > >>> and the doorbell will never rung. But even if I drop EP_HALTED from the > >>> check below and let EP doorbell be rung, nothing will happen because, > >>> according to XHCI spec 1.0, we *must* first issue a Reset Endpoint > >>> command to get Endpoint to move to EP_STATE_STOPPED, in which case, > >>> ringing that EP's doorbell will actually cause a transfer to happen. > >>> > >>> Right now, what happens is that second usb_submit_urb() does nothing and > >>> the 10 second timer expires, causing the URB to be canceled and test > >>> failing with -ETIMEDOUT. > >> > >> Okay, I see. What usbcore and usbtest expect to happen is this: > >> > >> (1) An URB fails because the endpoint sent a STALL. The completion > >> routine is called with status -EPIPE. > >> > >> (2) When the completion routine returns, the next URB in the > >> endpoint's queue should get handled by the hardware. If the > >> endpoint is still halted, this URB should fail with -EPIPE > >> just like the first. > >> > >> (3) Etc. Eventually the endpoint queue empties out or the class > >> driver calls usb_clear_halt(). > > > > perfect :-) > > > >> So (1) works as desired, but (2) doesn't work because the doorbell > >> never gets rung. > > > > right. > > > >> And the easiest way to make (2) work is to issue a Reset Endpoint > >> command. > > > > There's one extra bit of information here, see below. > > > >> (There are other, more complicated ways to get the same result. For > >> instance, you could loop through the remaining queued URBs, giving them > >> back one by one with -EPIPE. And each time an URB is submitted, you > >> could give it back right away. But Reset Endpoint is simpler.) > > > > IMO issuing a Reset Endpoint is the only correct way of implementing > > this. See, there are two sides to usbtest + g_zero love relationship: > > > > (a) it will help you test your UDC driver. > > (b) it will help you test your HCD driver. > > > > Currently, we have a bug with (b), but if iterate over the list of > > submitted URBs we can create two problems: > > > > (i) There's no way to synchronize class driver's usb_submit_urb() with > > the exact time when we iterate over the list of pending URBs in > > xhci-hcd. Which means that we could very well iterate over an empty list > > and think everything was fine. Heck, this would even happen with > > usbtest, note that simple_io() always waits 10 seconds for a transfer > > completion before returning control to the caller. > > > > (ii) We would fall into the possibility of not catching bugs with UDCs > > running g_zero because HCD would just return early -EPIPE for us without > > asking UDC to handle another transfer :-) > > > > Because of these two cases, I think the *only* way to solve this is by > > issuing a Reset Endpoint cmd so another token can be shifted onto the > > data lines. > > > >> In the patch, you talked about clearing the endpoint halt. But that's > >> not what you want to do; you want to issue a Reset Endpoint command, > >> which affects only the host controller. The endpoint's status in the > > > > this is exactly what xhci_cleanup_halted_endpoint() does. Sure, it's a > > misnamer and should probably be renamed to either > > xhci_reset_endpoint_cmd() or xhci_clear_endpoint_state_halted() to make > > it clearer as to what's happening. But that can happen later, we don't > > need to clean that up in order to fix the bug :-) > > > >> peripheral device will remain unchanged -- no halt will be cleared. > >> That contributed to my confusion on reading the patch. > > > > yeah, that got me for a while too. I had to keep reminding myself what > > xhci_cleanup_halted_endpoint() was doing ;-) > > > >> By the way, does the same sort of thing happen after a transfer > >> error (such as a CRC mismatch)? Does the xHCI controller change the > >> state to EP_STATE_HALTED? Or does it instead go directly to > > > > There are a few conditions in which XHCI will change EP state to > > EP_STATE_HALTED, one of them is a STALL token from the peripheral and > > the others would be really error conditions: Babble, CRC error, etc. > > > > The spec even has a note about it, which I quote: > > > > " > > A Halt condition or USB Transaction error detected on a USB pipe > > shall cause a Running Endpoint to transition to the Halted > > state. A Reset Endpoint Command shall be used to clear the Halt > > condition on the endpoint and transition the endpoint to the > > Stopped state. A Stop Endpoint Command received while an > > endpoint is in the Halted state shall have no effect and shall > > generate a Command Completion Event with the Completion Code set > > to Context State Error. > > " > > > >> EP_STATE_STOPPED? You probably want to treat that case and the STALL > >> case as similarly as possible. > > > > There's already code to deal with that, take a look at > > xhci_requires_manual_halt_cleanup() and its callers: > > > > 1754 static int xhci_requires_manual_halt_cleanup(struct xhci_hcd *xhci, > > 1755 struct xhci_ep_ctx *ep_ctx, > > 1756 unsigned int trb_comp_code) > > 1757 { > > 1758 /* TRB completion codes that may require a manual halt cleanup */ > > 1759 if (trb_comp_code == COMP_TX_ERR || > > 1760 trb_comp_code == COMP_BABBLE || > > 1761 trb_comp_code == COMP_SPLIT_ERR) > > 1762 /* The 0.96 spec says a babbling control endpoint > > 1763 * is not halted. The 0.96 spec says it is. Some HW > > 1764 * claims to be 0.95 compliant, but it halts the control > > 1765 * endpoint anyway. Check if a babble halted the > > 1766 * endpoint. > > 1767 */ > > 1768 if ((ep_ctx->ep_info & cpu_to_le32(EP_STATE_MASK)) == > > 1769 cpu_to_le32(EP_STATE_HALTED)) > > 1770 return 1; > > 1771 > > 1772 return 0; > > 1773 } > > > > In fact, there is a typo on that comment. It needs this patch: > > > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > > index 4e8c3cf..a8bbacb 100644 > > --- a/drivers/usb/host/xhci-ring.c > > +++ b/drivers/usb/host/xhci-ring.c > > @@ -1759,7 +1759,7 @@ static int xhci_requires_manual_halt_cleanup(struct xhci_hcd *xhci, > > if (trb_comp_code == COMP_TX_ERR || > > trb_comp_code == COMP_BABBLE || > > trb_comp_code == COMP_SPLIT_ERR) > > - /* The 0.96 spec says a babbling control endpoint > > + /* The 0.95 spec says a babbling control endpoint > > * is not halted. The 0.96 spec says it is. Some HW > > * claims to be 0.95 compliant, but it halts the control > > * endpoint anyway. Check if a babble halted the > > > >>>> For instance, if an endpoint is halted then there's no reason for the > >>>> controller to shift any USB tokens for it onto the data lines. Doing > >>>> so would just be a waste of bandwidth, since the response is bound to > >>>> be another STALL. And it doesn't matter that the peripheral has no > >>>> means to STALL any follow-up iens, since the host controller already > >>>> knows the endpoint is halted. > >>> > >>> Now you're claiming that this is a bug on usbtest which has been in tree > >>> for many, many years and is known to work with EHCI, MUSB and UHCI (at > >>> least, probably dummy too), which is a different statement from previous > >>> thread [1]. > >> > >> No, I simply failed to understood what you wanted to do. > > > > alright. > > > >>>> The comment in the patch talks about moving the dequeue pointer past > >>>> the STALLed TD and then clearing the halt condition. Moving the > >>>> dequeue pointer is fine -- there's no other way to take control of the > >>>> TD back from the hardware -- but why would you want to clear the halt? > >>>> The HCD isn't supposed to do that; the class driver is. > >>> > >>> See what usbtest does. It wants to make sure that, even if we issue > >>> several URBs for that endpoint, the function will always STALL. Sure, > >>> it's a waste of bandwidth, but what's the probability that any class > >>> driver will actually do this outside of a test environment ? I think > >>> it's not up to the HCD to device and it should, rather, let the function > >>> respond with the expected STALL again which will, once more, move the > >>> endpoint back into EP_STATE_HALT. > >>> > >>> The only thing we should be discussing here, is proper placement for > >>> xhci_cleanup_halted_endpoint(). > >> > >> Right. In theory you could do it any time up until the completion > >> routine returns. Doing it when you process the failed TD seems like a > >> good choice -- advance the dequeue pointer and issue the command at the > >> same time. > > > > My concern here is that this will happen when the first usb_submit_urb() > > completes. I wonder if moving this to when the following > > usb_submit_urb() is about to start would be better ? > > > > Something like below (it won't build, just trying to illustrate the > > situation): > > > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > > index 4e8c3cf..a1dac2c 100644 > > --- a/drivers/usb/host/xhci-ring.c > > +++ b/drivers/usb/host/xhci-ring.c > > @@ -2814,6 +2814,12 @@ static int prepare_ring(struct xhci_hcd *xhci, struct xhci_ring *ep_ring, > > return -EINVAL; > > case EP_STATE_HALTED: > > xhci_dbg(xhci, "WARN halted endpoint, queueing URB anyway.\n"); > > + if (!usb_endpoint_xfer_control(&td->urb->ep->desc)) { > > + xhci_dbg(xhci, "Resetting endpoint.\n"); > > + xhci_cleanup_halted_endpoint(xhci, > > + slot_id, ep_index, ep_ring->stream_id, > > + td, event_trb); > > + } > > case EP_STATE_STOPPED: > > case EP_STATE_RUNNING: > > break; > > > > > > I think this has a higher probability of being correct. Class driver > > might not queue any URB to a particular after the first Stall, so why > > would be move the endpoint away from EP_STATE_HALTED prematurely ? > > > > What do you think ? > > > > Currently we queue a reset endpoint command from the .endpoint_reset > callback in host, this is far too late and should be moved to when we > get a STALL event. > > xhci needs to reset control endpoints on stall as well [1] > > I got a testpatch for this, but the more I look into how we handle > reset endpoint for clearing halts, stop endpoint for urb dequeue, and > reset device, the more I notice that there are several other cases > that needs fixing. testpatch for the halted ep is here: > > https://git.kernel.org/cgit/linux/kernel/git/mnyman/xhci.git/commit/?h=ep_reset_halt_test&id=fe43d559e0816f65e5373e863a7da8062d311cd7 this works too. Can you just make sure to Cc: stable for v3.14+ or older, if you know which is the earliest kernel that needs this fix. You also want to update the comments on that function so they match what's done now. > It's hard to see from patch diff itself what it does, but basically we > call xhci_cleanup_halted_endpoint() in finish_td() if the transfer > event status is STALL, or a TX error that requires resetting the > endpoint. > > There are still issues with setting the dequeue pointer correctly > after stop or reset endpoint, I think this is because we try to find > the next TD based on a saved "stopped TD" value that might not valid > anymore (i.e. a reset device in between reset endpoint and set dq > pointer) this issue is seen with DVB tuners when changing channels: > > https://bugzilla.kernel.org/show_bug.cgi?id=75521 alright. Thanks -- balbi
Attachment:
signature.asc
Description: Digital signature