Re: [PATCH] usb: host: xhci: fix HALTED endpoint handling

Felipe Balbi <balbi@xxxxxx> · Tue, 21 Jan 2014 11:55:54 -0600

On Tue, Jan 21, 2014 at 09:39:12AM -0800, Sarah Sharp wrote:
> On Mon, Jan 20, 2014 at 02:45:11PM -0600, Felipe Balbi wrote:
> > If the HW reports the endpoint as Halted, there's
> > no need to start any URB for that endpoint, we can
> > simply return -EPIPE and this will tell upper layers
> > that the endpoint is, indeed, halted.
> > 
> > This patch fixes usbtest's set/clear halt test #13
> > on xHCI-based hosts.
> 
> It looks like that test assumes that the xHCI driver should not allow
> URBs to be enqueued to the endpoint when the driver needs to call
> usb_clear_halt().  Or is it expecting that the host will attempt the
> transfer, and the device will stall the transfer?

I suppose it doesn't make a difference, considering it
wait_for_completion_interruptible(). As long as the transfer actually
completes within 10 seconds, it doesn't matter.

What I have seen, though, is the transfer never completes and the 10
second timeout always kicks in when trying two consecutive
usb_submit_urb().

Here's, in a nut-shell, what happens:

-> SetFeature(ENDPOINT_HALT)
-> GetStatus()
-> usb_submit_urb()
	-> this one completes with -EPIPE as it should
-> usb_submit_urb()
	-> this one always times out :-(

> > Cc: <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: Felipe Balbi <balbi@xxxxxx>
> > ---
> > 
> > Hi Sarah,
> > 
> > seems like this has been broken for quite a long
> > time.
> > 
> > I tested the patch on an AM437x which has XHCI Host and
> > the same IP configured as device as well (two dwc3 instances,
> > basically).
> > 
> > It has been running for the past hour or so without any failures
> > when running testusb -t 13 -c 5000 -s 2048 -a in a loop, but
> > if you think there's a better way to fix this, let me know. I just
> > didn't understand why xhci-hcd still queues the URB even though
> > HW already told us the endpoint is halted. Any comments ?
> 
> The reason the driver still queues URBs is because the xHC halts the
> endpoint for events that are outside of stalls.  For instance, it will
> halt the endpoint on a babble error, a transfer error, or a split
> transaction error.  The upper layers don't call usb_clear_halt() for
> those cases, so the xHCI driver handles the halted endpoint internally.
> Look for calls to xhci_requires_manual_halt_cleanup().

alright.

> The xHC also halts the endpoint on a control transfer stall.  The upper
> layers don't ever call usb_clear_halt on the control endpoint, because
> the bus spec says the next control transfer will clear the halted
> condition.  Therefore the xHCI driver has to handle the halt manually.
> 
> The way the driver does that is to set a flag for the endpoint
> (EP_HALTED), issue a Reset Endpoint command, and then a Set TR Dequeue
> command to move the dequeue pointer past the transfer that failed.  The
> driver still accepts URBs submitted to the endpoint (because from the
> upper layer's perspective, we should in the cases where we manually
> handle the halt), but it does not ring the doorbell for that endpoint.
> Once the command to set the dequeue pointer completes, the EP_HALTED
> flag is cleared, the doorbell is rung for the endpoint, and the queued
> URBs are processed.
> 
> The end result is that the xHCI driver *needs* to allow URBs to be
> queued when the endpoint is halted and it needs to manual cleanup the
> ring.  Otherwise, the driver might queue an URB while the two commands
> are still being processed, the driver will reject the URBs, and the
> behavior in these corner cases will be different than other host
> controllers.
> 
> So, no, I don't think this is quite the right solution.
> 
> You could try to differentiate between an endpoint halt that requires a
> manual cleanup, and ones that the upper layer should handle, perhaps by
> adding a new EP_HALTED_MANUAL flag.  The driver could accept URBs for
> the manual cleanup case, and reject URBs with -EPIPE for the ones the
> upper layers should handle.
> 
> Code that refuses to ring the endpoint doorbell would have to look for
> both EP_HALTED and EP_HALTED_MANUAL.  There might be other implications
> as well; grep for EP_HALTED and see what code checks it.

will do.

So what's supposed to actually happen with that test ? I'd expect the
URB to actually be started (meaning we should ring the ep doorbell ?)
and the device would reply with a Stall which, in turn, would return
-EPIPE to upper layers.

Currently, that URB seems like it's never started to begin with, so we
never see a Stall from the device side and the test fails.

Surprisingly, this only happens on the second usb_submit_urb(). From
what I could gather yesterday, EP_STATE_HALTED isn't yet set when the
first usb_submit_urb() is called. Likely that because of that, the
doorbell is rung and the device has a chance to Stall the transfer.

Or am I missing something ?

-- 
balbi
Attachment:
signature.asc

Description: Digital signature