On Tue, Jan 21, 2014 at 09:39:12AM -0800, Sarah Sharp wrote: > On Mon, Jan 20, 2014 at 02:45:11PM -0600, Felipe Balbi wrote: > > If the HW reports the endpoint as Halted, there's > > no need to start any URB for that endpoint, we can > > simply return -EPIPE and this will tell upper layers > > that the endpoint is, indeed, halted. > > > > This patch fixes usbtest's set/clear halt test #13 > > on xHCI-based hosts. > > It looks like that test assumes that the xHCI driver should not allow > URBs to be enqueued to the endpoint when the driver needs to call > usb_clear_halt(). Or is it expecting that the host will attempt the > transfer, and the device will stall the transfer? I suppose it doesn't make a difference, considering it wait_for_completion_interruptible(). As long as the transfer actually completes within 10 seconds, it doesn't matter. What I have seen, though, is the transfer never completes and the 10 second timeout always kicks in when trying two consecutive usb_submit_urb(). Here's, in a nut-shell, what happens: -> SetFeature(ENDPOINT_HALT) -> GetStatus() -> usb_submit_urb() -> this one completes with -EPIPE as it should -> usb_submit_urb() -> this one always times out :-( > > Cc: <stable@xxxxxxxxxxxxxxx> > > Signed-off-by: Felipe Balbi <balbi@xxxxxx> > > --- > > > > Hi Sarah, > > > > seems like this has been broken for quite a long > > time. > > > > I tested the patch on an AM437x which has XHCI Host and > > the same IP configured as device as well (two dwc3 instances, > > basically). > > > > It has been running for the past hour or so without any failures > > when running testusb -t 13 -c 5000 -s 2048 -a in a loop, but > > if you think there's a better way to fix this, let me know. I just > > didn't understand why xhci-hcd still queues the URB even though > > HW already told us the endpoint is halted. Any comments ? > > The reason the driver still queues URBs is because the xHC halts the > endpoint for events that are outside of stalls. For instance, it will > halt the endpoint on a babble error, a transfer error, or a split > transaction error. The upper layers don't call usb_clear_halt() for > those cases, so the xHCI driver handles the halted endpoint internally. > Look for calls to xhci_requires_manual_halt_cleanup(). alright. > The xHC also halts the endpoint on a control transfer stall. The upper > layers don't ever call usb_clear_halt on the control endpoint, because > the bus spec says the next control transfer will clear the halted > condition. Therefore the xHCI driver has to handle the halt manually. > > The way the driver does that is to set a flag for the endpoint > (EP_HALTED), issue a Reset Endpoint command, and then a Set TR Dequeue > command to move the dequeue pointer past the transfer that failed. The > driver still accepts URBs submitted to the endpoint (because from the > upper layer's perspective, we should in the cases where we manually > handle the halt), but it does not ring the doorbell for that endpoint. > Once the command to set the dequeue pointer completes, the EP_HALTED > flag is cleared, the doorbell is rung for the endpoint, and the queued > URBs are processed. > > The end result is that the xHCI driver *needs* to allow URBs to be > queued when the endpoint is halted and it needs to manual cleanup the > ring. Otherwise, the driver might queue an URB while the two commands > are still being processed, the driver will reject the URBs, and the > behavior in these corner cases will be different than other host > controllers. > > So, no, I don't think this is quite the right solution. > > You could try to differentiate between an endpoint halt that requires a > manual cleanup, and ones that the upper layer should handle, perhaps by > adding a new EP_HALTED_MANUAL flag. The driver could accept URBs for > the manual cleanup case, and reject URBs with -EPIPE for the ones the > upper layers should handle. > > Code that refuses to ring the endpoint doorbell would have to look for > both EP_HALTED and EP_HALTED_MANUAL. There might be other implications > as well; grep for EP_HALTED and see what code checks it. will do. So what's supposed to actually happen with that test ? I'd expect the URB to actually be started (meaning we should ring the ep doorbell ?) and the device would reply with a Stall which, in turn, would return -EPIPE to upper layers. Currently, that URB seems like it's never started to begin with, so we never see a Stall from the device side and the test fails. Surprisingly, this only happens on the second usb_submit_urb(). From what I could gather yesterday, EP_STATE_HALTED isn't yet set when the first usb_submit_urb() is called. Likely that because of that, the doorbell is rung and the device has a chance to Stall the transfer. Or am I missing something ? -- balbi
Attachment:
signature.asc
Description: Digital signature