Re: USB protocol help (STALL and NAKs)

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Fri, 25 Mar 2011 11:04:33 -0400 (EDT)

On Thu, 24 Mar 2011, Arvid Brodin wrote:

> Hi,
> 
> I'm working on the isp1760 driver (mostly modifying the qtd queueing to get rid
> of BUG() calls in interrupt context).
> 
> I have a high-speed USB-stick, that (probably due to some protocol error of
> mine) STALLs when I transfer a 15 MB file to it (after first transferring a few
> MB successfully with repeated 512 B OUT, NYET, PINGs etc.). After the stall is
> received, the host immediately sends a "Device request: Clear feature:
> Endpoint halt" which succeeds. After that, the host continues with IN
> transactions, but the device NAKs these indefinitely and the bus hangs (and I

Which bus hangs?

> get a softlockup BUG after 120 s). This behaviour was detected using a USB
> analyzer.

The block layer is supposed to time out after 30 seconds, causing the 
IN transfers to be unlinked and the USB stick to be reset.  Maybe your 
bus problems prevent this from happening.

> When I try this stick on my desktop EHCI, I never get the STALL, but lots of
> NAKs on IN bulk packets. The stick works fine here.
> 
> a) Should there be some kind of limit/quench on bulk IN NAKs somehow, so that a
>    (malicious/erroneous) device cannot hang the USB subsystem like this? The
>    EHCI driver loads the NakCnt field with 4 (EHCI_TUNE_RL_HS), but when 4 NAKs
>    have been detected and the HC returns the packet, I believe it's just reset
>    and enqueued again?

The host controller is supposed to reload the NakCnt field only when
the async schedule is restarted.  If the NakCnt fields in all the
active endpoints remain 0 for two passes through the async schedule,
the controller is supposed to detect that the schedule is empty and go
to the async sleeping state, after which it restarts the async schedule
about 10 us later.  See sections 4.9 and 4.8.3 - 4.8.6 in the EHCI spec.
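
To make the reload behaviour concrete, here is a rough user-space model of
it.  This is only a sketch of the description above, not controller or
driver code: struct qh_model, async_restart(), async_pass() and
run_schedule() are names made up for the illustration, and the real state
machine in the spec has more detail than this.

```c
#include <stdbool.h>
#include <stddef.h>

#define RL_HS 4u  /* reload value; the thread cites EHCI_TUNE_RL_HS = 4 */

/* Hypothetical model of one active bulk IN queue head. */
struct qh_model {
    unsigned nak_cnt;   /* attempts left before the controller skips this QH */
    bool device_ready;  /* true: the device answers with data instead of NAK */
    bool done;          /* transfer finished */
};

/* Schedule (re)start: the only point where NakCnt is reloaded. */
static void async_restart(struct qh_model *qh, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!qh[i].done)
            qh[i].nak_cnt = RL_HS;
}

/* One pass through the schedule; returns the number of bus transactions. */
static unsigned async_pass(struct qh_model *qh, size_t n)
{
    unsigned attempts = 0;

    for (size_t i = 0; i < n; i++) {
        if (qh[i].done || qh[i].nak_cnt == 0)
            continue;               /* skipped, no bus traffic */
        attempts++;
        if (qh[i].device_ready)
            qh[i].done = true;      /* data received */
        else
            qh[i].nak_cnt--;        /* NAK received */
    }
    return attempts;
}

static bool any_pending(const struct qh_model *qh, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!qh[i].done)
            return true;
    return false;
}

/*
 * Run until every QH is done or max_restarts sleep/restart cycles have
 * been used.  Returns the number of restarts after the initial start and
 * stores the total number of bus transactions in *total_attempts.
 */
static unsigned run_schedule(struct qh_model *qh, size_t n,
                             unsigned max_restarts, unsigned *total_attempts)
{
    unsigned restarts = 0;

    *total_attempts = 0;
    for (;;) {
        unsigned idle_passes = 0;

        async_restart(qh, n);
        while (idle_passes < 2) {   /* two idle passes: schedule looks empty */
            unsigned a = async_pass(qh, n);

            *total_attempts += a;
            idle_passes = (a == 0) ? idle_passes + 1 : 0;
        }
        if (!any_pending(qh, n) || restarts == max_restarts)
            return restarts;
        restarts++;                 /* controller sleeps, then restarts */
    }
}
```

In this model a device that NAKs forever costs RL_HS transactions per
restart cycle and the transfer simply stays pending.  Nothing is handed
back to the driver after 4 NAKs.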

> b) My host controller driver returns urb->status = -EPIPE (-32) to the usb core
>    after receiving the STALL packet. I'm guessing this is correct since usb core
>    then sends the Clear Endpoint Halt command afterwards. Am I right in this?

Yes.
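
As a minimal sketch of that mapping (the enum, struct and function names
below are hypothetical and not taken from the isp1760 driver): a STALL
completes the URB with -EPIPE, while a NAK leaves it pending with the
usual -EINPROGRESS.  In the kernel the halt would then typically be
cleared by calling usb_clear_halt(); here that step is reduced to a
predicate.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical handshake results seen by the host controller driver. */
enum xact_result { XACT_ACK, XACT_NAK, XACT_STALL };

struct completion {
    bool complete;   /* should the URB be given back now? */
    int urb_status;  /* value for urb->status */
};

/* Translate one handshake into a give-back decision. */
static struct completion hcd_result(enum xact_result r)
{
    struct completion c = { false, -EINPROGRESS };

    switch (r) {
    case XACT_ACK:
        c.complete = true;
        c.urb_status = 0;
        break;
    case XACT_STALL:
        c.complete = true;
        c.urb_status = -EPIPE;   /* endpoint halted */
        break;
    case XACT_NAK:
        break;                   /* keep retrying, URB stays pending */
    }
    return c;
}

/* The submitter is expected to clear the halt only after -EPIPE. */
static bool needs_clear_halt(int urb_status)
{
    return urb_status == -EPIPE;
}
```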

> c) Clearly, continued bulk IN requests are not the right thing to do after this.

Why not?  It often _is_ the right thing to do.
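
One common case, assuming the stick speaks the mass-storage Bulk-Only
Transport (the thread does not say so): after a data-phase STALL and the
clear-halt, the host still has to read the 13-byte Command Status Wrapper
from the bulk IN endpoint, and the device may NAK until it is ready.  The
sketch below only shows what those IN transactions are waiting for.
parse_csw() and struct csw are names made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CSW_LEN        13u
#define CSW_SIGNATURE  0x53425355u  /* "USBS", little-endian on the wire */

struct csw {
    uint32_t tag;      /* echoes the tag of the command being completed */
    uint32_t residue;  /* bytes expected by the host but not transferred */
    uint8_t status;    /* 0 = passed, 1 = failed, 2 = phase error */
};

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns true if buf holds a well-formed CSW matching expected_tag. */
static bool parse_csw(const uint8_t *buf, size_t len,
                      uint32_t expected_tag, struct csw *out)
{
    if (len != CSW_LEN || le32(buf) != CSW_SIGNATURE)
        return false;
    out->tag = le32(buf + 4);
    out->residue = le32(buf + 8);
    out->status = buf[12];
    return out->tag == expected_tag;
}
```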

>    Any ideas why this happens?

Why does what happen?  Why does the device send a STALL?  I'd have to see 
a usbmon or bus analyzer trace to answer that.

> I'm pretty much out of ideas on this now.
>    Alternatively, I've done something wrong to cause the STALL in the first
>    place, which puts the host/device in some very unfortunate state - what could
>    I have done to cause this? (Some problem with ping or toggle state?)

No.  Without more information, there's no way to know the cause.

Alan Stern
