Re: Continuous stream of small bulk transfers hangs on OHCI-based systems

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Sun, 3 Feb 2013 22:12:49 -0500 (EST)

On Sun, 3 Feb 2013, Mark Ferrell wrote:

> I am not certain what it is worth, but I have managed to capture 
> usbmon/1u on a system which lost access to the ports, though the 
> capture is quite sizable: roughly 3 MB/min of data.

Good.

> Here are the last few transactions before the device "appears" to hang, 
> and before attempting to kill -9 the controlling application.
> 
> dd4b54c0 3248782520 S Bi:1:003:3 -115 512 <
> dd4b57c0 3248782543 C Bi:1:003:1 0 2 = 3160
> dd4b57c0 3248782554 S Bi:1:003:1 -115 512 <
> dd4b5440 3248783508 C Bi:1:003:3 0 2 = 3160
> dd4b5440 3248783522 S Bi:1:003:3 -115 512 <
> dd4b5740 3248783547 C Bi:1:003:1 0 2 = 3160
> dd4b5740 3248783559 S Bi:1:003:1 -115 512 <
> dd4b54c0 3248784506 C Bi:1:003:3 0 2 = 3160
> dd4b54c0 3248784519 S Bi:1:003:3 -115 512 <
> dd4b57c0 3248784542 C Bi:1:003:1 0 2 = 3160
> dd4b57c0 3248784554 S Bi:1:003:1 -115 512 <
> dd4b5440 3248785513 C Bi:1:003:3 0 2 = 3160
> dd4b5740 3248785556 C Bi:1:003:1 0 2 = 3160
> dd4b5740 3248785568 S Bi:1:003:1 -115 512 <
> dd4b5740 3248868626 C Bi:1:003:1 0 2 = 3160
> dd4b57c0 3248869618 C Bi:1:003:1 0 2 = 3160

The last few lines of the trace show something very strange on endpoint
1.  The URBs at addresses dd4b57c0 and dd4b5740 complete and are
resubmitted, in that order.  But at the very end, they complete in the
opposite order.  Maybe this is an illusion caused by missing data
(usbmon traces can sometimes drop events), but if not then it is a
serious problem.
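
A minimal sketch of that ordering check, assuming the usbmon 1u text
layout (URB tag, timestamp, event type, address word) and assuming that
URBs queued to one bulk endpoint should complete in the order they were
submitted.  The function name out_of_order is hypothetical; the sample
lines are the endpoint 1 entries copied from the trace quoted above.

```python
from collections import defaultdict, deque

# Endpoint 1 lines copied from the quoted trace.
TRACE = """\
dd4b57c0 3248782543 C Bi:1:003:1 0 2 = 3160
dd4b57c0 3248782554 S Bi:1:003:1 -115 512 <
dd4b5740 3248783547 C Bi:1:003:1 0 2 = 3160
dd4b5740 3248783559 S Bi:1:003:1 -115 512 <
dd4b57c0 3248784542 C Bi:1:003:1 0 2 = 3160
dd4b57c0 3248784554 S Bi:1:003:1 -115 512 <
dd4b5740 3248785556 C Bi:1:003:1 0 2 = 3160
dd4b5740 3248785568 S Bi:1:003:1 -115 512 <
dd4b5740 3248868626 C Bi:1:003:1 0 2 = 3160
dd4b57c0 3248869618 C Bi:1:003:1 0 2 = 3160
"""

def out_of_order(lines):
    """Return (timestamp, address, completed_tag, expected_tag) for each
    completion that overtakes an older pending URB on the same endpoint."""
    pending = defaultdict(deque)  # address -> URB tags in submission order
    problems = []
    for line in lines:
        if not line.strip():
            continue
        tag, ts, event, addr = line.split()[:4]
        queue = pending[addr]
        if event == "S":
            queue.append(tag)
        elif event == "C":
            if tag not in queue:
                continue  # submitted before the excerpt began
            if queue[0] != tag:
                problems.append((int(ts), addr, tag, queue[0]))
            queue.remove(tag)
    return problems

for ts, addr, got, expected in out_of_order(TRACE.splitlines()):
    print(f"{ts} {addr}: {got} completed before {expected}")
```

A check like this cannot distinguish a real reordering from events
dropped by usbmon; it only points at the place worth looking.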

> The transaction at 3248785513 is the last from 1:003:3 (BulkIn?).  
> The other endpoints on 1:003 continue to operate normally, and we
> have validated that we can still transmit data out through the
> device; we simply can no longer receive.

After 3248785513 there aren't any submissions listed for endpoint 3.  
Either there was a submission and it was lost from the usbmon trace, or 
nothing was submitted.  If the latter is true, it indicates a bug 
somewhere outside the ohci-hcd driver.
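
One way to spot this in a longer capture is to list, per endpoint, the
URBs whose final event is a completion.  The following is a sketch
only, assuming the usbmon 1u text layout; the function name
completed_not_resubmitted is hypothetical and the sample data is the
trace quoted above.

```python
# Lines copied from the quoted trace.
TRACE = """\
dd4b54c0 3248782520 S Bi:1:003:3 -115 512 <
dd4b57c0 3248782543 C Bi:1:003:1 0 2 = 3160
dd4b57c0 3248782554 S Bi:1:003:1 -115 512 <
dd4b5440 3248783508 C Bi:1:003:3 0 2 = 3160
dd4b5440 3248783522 S Bi:1:003:3 -115 512 <
dd4b5740 3248783547 C Bi:1:003:1 0 2 = 3160
dd4b5740 3248783559 S Bi:1:003:1 -115 512 <
dd4b54c0 3248784506 C Bi:1:003:3 0 2 = 3160
dd4b54c0 3248784519 S Bi:1:003:3 -115 512 <
dd4b57c0 3248784542 C Bi:1:003:1 0 2 = 3160
dd4b57c0 3248784554 S Bi:1:003:1 -115 512 <
dd4b5440 3248785513 C Bi:1:003:3 0 2 = 3160
dd4b5740 3248785556 C Bi:1:003:1 0 2 = 3160
dd4b5740 3248785568 S Bi:1:003:1 -115 512 <
dd4b5740 3248868626 C Bi:1:003:1 0 2 = 3160
dd4b57c0 3248869618 C Bi:1:003:1 0 2 = 3160
"""

def completed_not_resubmitted(lines):
    """Map each endpoint address to the (tag, timestamp) pairs of URBs
    whose last recorded event is a completion ('C')."""
    last = {}  # URB tag -> (event, timestamp, address) of its latest event
    for line in lines:
        if not line.strip():
            continue
        tag, ts, event, addr = line.split()[:4]
        last[tag] = (event, int(ts), addr)
    result = {}
    for tag, (event, ts, addr) in last.items():
        if event == "C":
            result.setdefault(addr, []).append((tag, ts))
    return result

for addr, urbs in completed_not_resubmitted(TRACE.splitlines()).items():
    print(addr, urbs)
```

For endpoint 3 this reports dd4b5440 at 3248785513.  The endpoint 1
URBs show up as well, simply because the excerpt ends there.  The
script cannot tell whether a submission is missing from the trace or
was never made.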

In fact I suspect some data got lost, because of the big jump in the 
timestamp values from 3248785568 to 3248868626.  That's a gap of about 
83 ms (usbmon timestamps are in microseconds), and it seems most 
unlikely that nothing happened during that time judging from the rest 
of the timestamps -- they show events occurring roughly every ms.
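
The arithmetic can be checked with a short sketch, assuming the second
field of each usbmon line is a timestamp in microseconds.  The function
name find_gaps and the 10 ms threshold are arbitrary choices made for
illustration; the timestamps are the last nine from the trace quoted
above.

```python
# Timestamps (microseconds) of the last nine events in the quoted trace.
TIMESTAMPS_US = [
    3248784506, 3248784519, 3248784542, 3248784554,
    3248785513, 3248785556, 3248785568,
    3248868626, 3248869618,
]

def find_gaps(timestamps_us, threshold_us=10_000):
    """Return (start, end, delta) for consecutive events that are more
    than threshold_us microseconds apart."""
    return [
        (a, b, b - a)
        for a, b in zip(timestamps_us, timestamps_us[1:])
        if b - a > threshold_us
    ]

for start, end, delta in find_gaps(TIMESTAMPS_US):
    print(f"{start} -> {end}: {delta / 1000:.1f} ms")
```

Every other interval in that excerpt is under 1 ms, so the single
83058 us jump stands out clearly.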

Maybe you can try again.  If we see a similar pattern, it may indicate 
that something weird is happening outside of the USB stack.  Something 
that would prevent processes from being scheduled during a period of 80 
ms could indeed be a source of problems.

Alan Stern
