Re: Video corruption varies by system load

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Wed, 10 Jul 2013 10:48:21 -0400 (EDT)

On Tue, 9 Jul 2013, Devin Heitmueller wrote:

> So I hooked up the video and wrote a bit of Perl to parse the ISOC
> stream and render the underlying video frames.  I can see definitively
> that the video returned from the device contains the corruption.  This
> rules out any sort of DMA or memory related issue (proving that the
> data is not being mangled by the host on receipt).
> 
> Now that I have the raw USB trace though including timing data, I
> started looking at the actual underlying ISOC traffic at the time of
> the corruption, and found something interesting:  Despite having five
> URBs queued at all times with an interval of 1, there are cases where
> the URB isn't being sent.  The corruption consistently follows one of
> these intervals where a URB was skipped.  We're expecting the host
> controller to pull the buffer every 125us, and the instances
> where the corruption is exhibited immediately follow a 250us gap
> between URBs.
> 
> See attached screenshot:
> 
> http://devinheitmueller.com/isoc_loss.png
> 
> Packet 27082 is the packet that contains the corruption.  The previous
> URB was received exactly 250us prior (whereas it should have been only
> 125us).  349.594 - 349.344 = 0.250, i.e. 250us.
> 
> I suspect the FIFO is overflowing on the chip as a result of the host
> controller not asking for the buffer when it's supposed to.  It's
> worth mentioning that the "corrupt bytes" are actually also found
> several packets later in the correct place, suggesting the chip is
> probably employing some sort of circular buffer which is wrapping
> around.
> 
> So should I be digging into the EHCI URB scheduling code?  Any
> suggestions on where else I should be poking around would be very
> welcome.

Digging into the scheduling code probably won't help much.  However, you
could try collecting a usbmon trace (see Documentation/usb/usbmon.txt).
This would clearly show the timing of URB submissions and completions.
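
As a rough illustration of what to look for in such a trace, here is a
minimal Python sketch that scans usbmon text output for isochronous-IN
completions ("C" events on a "Zi" address) and reports the ones that
arrive later than expected.  It relies only on the first four words of
each line (URB tag, timestamp in microseconds, event type, address).
The function names, the slack value, and the expected spacing are
illustrative assumptions; the real spacing depends on how many packets
the driver puts in each URB, and the sketch ignores timestamp
wraparound.

```python
from collections import namedtuple

# Only the leading fields of a usbmon text line are used here.
Event = namedtuple("Event", "tag ts_us kind xfer bus dev ep")


def parse_line(line):
    """Parse the first four words of a usbmon text line, or return None."""
    words = line.split()
    if len(words) < 4 or not words[1].isdigit():
        return None
    # The address word looks like "Zi:1:005:2" (type+dir : bus : dev : ep).
    addr = words[3].split(":")
    if len(addr) != 4 or len(addr[0]) != 2:
        return None
    return Event(words[0], int(words[1]), words[2], *addr)


def late_completions(lines, expected_us, slack_us=50, dev=None, ep=None):
    """Return (previous_ts, ts, gap) for late isochronous-IN completions.

    expected_us is the spacing you expect between URB completions; it is
    an input you must supply, not something this sketch can derive.
    """
    late = []
    prev = None
    for line in lines:
        ev = parse_line(line)
        if ev is None or ev.kind != "C" or ev.xfer != "Zi":
            continue
        # Optionally restrict to one device and endpoint.
        if dev is not None and ev.dev != dev:
            continue
        if ep is not None and ev.ep != ep:
            continue
        if prev is not None and ev.ts_us - prev > expected_us + slack_us:
            late.append((prev, ev.ts_us, ev.ts_us - prev))
        prev = ev.ts_us
    return late
```

Feeding it the lines of a captured trace, for example
late_completions(open("trace.txt"), expected_us=8000), would list the
completions that follow an unusually long gap.  The file name and the
8000us figure are placeholders.  Keep in mind that usbmon reports when
the host sees submissions and completions, which is not the same thing
as the timing on the wire.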

> I'll be the first to admit that this isn't my particular area of
> expertise - so if I've made some stupid assumption about the expected
> behavior for the URB timing on the bus, don't hesitate to point that
> out.

The transfers should occur regularly at 1-microframe intervals.  If
they don't, then something is wrong somewhere.  But I think the problem
is more likely to lie in the upper-level driver than in ehci-hcd.
You're using the em28xx driver?
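
For what it's worth, the regularity check on the analyzer capture can be
done mechanically.  The following Python sketch takes a list of packet
timestamps and flags every gap longer than the expected period.  It
assumes the timestamps are in milliseconds (as the 349.594 and 349.344
values in the quoted message suggest) and that one microframe is 125us;
the function name and the tolerance are made up for illustration.

```python
MICROFRAME_US = 125  # one high-speed microframe, in microseconds


def skipped_microframes(timestamps_ms, period_microframes=1, tolerance_us=10):
    """Return (timestamp, gap_us, skipped) for each over-long gap.

    timestamps_ms: packet times in milliseconds, in ascending order.
    skipped: estimated number of transfer slots missed before the packet.
    """
    expected_us = period_microframes * MICROFRAME_US
    flagged = []
    for earlier, later in zip(timestamps_ms, timestamps_ms[1:]):
        # Round to whole microseconds to absorb floating-point noise.
        gap_us = round((later - earlier) * 1000)
        if gap_us > expected_us + tolerance_us:
            skipped = round(gap_us / expected_us) - 1
            flagged.append((later, gap_us, skipped))
    return flagged
```

With the two timestamps from the quoted message,
skipped_microframes([349.344, 349.594]) reports a single 250us gap with
one skipped slot.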

At first glance, there is one obvious bug in that driver (probably not 
at all related to your problem, though).  The em28xx_irq_callback() 
routine should not set urb->status.

I bet the problem is related to the usage of the URB_ISO_ASAP flag.  
em28xx_alloc_urbs() sets URB_ISO_ASAP in urb->transfer_flags, and the
value never gets cleared.  In fact, that flag bit is supposed to be set
only in the first URB of a stream, not in the following URBs.  (The
same mistake is present for the URBs in the audio stream.)

Alan Stern
