Re: Video corruption varies by system load

Devin Heitmueller <dheitmueller@xxxxxxxxxxxxxx> · Tue, 9 Jul 2013 17:56:55 -0400

On Sun, Jul 7, 2013 at 9:39 PM, Devin Heitmueller
<dheitmueller@xxxxxxxxxxxxxx> wrote:
> I finally dug out my Beagle 480 USB, so I will get that hooked up this
> week, write a decoder to reassemble the video frames based on the USB
> trace, and know once and for all whether the device is delivering
> correct video or not.  If the video being delivered by the device has
> no corruption, then we're talking about some sort of memory
> consistency or DMA issue (or perhaps some sort of problem with the USB
> core populating the finished URBs before calling the completion
> handler).  If the video coming down the bus is corrupted, then we're
> probably talking about some sort of timing problem with the URB
> submission (combined with the FIFO on the chip poorly handling the
> incorrect timing).

So I hooked up the video and wrote a bit of Perl to parse the ISOC
stream and render the underlying video frames.  I can see definitively
that the video returned from the device contains the corruption.  This
rules out any sort of DMA or memory related issue (proving that the
data is not being mangled by the host on receipt).

Now that I have the raw USB trace, including timing data, I started
looking at the underlying ISOC traffic at the time of the corruption
and found something interesting:  despite having five URBs queued at
all times with an interval of 1, there are cases where the URB isn't
being sent.  The corruption consistently follows one of these
intervals where a URB was skipped.  We expect the host controller to
pull the buffer every 125us, and the instances where the corruption
appears immediately follow a 250us gap between URBs.
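The check described above can be sketched roughly as follows. It looks for gaps longer than one 125us microframe between consecutive transfers. This is a hypothetical helper, not the Perl decoder used for the analysis. It assumes the per-transfer timestamps have already been pulled out of the analyzer trace as integer microseconds, and the 10us tolerance is an arbitrary choice.

```python
# Hypothetical helper: flag late isochronous transfers in an analyzer trace.
# Assumes timestamps are integer microseconds already extracted from the trace.

MICROFRAME_US = 125  # high-speed USB microframe period


def find_gaps(timestamps_us, expected_us=MICROFRAME_US, tolerance_us=10):
    """Return (index, gap_us) for each gap noticeably longer than expected.

    `index` is the position of the transfer that follows the gap, i.e.
    the one that would be expected to carry the corruption.
    """
    gaps = []
    for i in range(1, len(timestamps_us)):
        gap = timestamps_us[i] - timestamps_us[i - 1]
        if gap > expected_us + tolerance_us:
            gaps.append((i, gap))
    return gaps


if __name__ == "__main__":
    # Made-up trace with one skipped microframe between 375us and 625us.
    trace = [0, 125, 250, 375, 625, 750]
    print(find_gaps(trace))  # [(4, 250)]
```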

See attached screenshot:

http://devinheitmueller.com/isoc_loss.png

Packet 27082 is the packet that contains the corruption.  The previous
URB was received exactly 250us prior, whereas it should have been only
125us:  349.594 - 349.344 = 0.250 ms, i.e. 250us.
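The subtraction above can also be checked mechanically. This sketch assumes the two numbers are the milliseconds and microseconds fields of the analyzer timestamp, which is what makes the difference come out to 250us. It also assumes the microseconds field is always zero-padded to three digits. Both functions are hypothetical. Parsing the fields as integers, rather than subtracting floats, avoids rounding noise.

```python
def ms_us_to_us(stamp):
    """Convert an 'ms.us' field such as '349.594' to integer microseconds."""
    ms, us = stamp.split(".")
    if len(us) != 3:
        raise ValueError("expected a zero-padded 3-digit microseconds field")
    return int(ms) * 1000 + int(us)


def skipped_microframes(prev_stamp, cur_stamp, period_us=125):
    """Return (gap_us, number of whole microframes with no transfer)."""
    gap = ms_us_to_us(cur_stamp) - ms_us_to_us(prev_stamp)
    return gap, gap // period_us - 1


print(skipped_microframes("349.344", "349.594"))  # (250, 1)
```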

I suspect the FIFO is overflowing on the chip as a result of the host
controller not asking for the buffer when it's supposed to.  It's
worth mentioning that the "corrupt bytes" are actually also found
several packets later in the correct place, suggesting the chip is
probably employing some sort of circular buffer which is wrapping
around.
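A toy model can illustrate the circular-buffer theory. Everything in it is an assumption for illustration only: the FIFO size, the bytes produced per microframe, the maximum read per poll, and a FIFO that neither detects overflow nor adjusts its read pointer. It does not describe the actual chip. Each byte's value is its position in the ideal stream, so a clean capture reads 0, 1, 2, and so on.

```python
def simulate(num_uframes, skipped, capacity=6, bytes_per_uframe=4, max_read=8):
    """Return the bytes the host receives from a naive ring-buffer FIFO."""
    buf = [None] * capacity
    w = 0  # total bytes written by the video side of the chip
    r = 0  # total bytes read by the host
    received = []
    for t in range(num_uframes):
        # Video data keeps arriving every microframe whether or not the host polls.
        for _ in range(bytes_per_uframe):
            buf[w % capacity] = w  # silently overwrites unread data once w - r > capacity
            w += 1
        if t in skipped:
            continue  # no IN transaction in this microframe
        # The naive FIFO hands out whatever sits at the read pointer, modulo capacity.
        for _ in range(min(w - r, max_read)):
            received.append(buf[r % capacity])
            r += 1
    return received


print(simulate(5, skipped=set()) == list(range(20)))  # True
print(simulate(5, skipped={2}))
# [0, 1, 2, 3, 4, 5, 6, 7, 14, 15, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```

In this model, skipping microframe 2 loses bytes 8 and 9. Bytes 14 and 15 show up in their place, and then appear again in their correct position a few bytes later. That is the same pattern as the "corrupt bytes" reappearing several packets later.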

So should I be digging into the EHCI URB scheduling code?  Any
suggestions on where else I should be poking around would be very
welcome.

I'll be the first to admit that this isn't my particular area of
expertise - so if I've made some stupid assumption about the expected
behavior for the URB timing on the bus, don't hesitate to point that
out.

Devin

-- 
Devin J. Heitmueller - Kernel Labs
http://www.kernellabs.com