On Mon, 1 Jul 2013, Devin Heitmueller wrote:

> Hi all,
>
> I've been debugging a video corruption problem in the em28xx video
> capture driver, and after a couple of weeks of digging, I think I may
> have exposed some sort of race condition in the USB core.
>
> http://devinheitmueller.com/out17.png
>
> I'm submitting URBs of 64 isoc packets x 3072 bytes, with five URBs
> queued at any given time, and I'm getting highly intermittent corrupted
> lines in the video (example shown above). The offset of the corruption
> varies, but the corrupted data appears to come from a different part of
> the same transfer_buffer, and it is always at the beginning of an
> individual packet within the URB. I've stripped the URB completion
> handler down to the bare minimum, removing essentially everything except
> a simple memcpy() to the output buffer (and advancing the output buffer
> pointer by the isoc actual_length received). I've also zeroed both the
> URB transfer buffer (before submitting) and the output buffer with
> memset(), to distinguish between writing the wrong data and writing no
> data at all to the area in question. And I've checked the math to make
> sure the offsets and lengths passed to usb_submit_urb() are all what's
> expected.
>
> Here's where things get interesting/scary. If I run the following
> command while video is streaming, the whole problem magically
> disappears:
>
>     while [ 1 ]; do /bin/true; done
>
> Something is going on in the scheduler that is affecting the state of
> the urb->transfer_buffer passed to the completion handler.

That's weird. But it might not be the scheduler so much as the total CPU
load.

> The problem has been seen with both the stock EHCI driver (on x86) and
> the musb driver used on the TI DaVinci platform (ARM).
> The transfer buffer itself is allocated using usb_alloc_coherent(),
> and I've seen the problem when allocating with vmalloc() as well.

Do you mean kmalloc()? Memory allocated with vmalloc() is generally not
suitable for DMA mapping.

> This feels like some sort of DMA- or cache-related issue, since the
> behavior of the URB completion handler itself appears completely
> consistent regardless of system load. I'm seeing the issue on
> everything from 3.10-rc6 back to 2.6.31 (the earliest I can go on my
> Ubuntu box, given some udev-related dependencies).
>
> I've done plenty of work on USB drivers under Linux over the years,
> but haven't dug much into the USB core. If anybody has suggestions on
> how to debug a timing problem like this, they would be very welcome.

This is an interesting problem, but I don't think you'll get much insight
from looking at the USB side of things. You could try asking the people
in charge of the DMA- and cache-related parts of the kernel.

Alan Stern
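For readers following the thread, the stripped-down completion handler described in the report can be modelled in plain userspace C. This is a hypothetical sketch, not the em28xx code: `struct iso_desc` is an invented stand-in whose fields mirror the kernel's `struct usb_iso_packet_descriptor` (`offset`, `length`, `actual_length`, `status`), and `copy_iso_packets()` is an invented name. It only shows the copy logic, assuming the 64 x 3072 layout from the report: each packet's data is read from its own fixed offset in transfer_buffer, while the output pointer advances by `actual_length` alone. It does not reproduce any DMA or cache behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct usb_iso_packet_descriptor. */
struct iso_desc {
	unsigned int offset;        /* start of this packet's slot in transfer_buffer */
	unsigned int length;        /* size of the slot (3072 in the report) */
	unsigned int actual_length; /* bytes actually received for this packet */
	int status;                 /* 0 on success */
};

/*
 * Hypothetical model of a minimal isoc completion handler: copy each
 * successful packet from its own slot and advance the output pointer by
 * actual_length only. Returns the number of bytes written to out.
 */
static size_t copy_iso_packets(const unsigned char *transfer_buffer,
			       const struct iso_desc *desc, int npackets,
			       unsigned char *out)
{
	unsigned char *p = out;
	int i;

	for (i = 0; i < npackets; i++) {
		if (desc[i].status != 0 || desc[i].actual_length == 0)
			continue; /* skip failed or empty packets */
		/* A received packet should never overrun its slot. */
		assert(desc[i].actual_length <= desc[i].length);
		/* Read from the packet's fixed offset, not from a running pointer. */
		memcpy(p, transfer_buffer + desc[i].offset, desc[i].actual_length);
		p += desc[i].actual_length;
	}
	return (size_t)(p - out);
}
```

A caller would fill one descriptor per packet with `offset = i * 3072` at submit time; when packets come back short, the gap between slots in transfer_buffer is simply skipped rather than copied.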