On 12/12/2012 06:52 PM, Sarah Sharp wrote:
> On Wed, Dec 12, 2012 at 12:47:24PM +0100, Javier Martinez Canillas wrote:
>> Hello,
>>
>> We have an issue when trying to use USB cameras on a particular machine using
>> the latest mainline Linux 3.7 kernel. This is not a regression since the same
>> issue is present with older kernels (i.e: 3.5).
>>
>> The cameras work fine when plugged to an USB2.0 port (using the EHCI HCD host
>> controller driver) but they don't when using the USB3.0 port (using the xHCI
>> HCD host controller driver).
>>
>> The machine's USB3.0 host controller is a NEC Corporation uPD720200 USB 3.0 Host
>> Controller (rev 04).
>>
>> When enabling trace on the uvcvideo driver I see that most frames are lost:
>>
>> Dec 12 11:07:58 thinclient kernel: [ 4965.597637] uvcvideo: USB isochronous
>> frame lost (-18).
>> Dec 12 11:07:58 thinclient kernel: [ 4965.597642] uvcvideo: USB isochronous
>> frame lost (-18).
>> Dec 12 11:07:58 thinclient kernel: [ 4965.597647] uvcvideo: Marking buffer as
>> bad (error bit set).
>> Dec 12 11:07:58 thinclient kernel: [ 4965.597651] uvcvideo: Frame complete (EOF
>> found).
>> Dec 12 11:07:58 thinclient kernel: [ 4965.597655] uvcvideo: EOF in empty payload.
>> Dec 12 11:07:58 thinclient kernel: [ 4965.597661] uvcvideo: Dropping payload
>> (out of sync).
>> Dec 12 11:07:58 thinclient kernel: [ 4965.813294] uvcvideo: frame 486 stats:
>> 0/2/8 packets, 0/0/8 pts
>>
>> The uvcvideo checks if urb->iso_frame_desc[i].status < 0 on the
>> uvc_video_decode_isoc() function (drivers/media/usb/uvc/uvc_video.c).
>>
>> I checked on the xhci driver and the only place where this error code (-EXDEV)
>> is assigned to frame->status is inside the skip_isoc_td() function
>> (drivers/usb/host/xhci-ring.c).
>>
>> At this point I'm not sure if this is a bug on the xhci driver, another quirk
>> needed by the XHCI_NEC_HOST, a camera misconfiguration on the USB Video Class
>> driver or a firmware/hardware bug.
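For reference, the per-packet status check described in the quoted report can be illustrated with a small userspace sketch. It is not the uvcvideo code: `struct iso_packet`, `struct iso_tally` and `tally_iso_packets()` are hypothetical names, and the struct is a simplified stand-in for the kernel's per-packet isochronous descriptor. It only shows how packets marked -EXDEV (the -18 in the log above) could be separated from other failures when looking at a completed URB.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's per-packet
 * isochronous descriptor; only the status field matters here. */
struct iso_packet {
    int status;          /* 0 on success, negative errno on failure */
    unsigned int length; /* bytes actually transferred */
};

/* Hypothetical tally of the packets in one completed URB. */
struct iso_tally {
    size_t ok;     /* packets with status == 0 */
    size_t missed; /* packets marked -EXDEV (skipped service interval) */
    size_t other;  /* packets with any other nonzero status */
};

/* Walk the packet descriptors and classify each one by its status. */
static struct iso_tally tally_iso_packets(const struct iso_packet *pkts,
                                          size_t n)
{
    struct iso_tally t = {0, 0, 0};

    assert(pkts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pkts[i].status == 0)
            t.ok++;
        else if (pkts[i].status == -EXDEV)
            t.missed++;
        else
            t.other++;
    }
    return t;
}
```

A high `missed` count relative to `ok` would match the "frame lost (-18)" pattern in the trace, while a high `other` count would point at a different kind of failure.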
>
> It's a known performance issue, although it's not clear whether it's on
> the xHCI driver side or the host controller side. When an interface
> setting is enabled where the isochronous endpoint requires two
> additional transfers per service interval, the NEC host controller
> starts reporting many missed service intervals. The xHCI driver then
> finds all the frame buffers that were skipped and marks them with the
> -EXDEV status.
>
> An error status of Missed Service Interval means the host controller
> could not access the transfer memory fast enough through the PCI bus to
> service the endpoint in time. It could be a host hardware issue, or it
> could be software slowing down the system to a crawl. I lean towards a
> software issue since, as you said, the Windows driver works fine.
> (Although who knows what NEC quirks the Windows driver is working
> around...)
>
> The NEC xHCI host controller is a 0.96 revision, which doesn't support
> the Block Event Interrupt (BEI) flag which cuts down on the number of
> interrupts per URB submitted. So the xHCI driver's interrupt routine
> gets called on every single service interval, rather than being called
> once per URB.
>
> Since the Linux xHCI driver isn't really optimized for performance yet,
> the interrupt handler is probably pretty slow and could cause delays in
> submitting future URBs. The high amount of interrupts is probably
> causing other systems to be starved, possibly leading to the xHCI host
> controller not being able to access memory fast enough to service the
> endpoint.
>

Hi Sarah,

Thanks for the explanation. Now it makes sense to me and I understand why it
works when I decrease either the frame rate or the frame size below certain
thresholds.

>> The cameras are reported to work on the same machine but using another operating
>> system (Windows).
>
> Windows probably uses Event Data TRBs to cut the interrupts down to one
> per URB.
> It would take a major effort to make the xHCI driver use Event
> Data TRBs.
>
>> I was wondering if you can give me some pointers on how to be sure what's the
>> issue or if this rings any bells to you.
>
> I don't have time to work on performance issues right now, as I have
> several other critical bugs (mostly around failed S3/S4). However, if
> you want to try to fix this issue yourself, I suggest you run perf and
> see where the bottle necks in the xHCI interrupt handler are.
>
> I suspect that part of it is that the interrupt handler reads the xHCI
> status register. That PCI register read is pretty costly, and it's not
> necessary since 99% of the time the host controller is going to report
> an OK status. And there's no guarantee that when the host does have an
> error that it will set a bad status.
>
> But without an analysis by perf, we won't really know where the
> bottlenecks are.
>
> Sarah Sharp
>

Ok, I'll try to do some performance analysis to figure out where these
bottlenecks could be and if I can do anything to improve them.

Thanks a lot for your help!

Best regards,
Javier
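To put rough numbers on the interrupt load discussed in this thread (one interrupt per service interval versus one per URB), here is a small arithmetic sketch. `irqs_per_second()` is a hypothetical helper, not driver code. The figures used with it below are assumptions rather than values taken from the thread: 8000 service intervals per second (one per 125 us microframe on a high-speed endpoint) and 32 packets per URB.

```c
#include <assert.h>

/*
 * Completion interrupts per second for one isochronous endpoint.
 *
 * intervals_per_sec: service intervals per second (assumed input)
 * packets_per_urb:   service intervals covered by one URB (assumed input)
 * per_urb:           nonzero if the host interrupts once per URB,
 *                    zero if it interrupts on every service interval
 */
static unsigned long irqs_per_second(unsigned long intervals_per_sec,
                                     unsigned long packets_per_urb,
                                     int per_urb)
{
    assert(packets_per_urb > 0);

    if (!per_urb)
        return intervals_per_sec;

    /* Round up: a partially filled URB still completes with one interrupt. */
    return (intervals_per_sec + packets_per_urb - 1) / packets_per_urb;
}
```

Under those assumptions, `irqs_per_second(8000, 32, 0)` gives 8000 interrupts per second, while `irqs_per_second(8000, 32, 1)` gives 250. That 32x gap is the kind of reduction that interrupting once per URB would bring, if the assumed figures hold for a given camera and driver configuration.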