On Mon, 13 Jan 2014, Felipe Balbi wrote:

> Hi,
>
> On Mon, Jan 13, 2014 at 03:20:31PM -0500, Alan Ott wrote:
> > I have an EG20T-based board and have some issues with performance on
> > the USB device interface.
>
> I don't have that hardware but ...
>
> > I made a libusb test program (using the async interface)[0] to read
> > data from the EG20T's USB device port, which has the gadget zero
> > source/sink function bound. In theory, one would hope this would give
> > the fastest real-world results for the hardware connected.
> >
> > The test program submits 32 IN transfers and re-submits each one on
> > transfer completion, counting received packets.
> >
> > From running my test program for a few minutes I get the following:
> > elapsed: 548.468416 seconds
> > packets: 21503334
> > packets/sec: 39206.148199
> > bytes/sec: 20073547.877732
> > MBit/sec: 160.588383
> >
> > 160 MBit/sec isn't terrible, but I hoped for better. A USB analyzer
> > shows 7 transactions happening quickly (with about 14us separating
> > them), but every 8th transaction, the EG20T will NAK between 20-80
> > times[1], losing 50-100us[2].
>
> As Alan stated, this is a problem on the device side. The device is
> replying with NAK because, I believe, it has run out of free TDs.
>
> > This delay happens every 8th transaction without fail[3].
> >
> > I've looked at the following:
> > 1. The f_sourcesink.c function queues up 8 responses at the
> > beginning. Changing this number up or down had no effect.
> > 2. Analysis of pch_udc.c doesn't show anything which would obviously
> > cause a delay every 8th packet.
> > 3. f_eem seems to have roughly the same performance with ping -f -s
> > 64000 (160 Mbit/sec).
> >
> > The CPU load of the gadget-side Atom PC sits very close to zero.
> >
> > System Details:
> > Linux 3.13.0-rc7 (with a defconfig from Yocto for Intel Crownbay)
> > Intel Atom E680 with EG20T
> >
> > I seem to have eliminated everything on the host side, since the host
> > is asking for data, and the device is saying it doesn't have any for
> > up to 100us at a time.
> >
> > What am I missing?
>
> You should probably profile your pch_udc_pcd_queue() to figure out if
> there's anything wasting a lot of time there.
>
> Unlike Alan, I would use trace_printk() rather than pr_debug(), since
> trace_printk() has much lower overhead. Google around and you'll see
> how to use trace_printk() and how to use the kernel function profiler.

By the way, isn't it true that f_sourcesink uses only one request for
each bulk endpoint? That would naturally lead to a delay each time the
request completes and has to be resubmitted. If the driver used two
requests instead, the pipeline would be much less likely to empty out.

Alan Stern