On Mon, Jan 13, 2014 at 11:25:16PM -0500, Alan Ott wrote:
> On 01/13/2014 09:01 PM, Alan Stern wrote:
> >On Mon, 13 Jan 2014, Felipe Balbi wrote:
> >
> >>Hi,
> >>
> >>On Mon, Jan 13, 2014 at 03:20:31PM -0500, Alan Ott wrote:
> >>>I have an EG20T-based board and have some issues with performance on
> >>>the USB device interface.
> >>
> >>I don't have that hardware but ...
> >>
> >>>I made a libusb test program (using the async interface)[0] to read
> >>>data from the EG20T's USB device port which has the gadget zero
> >>>source/sink function bound. In theory, one would hope this would give
> >>>the fastest real-world results for the hardware connected.
> >>>
> >>>The test program submits 32 IN transfers and re-submits on transfer
> >>>completion, counting received packets.
> >>>
> >>>From running my test program for a few minutes I get the following:
> >>>    elapsed:     548.468416 seconds
> >>>    packets:     21503334
> >>>    packets/sec: 39206.148199
> >>>    bytes/sec:   20073547.877732
> >>>    MBit/sec:    160.588383
> >>>
> >>>160MBit/sec isn't terrible, but I hoped for better. A USB analyzer
> >>>shows 7 transactions happening quickly (with about 14us separating
> >>>them), but every 8th transaction, the EG20T will NAK between 20-80
> >>>times[1], losing 50-100us[2].
> >>
> >>as Alan stated, this is a problem on the device side. The device is
> >>replying with NAK because, I believe, it has run out of free TDs.
> >>
> >>>This delay happens every 8th transaction without fail[3].
> >>>
> >>>I've looked at the following:
> >>>1. The f_sourcesink.c function: it queues up 8 responses at the
> >>>   beginning. Changing this number up or down had no effect.
> >>>2. Analysis of pch_udc.c doesn't show anything which would obviously
> >>>   cause a delay every 8th packet.
> >>>3. f_eem seems to have roughly the same performance with ping -f -s
> >>>   64000 (160Mbit/sec).
> >>>
> >>>The CPU load of the gadget-side Atom PC sits very close to zero.
> >>>
> >>>System Details:
> >>>    Linux 3.13.0-rc7 (with a defconfig from Yocto for Intel Crownbay)
> >>>    Intel Atom E680 with EG20T
> >>>
> >>>I seem to have eliminated everything on the host side, since the host
> >>>is asking for data, and the device is saying it doesn't have any for
> >>>up to 100us at a time.
> >>>
> >>>What am I missing?
> >>
> >>you should probably profile your pch_udc_pcd_queue() to figure out if
> >>there's anything wasting a lot of time there.
> >>
> >>Unlike Alan, I would use trace_printk() rather than pr_debug(), since
> >>trace_printk() has much lower overhead. Google around and you'll see
> >>how to use trace_printk() and how to use the kernel function profiler.
> >
> >By the way, isn't it true that f_sourcesink uses only one request for
> >each bulk endpoint? That would naturally lead to a delay each time the
> >request completes and has to be resubmitted.
> 
> That's what the comment at the top of the file says, but it doesn't
> appear to be true. See source_sink_start_ep(). It seems to start by
> queueing up 8 transactions. I've adjusted this number up with no
> effect (currently at 64).

right, the comment needs to be updated with what source_sink_start_ep()
actually does.

> >If the driver used two requests instead, the pipeline would be much
> >less likely to empty out.
> 
> Yes, I absolutely agree, the queue must be kept full, but in this
> case I think it is.

I'll shoot in the dark here and assume current pch_udc only starts
request N after N-1 has been givenback, and that's probably what's
causing the extra delay.
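FWIW, by profiling pch_udc_pcd_queue() I mean something along these
lines (a rough, untested sketch; __pch_udc_pcd_queue() is just a
stand-in name for the existing function body after renaming it so the
wrapper can time it):

	#include <linux/ktime.h>

	/* untested: time the queue path and log it via the trace
	 * buffer.  __pch_udc_pcd_queue() is hypothetical -- rename
	 * the current pch_udc_pcd_queue() to it.
	 */
	static int pch_udc_pcd_queue(struct usb_ep *usbep,
				     struct usb_request *usbreq,
				     gfp_t gfp)
	{
		ktime_t start = ktime_get();
		int ret = __pch_udc_pcd_queue(usbep, usbreq, gfp);

		trace_printk("pcd_queue took %lld ns\n",
			     ktime_to_ns(ktime_sub(ktime_get(), start)));
		return ret;
	}

then read the results back out of /sys/kernel/debug/tracing/trace.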
Just as a debugging effort, can you move the call to giveback to a
workqueue or something like that, just so it gets scheduled into the
future? This wouldn't be an acceptable patch, but it would tell us
whether my statement is valid. See the sketch below.
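Something like this, perhaps (untested sketch only; the field and
function names are guesses against drivers/usb/gadget/pch_udc.c, and
I'm assuming complete_req() is where the driver currently calls
->complete() and that struct pch_udc_ep embeds a struct usb_ep named
'ep'):

	#include <linux/workqueue.h>

	struct pch_udc_request {
		struct usb_request	req;
		/* ... existing fields ... */
		struct pch_udc_ep	*ep;		/* new */
		struct work_struct	giveback_work;	/* new */
	};

	/* runs later in process context, after the IRQ handler has
	 * had a chance to arm the next TD
	 */
	static void giveback_work_fn(struct work_struct *work)
	{
		struct pch_udc_request *req = container_of(work,
				struct pch_udc_request, giveback_work);

		req->req.complete(&req->ep->ep, &req->req);
	}

	/* in complete_req(), instead of calling ->complete()
	 * directly:
	 */
		req->ep = ep;
		INIT_WORK(&req->giveback_work, giveback_work_fn);
		schedule_work(&req->giveback_work);

If the NAK bursts shrink with giveback pushed out like this, that
points at the completion/re-queue ordering rather than the host.

cheers

-- 
balbi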