On Mon, Jan 13, 2014 at 11:25:16PM -0500, Alan Ott wrote:
> On 01/13/2014 09:01 PM, Alan Stern wrote:
> >On Mon, 13 Jan 2014, Felipe Balbi wrote:
> >
> >>Hi,
> >>
> >>On Mon, Jan 13, 2014 at 03:20:31PM -0500, Alan Ott wrote:
> >>>I have an EG20T-based board and have some issues with performance on
> >>>the USB device interface.
> >>
> >>I don't have that hardware but ...
> >>
> >>>I made a libusb test program (using the async interface)[0] to read
> >>>data from the EG20T's USB device port which has the gadget zero
> >>>source/sink function bound. In theory, one would hope this would give
> >>>the fastest real-world results for the hardware connected.
> >>>
> >>>The test program submits 32 IN transfers and re-submits on transfer
> >>>completion, counting received packets.
> >>>
> >>>From running my test program for a few minutes I get the following:
> >>>    elapsed:     548.468416 seconds
> >>>    packets:     21503334
> >>>    packets/sec: 39206.148199
> >>>    bytes/sec:   20073547.877732
> >>>    MBit/sec:    160.588383
> >>>
> >>>160MBit/sec isn't terrible, but I hoped for better. A USB analyzer
> >>>shows 7 transactions happening quickly (with about 14us separating
> >>>them), but every 8th transaction, the EG20T will NAK between 20-80
> >>>times[1], losing 50-100us[2].
> >>
> >>as Alan stated, this is a problem on the device side. The device is
> >>replying with NAK because, I believe, it has run out of free TDs.
> >>
> >>>This delay happens every 8th transaction without fail[3].
> >>>
> >>>I've looked at the following:
> >>>1. The f_sourcesink.c function: it queues up 8 responses at the
> >>>   beginning. Changing this number up or down had no effect.
> >>>2. Analysis of pch_udc.c doesn't show anything which would obviously
> >>>   cause a delay every 8th packet.
> >>>3. f_eem seems to have roughly the same performance with ping -f -s
> >>>   64000 (160Mbit/sec).
> >>>
> >>>The CPU load of the gadget-side Atom PC sits very close to zero.
> >>>
> >>>System Details:
> >>>    Linux 3.13.0-rc7 (with a defconfig from Yocto for Intel Crownbay)
> >>>    Intel Atom E680 with EG20T
> >>>
> >>>I seem to have eliminated everything on the host side, since the host
> >>>is asking for data, and the device is saying it doesn't have any for
> >>>up to 100us at a time.
> >>>
> >>>What am I missing?
> >>
> >>you should probably profile your pch_udc_pcd_queue() to figure out if
> >>there's anything wasting a lot of time there.
> >>
> >>Unlike Alan, I would use trace_printk() rather than pr_debug(), since
> >>trace_printk() has much lower overhead. Google around and you'll see
> >>how to use trace_printk() and how to use the kernel function profiler.
> >
> >By the way, isn't it true that f_sourcesink uses only one request for
> >each bulk endpoint? That would naturally lead to a delay each time the
> >request completes and has to be resubmitted.
> 
> That's what the comment at the top of the file says, but it doesn't
> appear to be true. See source_sink_start_ep(). It seems to start by
> queueing up 8 transactions. I've adjusted this number up with no
> effect (currently at 64).

right, the comment needs to be updated with what source_sink_start_ep()
actually does.

> >If the driver used two requests instead, the pipeline would be much
> >less likely to empty out.
> 
> Yes, I absolutely agree, the queue must be kept full, but in this
> case I think it is.

I'll shoot in the dark here and assume current pch_udc only starts
request N after N-1 has been givenback, and that's probably what's
causing the extra delay.
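FWIW, by profiling pch_udc_pcd_queue() I mean something along these
lines (a rough, untested sketch; __pch_udc_pcd_queue() is just a
stand-in name for the existing function body after renaming it so the
wrapper can time it):

	#include <linux/ktime.h>

	/* untested: time the queue path and log it via the trace
	 * buffer.  __pch_udc_pcd_queue() is hypothetical -- rename
	 * the current pch_udc_pcd_queue() to it.
	 */
	static int pch_udc_pcd_queue(struct usb_ep *usbep,
				     struct usb_request *usbreq,
				     gfp_t gfp)
	{
		ktime_t start = ktime_get();
		int ret = __pch_udc_pcd_queue(usbep, usbreq, gfp);

		trace_printk("pcd_queue took %lld ns\n",
			     ktime_to_ns(ktime_sub(ktime_get(), start)));
		return ret;
	}

then read the results back out of /sys/kernel/debug/tracing/trace.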
Just as a debugging effort, can you move the call to giveback to a
workqueue or something like that, just so it gets scheduled into the
future? This wouldn't be an acceptable patch, but it would tell us
whether my statement is valid. See the sketch below.
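Something like this, perhaps (untested sketch only; the field and
function names are guesses against drivers/usb/gadget/pch_udc.c, and
I'm assuming complete_req() is where the driver currently calls
->complete() and that struct pch_udc_ep embeds a struct usb_ep named
'ep'):

	#include <linux/workqueue.h>

	struct pch_udc_request {
		struct usb_request	req;
		/* ... existing fields ... */
		struct pch_udc_ep	*ep;		/* new */
		struct work_struct	giveback_work;	/* new */
	};

	/* runs later in process context, after the IRQ handler has
	 * had a chance to arm the next TD
	 */
	static void giveback_work_fn(struct work_struct *work)
	{
		struct pch_udc_request *req = container_of(work,
				struct pch_udc_request, giveback_work);

		req->req.complete(&req->ep->ep, &req->req);
	}

	/* in complete_req(), instead of calling ->complete()
	 * directly:
	 */
		req->ep = ep;
		INIT_WORK(&req->giveback_work, giveback_work_fn);
		schedule_work(&req->giveback_work);

If the NAK bursts shrink with giveback pushed out like this, that
points at the completion/re-queue ordering rather than the host.

cheers

-- 
balbi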