On Mon, 13 Jan 2014, Felipe Balbi wrote:

> Hi,
>
> On Mon, Jan 13, 2014 at 03:20:31PM -0500, Alan Ott wrote:
> > I have an EG20T-based board and have some issues with performance on
> > the USB device interface.
>
> I don't have that hardware but ...
>
> > I made a libusb test program (using the async interface)[0] to read
> > data from the EG20T's USB device port, which has the gadget zero
> > source/sink function bound. In theory, one would hope this would give
> > the fastest real-world results for the hardware connected.
> >
> > The test program submits 32 IN transfers and re-submits each one on
> > transfer completion, counting received packets.
> >
> > From running my test program for a few minutes I get the following:
> > elapsed: 548.468416 seconds
> > packets: 21503334
> > packets/sec: 39206.148199
> > bytes/sec: 20073547.877732
> > MBit/sec: 160.588383
> >
> > 160 MBit/sec isn't terrible, but I hoped for better. A USB analyzer
> > shows 7 transactions happening quickly (with about 14us separating
> > them), but every 8th transaction, the EG20T will NAK between 20-80
> > times[1], losing 50-100us[2].
>
> As Alan stated, this is a problem on the device side. The device is
> replying with NAK because, I believe, it has run out of free TDs.
>
> > This delay happens every 8th transaction without fail[3].
> >
> > I've looked at the following:
> > 1. The f_sourcesink.c function queues up 8 responses at the
> > beginning. Changing this number up or down had no effect.
> > 2. Analysis of pch_udc.c doesn't show anything which would obviously
> > cause a delay every 8th packet.
> > 3. f_eem seems to have roughly the same performance with ping -f -s
> > 64000 (160 Mbit/sec).
> >
> > The CPU load of the gadget-side Atom PC sits very close to zero.
> >
> > System Details:
> > Linux 3.13.0-rc7 (with a defconfig from Yocto for Intel Crownbay)
> > Intel Atom E680 with EG20T
> >
> > I seem to have eliminated everything on the host side, since the host
> > is asking for data, and the device is saying it doesn't have any for
> > up to 100us at a time.
> >
> > What am I missing?
>
> You should probably profile your pch_udc_pcd_queue() to figure out if
> there's anything wasting a lot of time there.
>
> Unlike Alan, I would use trace_printk() rather than pr_debug(), since
> trace_printk() has much lower overhead. Google around and you'll see
> how to use trace_printk() and how to use the kernel function profiler.

By the way, isn't it true that f_sourcesink uses only one request for
each bulk endpoint? That would naturally lead to a delay each time the
request completes and has to be resubmitted. If the driver used two
requests instead, the pipeline would be much less likely to empty out.

Alan Stern