Re: EG20T USB Gadget Performance

On 01/13/2014 09:01 PM, Alan Stern wrote:
On Mon, 13 Jan 2014, Felipe Balbi wrote:

Hi,

On Mon, Jan 13, 2014 at 03:20:31PM -0500, Alan Ott wrote:
I have an EG20T-based board and have some issues with performance on
the USB device interface.
I don't have that hardware but ...

I made a libusb test program (using the async interface)[0] to read
data from the EG20T's USB device port, which has the gadget zero
source/sink function bound. In theory, this should give the fastest
real-world results the hardware can achieve.

The test program submits 32 IN transfers and re-submits on transfer
completion, counting received packets.
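
The submit/resubmit pattern is roughly the sketch below (illustrative
only, not the exact program: error handling is omitted, and the
0x0525:0xa4a0 VID/PID and 0x81 bulk IN endpoint are assumed from the
stock gadget zero defaults):

/* Sketch of the async read loop; build against libusb-1.0. */
#include <libusb.h>	/* pkg-config --cflags --libs libusb-1.0 */
#include <stdlib.h>

#define NUM_TRANSFERS	32
#define BUF_SIZE	512	/* high-speed bulk max packet size */

static unsigned long packets;

static void LIBUSB_CALL xfer_cb(struct libusb_transfer *xfr)
{
	if (xfr->status == LIBUSB_TRANSFER_COMPLETED)
		packets++;
	libusb_submit_transfer(xfr);	/* re-submit to keep the pipeline full */
}

int main(void)
{
	libusb_device_handle *dev;
	int i;

	libusb_init(NULL);
	/* assumed gadget zero defaults: VID 0x0525, PID 0xa4a0, IN ep 0x81 */
	dev = libusb_open_device_with_vid_pid(NULL, 0x0525, 0xa4a0);
	libusb_claim_interface(dev, 0);

	for (i = 0; i < NUM_TRANSFERS; i++) {
		struct libusb_transfer *xfr = libusb_alloc_transfer(0);

		libusb_fill_bulk_transfer(xfr, dev, 0x81, malloc(BUF_SIZE),
					  BUF_SIZE, xfer_cb, NULL, 0);
		libusb_submit_transfer(xfr);
	}

	for (;;)
		libusb_handle_events(NULL);	/* completions counted in xfer_cb */
}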

From running my test program for a few minutes I get the following:
     elapsed: 548.468416 seconds
     packets: 21503334
     packets/sec: 39206.148199
     bytes/sec: 20073547.877732
     MBit/sec: 160.588383
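
(Those numbers are self-consistent with 512-byte packets, as expected
for high-speed bulk: 21503334 packets / 548.47 s ~= 39206 packets/s,
x 512 bytes ~= 20.07 MB/s ~= 160.6 Mbit/s.)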

160MBit/sec isn't terrible, but I hoped for better. A USB analyzer
shows 7 transactions happening quickly (with about 14us separating
them), but every 8th transaction, the EG20T will NAK between 20-80
times[1], losing 50-100us[2].
as Alan stated, this is a problem on the device side. The device is
replying with NAK because, I believe, it has run out of free TDs.

This delay happens every 8th transaction without fail[3].

I've looked at the following:
1. The f_sourcesink.c function queues up 8 responses at the
beginning. Changing this number up or down had no effect.
2. Analysis of pch_udc.c doesn't show anything which would obviously
cause a delay every 8th packet.
3. f_eem seems to have roughly the same performance with ping -f -s
64000 (160Mbit/sec).

The CPU load of the gadget-side Atom PC sits very close to zero.

System Details:
     Linux 3.13.0-rc7 (With a defconfig from Yocto for Intel Crownbay)
     Intel Atom E680 with EG20T

I seem to have eliminated everything on the host side, since the host
is asking for data, and the device is saying it doesn't have any for
up to 100us at a time.

What am I missing?
you should probably profile your pch_udc_pcd_queue() to figure out if
there's anything wasting a lot of time there.

Unlike Alan, I would use trace_printk() rather than pr_debug(), since
trace_printk() has much lower overhead. Google around and you'll see
how to use trace_printk() and how to use the kernel function profiler.
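
For example, something along these lines inside pch_udc_pcd_queue()
would put per-call timing into the ftrace buffer (just a sketch of the
idea, not a real patch; the existing function body is elided):

/* Sketch only: per-call timing with trace_printk().  Results show up
 * in /sys/kernel/debug/tracing/trace. */
static int pch_udc_pcd_queue(struct usb_ep *usbep, struct usb_request *usbreq,
			     gfp_t gfp)
{
	ktime_t t0 = ktime_get();
	int ret;

	/* ... existing pch_udc_pcd_queue() body, which sets ret ... */

	trace_printk("queue %u bytes on %s: %lld ns\n",
		     usbreq->length, usbep->name,
		     ktime_to_ns(ktime_sub(ktime_get(), t0)));
	return ret;
}
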
By the way, isn't it true that f_sourcesink uses only one request for
each bulk endpoint?  That would naturally lead to a delay each time the
request completes and has to be resubmitted.

That's what the comment at the top of the file says, but it doesn't appear to be true. See source_sink_start_ep(). It seems to start by queueing up 8 transactions. I've adjusted this number up with no effect (currently at 64).
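
For reference, the pattern is roughly the sketch below (not the actual
f_sourcesink code; queue_in_requests() and in_complete() are made-up
names for illustration, and error handling is trimmed):

/* Sketch: pre-queue 'depth' requests on the IN endpoint so the UDC
 * always has data ready when the host issues IN tokens. */
static int queue_in_requests(struct usb_ep *ep, unsigned int depth,
			     unsigned int len)
{
	unsigned int i;

	for (i = 0; i < depth; i++) {
		struct usb_request *req;

		req = usb_ep_alloc_request(ep, GFP_ATOMIC);
		if (!req)
			return -ENOMEM;
		req->buf = kmalloc(len, GFP_ATOMIC);
		req->length = len;
		req->complete = in_complete;	/* hypothetical: refills and re-queues */

		usb_ep_queue(ep, req, GFP_ATOMIC);
	}
	return 0;
}

With the depth anywhere from 8 to 64 the host-side numbers don't change
for me, which is why I don't think the gadget-side queue depth is the
limiting factor here.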

Maybe there's something else I'm missing.

If the driver used two requests instead, the pipeline would be much
less likely to empty out.

Yes, I absolutely agree that the queue must be kept full, but in this case I think it is.

Alan.




