Re: xhci: "no room on ep ring" with USB2.0 high speed device

Kruno Mrak <kruno.mrak@xxxxxxxxxxxxxxxx> · Thu, 21 Jul 2011 18:12:42 +0200

Sarah,
first of all, thank you for your cooperation.

On 20.07.2011 19:31, Sarah Sharp wrote:
On Wed, Jul 20, 2011 at 06:04:45PM +0200, Kruno Mrak wrote:
On 19.07.2011 18:17, Sarah Sharp wrote:
On Fri, Jul 15, 2011 at 04:42:44PM +0800, Andiry Xu wrote:
On Fri, Jul 15, 2011 at 3:58 PM, Kruno Mrak <kruno.mrak@xxxxxxxxxxxxxxxx> wrote:
Hi,

When running our cameras on USB 3.0 ports, we get
"ERROR no room on ep ring".
This only happens if the camera's image size is bigger than
~1 MB or when using multiple image buffers (queueing).
Capturing is done through libusb with asynchronous bulk transfers.
What standard is your camera using?
This camera does not comply with a specific standard. Our products are
for the machine vision industry, and we provide our customers a
programming API covering all of our products.
I see.

The draft USB-AV standard?
Never heard of that. What's this?
It's a draft standard being developed in the USB-IF video class work
group.  If your company is a member of the USB-IF, you can participate
in the development.  They're trying to make the video class more
generic, define protocols for compressed and uncompressed data
(especially useful with the larger USB 3.0 bus bandwidth) and allow the
host and device to swap video source/sink roles.

Other than that, I don't know of any camera standards that use bulk
endpoints.
Are you submitting multiple bulk transfers to the same endpoint, or are
you submitting one bulk transfer and waiting on it?
At "continuous capture start", one thread fires multiple bulk read
requests to a single endpoint. Another thread waits for finished
transfers. Image processing then takes place on each returned transfer,
and finally the transfers are re-submitted.
Each request represents an uncompressed image from the camera, up to 5 MB.
The number of requests in flight may vary and depends on image post
processing. By default, we use 4.
Yes, ok, that would certainly fill up an endpoint ring.  Do you
re-submit each transfer after it is completed, or re-submit them all at
once?

Each transfer is re-submitted immediately after image post processing
has completed.

Now, as more and more of our customers are using PCIe USB 3.0 extension
cards, we are coming under pressure.
After looking into the xhci driver sources, I have seen that
TRBS_PER_SEGMENT is limited to 64 and the number of segments per ep ring
is fixed at 1 (right?).
For bulk transfer, it's 1 segment. Isochronous transfer needs a bigger ring
because it inserts multiple TDs to the ring when an isoc URB is submitted.
Currently it uses 8 segments.
Yes, Andiry's right.  We never had a clear case for why people needed
bigger bulk rings.  VMware did complain a bit that some of their test
cases didn't run properly for very large transfers, but I think they
didn't have any customers that actually used transfers that big.
We receive raw-data (Bayer mosaic) images from the camera.
So, is there a specified limit on the size of a bulk ring?
No theoretical limit, it's just the code doesn't expand the rings as
necessary.

Aren't bulk endpoints meant for transferring large amounts of data?
Yes, but most in-kernel applications submit one bulk transfer at a time,
since the overhead to re-submit is smaller than if you're going through
userspace.

I just wonder if we are the only ones who use bulk transfers with
such large amounts of data.
You're the first real application I've heard of. :)

Playing with these parameters (TRBS_PER_SEGMENT=256 and ep ring
segments=8) led to good results in our first tests.
I would like to keep the TRBS_PER_SEGMENT parameter the same, because
otherwise I might have to go change the streams code that uses a radix tree to
map DMA addresses to virtual addresses.  If you keep that to 64, how
many ring segments do you end up needing?
20 segments is the minimum, then we are able to receive complete transfers.
Yeah, ok, it seems like we really need dynamic endpoint ring expansion
for that case.

So, my question is: do you agree that the limited
ep ring allocation might cause the above-mentioned error?
Is there a chance to push development on ep ring allocation?

Thank you in advance.

Maybe dynamic ring allocation helps in this situation.
Yes, it could help in this situation.  However, I don't have the
bandwidth right now to work on dynamic ring expansion.  The quick fix
would be to add a module parameter to the xHCI driver that sets the
number of ring segments to use for the four different types of
endpoints.  Perhaps parameters named bulk_segs, intr_segs, isoc_segs,
and ctrl_segs?
This could be a quick (and preliminary) fix.
Ok, I'll get you a patch to test tomorrow.

It sounds like you already know how to modify the xHCI driver, so do
you want to create a patch for this?
I am not experienced in submitting patches to the open source world,
and unfortunately, I am also fighting with bandwidth problems with xhci
and our cameras.
I would prefer to understand all the problems first, before
creating any patch.

The bandwidth problem seems really curious.
We can't reach the maximum frame rate of the camera on a USB 3.0 port.
Is this with a USB 2.0 device under an xHCI (USB 3.0) port?  Or a USB
3.0 device?

It's a USB 2.0 high-speed device attached to a USB 3.0 port.

It does not occur all the time, and there are no warnings and no error logs.
Any advice on how to investigate this problem?
This is still using bulk transfers, right?  They're asynchronously
scheduled, so you don't have any guarantees about when they're sent out.
The xHCI host controller actually does the scheduling, so you may just
be running into host controller bugs.

Oh, and you do know that your buffers are getting copied (possibly
twice) when they go through the libusb/usbfs kernel-userspace interface?
I've been meaning to fix it to use usb_alloc_buffer or maybe pin the
user pages, but haven't gotten around to it.  However, you would see the
same latency while using the device under EHCI if that were the real
problem.

When attaching our USB 2.0 camera to a USB 2.0 port, we never
had bandwidth problems.

Let me explain two scenarios I observed today:

scenario #1:
- load the xhci kernel module while our camera is not attached to the
  USB 3.0 port
- then attach the camera to the USB 3.0 port
==> reading the descriptors shows the full-speed descriptor of our camera

scenario #2:
- attach the camera to the USB 3.0 port before the xhci kernel module
  has been loaded
- then load the xhci kernel module
==> reading the descriptors shows the high-speed descriptor of our camera

Scenario #1 is the case where we can't reach the full frame rate.
Scenario #2 shows no bandwidth problems.

On a USB 2.0 port, I have not noticed such behaviour.
I don't understand yet how the full/high-speed configuration is
processed/negotiated when a USB device is attached to the host
controller.
I will have to read the spec :-( and discuss it with the person
who developed the camera's firmware.
If you have any advice, please let me know.

Also, is your camera already available commercially (USB-IF certified
and all that), or will it be soon?  If many Linux users are going to run
into this issue, the xHCI driver really needs dynamic ring expansion.
If we have some time before your product hits the market and your
customers are just testing out your product, I think the module
parameter would be an acceptable quick fix.
Our camera has been on the market since 2007, and we have sold several
thousand of them (to both Linux and Windows users). I suspect that more
and more of our customers will complain in the near future.
Ok, good to know.  I'll work on that patch to help out your existing
Linux customers.
This would be great.

Kruno Mrak

MATRIX VISION GmbH, Talstrasse 16, DE-71570 Oppenweiler
Registergericht: Amtsgericht Stuttgart, HRB 271090
Geschaeftsfuehrer: Gerhard Thullner, Werner Armingeon, Uwe Furtner, Erhard Meier