I've been too busy to take part in this discussion (upgrading to Fedora 10 was a disaster because the X server doesn't work on my machine); there hasn't even been enough time in the last few months to read through much of the USB 3.0 spec. So I have only a few comments to add.

On Tue, 27 Jan 2009, David Brownell wrote:

> > The problem is that I saw significant performance improvement with USB
> > 3.0 prototypes when I pushed the scatter gather list down to the xHCI
> > HCD. The xHCI data structures are just set up in such a way that
> > queuing a list of scatter gather entries is just natural.
>
> That's a discussion we can have more productively when
> everyone can see what those xHCI data structures are. ;)
>
> Are they really that different from EHCI or OHCI? They
> support queues too. The generic model is "queue" ... not
> scatterlist, which isn't used much outside the block layer.

The main difference is in how DMA is handled. Also, "queue" is _too_ generic: it doesn't express the idea that a bunch of buffers may all belong to a single logical transfer. So far we've handled that with the URB_NO_INTERRUPT kludge, but I think a different approach would be better. (For example, URB_NO_INTERRUPT isn't implemented properly in usbfs.)

Note: As far as I can see, this notion applies only to bulk transfers. ISO transfers already implement a sort of scatterlist, control transfers are generally limited by HCDs to a single contiguous buffer (will this need to change for USB 3.0?), and interrupt transfers don't usually need to send all that much data. Maybe they could benefit from this idea too; I don't know.

In general, I agree with the viewpoint that pushing the scatterlist handling down into the HCDs would be an improvement. Whether this is done by actually using scatterlists (which already exist) or some other data structure (which might be better suited to our needs) is a separate issue.
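To illustrate the URB_NO_INTERRUPT approach mentioned above: one logical bulk transfer spread over several buffers is submitted as a series of URBs, and every URB except the last carries a hint that no completion interrupt is needed for it. The userspace C sketch below models only that flag assignment. `struct mock_urb`, `struct segment`, `MOCK_NO_INTERRUPT`, and `split_transfer()` are hypothetical stand-ins, not the kernel's `struct urb` or any real usbcore API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag standing in for the kernel's URB_NO_INTERRUPT hint. */
#define MOCK_NO_INTERRUPT 0x1u

/* Hypothetical, heavily simplified stand-in for a URB: one buffer per request. */
struct mock_urb {
    const void *buf;
    size_t len;
    unsigned flags;
};

/* One buffer belonging to the logical transfer. */
struct segment {
    const void *buf;
    size_t len;
};

/*
 * Turn one logical bulk transfer (nsegs buffers) into nsegs requests.
 * All requests except the last carry MOCK_NO_INTERRUPT, so only the final
 * one asks for a completion interrupt.  Returns the number of requests
 * filled in, or 0 if the transfer is empty or does not fit in max_urbs.
 */
size_t split_transfer(const struct segment *segs, size_t nsegs,
                      struct mock_urb *urbs, size_t max_urbs)
{
    assert(segs != NULL || nsegs == 0);
    if (nsegs == 0 || nsegs > max_urbs)
        return 0;

    for (size_t i = 0; i < nsegs; i++) {
        urbs[i].buf = segs[i].buf;
        urbs[i].len = segs[i].len;
        /* Only the last request of the logical transfer wants an interrupt. */
        urbs[i].flags = (i + 1 < nsegs) ? MOCK_NO_INTERRUPT : 0u;
    }
    return nsegs;
}
```

Nothing in the resulting requests records that they form one transfer; only the caller knows which requests belong together, which is why a plain "queue" model doesn't capture the idea.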
> So for example I've seen individual scatterlists
> of nearly a megabyte get sent to EHCI, which works
> on them and then issues a single completion IRQ.

There are several limitations to the current implementation. The two most notable are that it can't run asynchronously and that it doesn't directly map buffers from userspace.

A new implementation will certainly have its own issues. In particular, I'm thinking of the requirement that every buffer in a transfer (except the last) must be a multiple of the maxpacket length. The block layer currently has no way to express this requirement. It can be weakened slightly; for example, with EHCI you're okay if every buffer other than the first begins on a 4-KB boundary and every buffer other than the last ends on a 4-KB boundary. But that can't be expressed in the block layer either.

It would be silly to go to a lot of effort to avoid the overhead of allocating and submitting multiple URBs, only to suffer the overhead of bounce buffers.

Alan Stern
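The two buffer-layout conditions described in the message can be written as simple checks over a list of buffers. This is a userspace C sketch under stated assumptions: buffer addresses are modelled as plain integers (`uintptr_t`), and `struct sg_buf`, `EHCI_PAGE`, `fits_maxpacket_rule()`, and `fits_ehci_page_rule()` are hypothetical names, not kernel or block-layer APIs. The maxpacket rule matters because a packet shorter than maxpacket ends a bulk transfer, so a buffer boundary falling partway through a packet would roughly amount to a premature short packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the 4-KB boundary used in the EHCI example. */
#define EHCI_PAGE 4096u

/* One buffer of a transfer; the DMA address is modelled as an integer. */
struct sg_buf {
    uintptr_t addr;
    size_t len;
};

/* General rule: every buffer except the last is a multiple of maxpacket. */
bool fits_maxpacket_rule(const struct sg_buf *bufs, size_t n, size_t maxpacket)
{
    assert(maxpacket > 0);
    for (size_t i = 0; i + 1 < n; i++)
        if (bufs[i].len % maxpacket != 0)
            return false;
    return true;
}

/* Weaker EHCI condition: interior buffer edges lie on 4-KB boundaries. */
bool fits_ehci_page_rule(const struct sg_buf *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Every buffer other than the first must begin on a boundary. */
        if (i > 0 && bufs[i].addr % EHCI_PAGE != 0)
            return false;
        /* Every buffer other than the last must end on a boundary. */
        if (i + 1 < n && (bufs[i].addr + bufs[i].len) % EHCI_PAGE != 0)
            return false;
    }
    return true;
}
```

The two conditions are independent: a list whose first buffer starts mid-page and ends on a page boundary can fail `fits_maxpacket_rule()` yet pass `fits_ehci_page_rule()`, and vice versa. Neither check corresponds to anything the block layer can currently express.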