On Wed, 28 Jan 2009, David Brownell wrote:

> > > > The problem is that I saw significant performance improvement with USB
> > > > 3.0 prototypes when I pushed the scatter gather list down to the xHCI
> > > > HCD.  The xHCI data structures are just set up in such a way that
> > > > queuing a list of scatter gather entries is just natural.
> > >
> > > That's a discussion we can have more productively when
> > > everyone can see what those xHCI data structures are.  ;)
> > >
> > > Are they really that different from EHCI or OHCI?  They
> > > support queues too.  The generic model is "queue" ... not
> > > scatterlist, which isn't used much outside the block layer.
> >
> > The main difference being how DMA is handled.
>
> Doesn't answer the question.  :)

Surely you didn't expect me to tell you how xHCI's data structures differ
from EHCI's and OHCI's?  :-)

> > Also, "queue" is _too_
> > generic -- it doesn't express the idea that a bunch of buffers may all
> > belong to a single logical transfer.  So far we've handled that with
> > the URB_NO_INTERRUPT kludge, but I think a different approach would be
> > better.  (For example, URB_NO_INTERRUPT isn't implemented properly in
> > usbfs.)
>
> NO_INTERRUPT is a performance hint, in support of interrupt
> mitigation, not a semantic dictum.  EHCI can do pretty complete
> mitigation; OHCI, only partial.
>
> Recall that it's used with network packet queues too, where
> they really must *not* be coupled to packet boundaries.

Exactly my point -- it's a performance hint, but what we really need is a
way to demarcate transfer (or message) boundaries.

Remember your idea that usb_unlink_urb really should cancel all
outstanding URBs in an endpoint's queue?  That works only when the
queue contains a single message.  Although that's basically true now
for bulk queues, it isn't true for ep0 and it won't be true for bulk in
USB 3.0.

This issue shows up particularly strongly when using usbfs: If a
program submits two 64-KB transfers and libusb has to break each of
them up into four 16-KB URBs (because of usbfs's upper limit on URB
size), there's no way for the kernel to know that a short packet in the
first transfer (i.e., URBs 1-4) should cause it to jump directly to the
second transfer (i.e., URB 5).

> > Note: As far as I can see, this notion applies only to bulk transfers.
>
> Which notion ... scatterlists?

Scatterlists and/or queues.  Two ways of expressing the same idea.

> > ISO transfers already have a sort-of scatterlist implemented,
>
> Restricted to one buffer; each packet boundary is specified.
> The issue Sarah raised involves submitting several buffers
> (to support bursting) where packet boundaries don't matter.

Certainly -- iso is special and has different requirements from bulk.

For bursting we will certainly need a way to submit multiple
independent transfers, each of which might terminate early because of a
short packet.  Whether a transfer should be represented by a single URB
containing a queue of buffers, or by multiple URBs each containing a
single buffer, is what we should decide now.

I think the single-URB approach is best, if for no other reason than
that an URB is a reasonably hefty data structure and there's no point
in carrying all that extra baggage around just to support discontiguous
buffers.  That's essentially what Pete said, too.

> > There are several limitations to the current implementation.  The two
> > most notable are: It can't run asynchronously, and it doesn't directly
> > map buffers from userspace.
>
> Async behavior wasn't a design goal.  If usb-storage is
> ready for it, hack away!  :)

In fact, a while back Pete and I worked on an async version of your s-g
library code.  It was intended for ub, which doesn't create its own
threads.

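
To make the usbfs case above concrete, here is a minimal user-space
sketch of what "knowing the transfer boundary" would buy.  It is only a
model: struct model_urb, its transfer_id tag, and next_urb_after() are
hypothetical names invented for illustration and are not part of the
kernel's URB API.  It also assumes that an URB completing with fewer
bytes than requested means the device sent a short packet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not a kernel structure: one queued URB, tagged
 * with the logical transfer (message) it belongs to. */
struct model_urb {
    int transfer_id;   /* which logical transfer this URB is part of */
    size_t length;     /* number of bytes requested */
};

/*
 * The URB at index `cur` has completed with `actual` bytes.  Return the
 * index of the next URB the queue should run (or `n` if none is left).
 *
 * A short completion (actual < length) ends the current transfer, so
 * the remaining URBs carrying the same transfer_id are skipped.
 */
static size_t next_urb_after(const struct model_urb *q, size_t n,
                             size_t cur, size_t actual)
{
    assert(cur < n && actual <= q[cur].length);

    size_t next = cur + 1;

    if (actual < q[cur].length) {
        /* Short packet: jump past the rest of this transfer. */
        while (next < n && q[next].transfer_id == q[cur].transfer_id)
            next++;
    }
    return next;
}
```

With two 64-KB transfers split into eight 16-KB URBs (transfer_id 0 for
URBs 1-4, 1 for URBs 5-8), a short completion of URB 1 makes the model
continue with URB 5, which is exactly the information the kernel lacks
when the queue carries no boundary markers.
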
> Re userspace buffers, there are other messages on that;
> IMO, that should be handled by a different set of calls
> that pin the userspace buffers and morph them to kernel
> ones appropriately.

Yes, functionality that could be added to usbfs rather than the core or
the HCDs.

> > A new implementation will certainly have its own issues.  In
> > particular, I'm thinking of the requirement that every buffer in a
> > transfer (except the last) must be a multiple of the maxpacket length.
> > The block layer currently has no way to express this requirement.
>
> But has that been a real issue so far?  Or its sibling
> constraint on alignment?

It has indeed been an issue for wireless USB.  Bug reports have been
posted by an early tester, showing examples where the block layer
created scatterlists including 512-byte elements but the maxpacket
length was 1024.  The end result is that usb-storage is not usable over
wireless.  There was a proposal to fix this by implementing bounce
buffers; Jens Axboe volunteered to write some code but never got around
to it.

Is alignment a constraint?  As far as I know, host controller hardware
doesn't care how the packet boundaries are aligned in memory.  It just
wants the data to be contiguous (or "virtually" contiguous, as the EHCI
spec says).

Alan Stern
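
The maxpacket requirement discussed above can be stated as a small
check.  This is a minimal sketch under stated assumptions: the function
name first_bad_buffer() and the plain array of lengths are hypothetical
stand-ins for a real scatterlist walk, and nothing here is taken from
the block layer or usbcore.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: `len` holds the byte length of each buffer in
 * one transfer.  Every buffer except the last must be a multiple of
 * the endpoint's maxpacket size; otherwise the short packet produced at
 * the end of that buffer would terminate the transfer early.
 *
 * Returns the index of the first offending buffer, or -1 if the list
 * satisfies the rule.
 */
static int first_bad_buffer(const size_t *len, size_t n, size_t maxpacket)
{
    assert(maxpacket > 0);

    /* The last buffer (index n - 1) is exempt from the check. */
    for (size_t i = 0; i + 1 < n; i++) {
        if (len[i] % maxpacket != 0)
            return (int)i;
    }
    return -1;
}
```

For the wireless USB reports mentioned above, a list such as
{512, 512, 1024} with maxpacket 1024 fails at index 0, while the same
list passes with maxpacket 512.  A bounce-buffer fix would have to
coalesce the offending elements before submission.
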