On Tuesday 27 January 2009, Sarah Sharp wrote:
>
> > I like the current model, whereby URBs deal with only a single
> > contiguous DMA buffer.  (Possibly one that's made contiguous
> > through an IOMMU coalescing pages.)  Having a uniform model is
> > a big win ... even with the exception whereby ISO transfers
> > split that buffer into discrete chunks.  So I'd rather keep to
> > the model whereby scatterlists are mapped to URBs outside the
> > sight of HCDs.
>
> The problem is that I saw significant performance improvement with USB
> 3.0 prototypes when I pushed the scatter gather list down to the xHCI
> HCD.  The xHCI data structures are just set up in such a way that
> queuing a list of scatter gather entries is just natural.

That's a discussion we can have more productively when everyone can
see what those xHCI data structures are.  ;)

Are they really that different from EHCI or OHCI?  They support queues
too.  The generic model is "queue" ... not scatterlist, which isn't
used much outside the block layer.

> The performance increase might have been due to how the device was set up
> to do PCI DMA; it might have been due to something else.  I can't know
> until I run both sets of patches (bulk TX with and without scatter
> gather list push down) on multiple host controllers and multiple USB 3.0
> devices.
>
> Inaky was saying that he would love to see scatter gather lists pushed
> down to the HCDs for wireless USB.  The USB core forces the scatter
> gather list from a driver into one buffer,

No, that's the DMA mapping which *MIGHT* do that, on platforms with an
IOMMU.  Typically each scatterlist entry will be a page or two.  An
IOMMU can turn a dozen such entries into something that's virtually
contiguous in DMA-space.

There will still be N buffers in a scatterlist of length N ... but the
IOMMU might let it be treated more efficiently.  (As I recall, Intel
doesn't do much with IOMMUs, except maybe on server hardware.)

There are three levels of optimization in the current scatterlist code:

 - If an IOMMU is available, dma_map_sg() uses it to make the
   scatterlist shorter.  (Which means fewer DMA transfer descriptors,
   for hardware where that's relevant.)

 - Each remaining scatterlist entry is submitted asynchronously, so
   that the HCD receives a queue of transfers to stick in its DMA
   queue.  (On hardware that queues DMA transfers.)

 - Rather than requiring an IRQ after each scatterlist entry
   completes, HCDs are told they only need to interrupt on the
   last one.

So for example I've seen individual scatterlists of nearly a megabyte
get sent to EHCI, which works on them and then issues a single
completion IRQ.

> then the wHCI has to break
> that buffer apart again and insert more headers in between.

That would be a wHCI design issue, I'd think.  If it doesn't insert
headers automatically, then it's going to have lots of { header,
data-fragment } tuples ... which could be designed as fast, or not.
If there are DMA transfer descriptors, the worst case is needing
separate descriptors for header and then data fragment.

Network stacks traditionally avoid that by preallocating space in SKBs
for lower levels to add headers ... but USB takes arbitrary buffers;
so no SKBs, no header pre-allocation.

> If the
> upper layer could just submit a scatter gather list down to the HCD and
> not have the USB core combine it, that would save a lot of copies.

Wanting to do even *ONE* copy is a bad model, and will slow down your
I/O performance significantly.
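To make that three-level optimization concrete, here's a rough sketch
of the driver-side call; the helper name, endpoint number, and
scatterlist are purely illustrative, not taken from any particular
driver.  The point is that usb_sg_init()/usb_sg_wait() already takes a
scatterlist without copying it:  it does the dma_map_sg() mapping (so
an IOMMU can shorten the list), queues one URB per remaining entry,
and asks for a completion IRQ only on the last one.

	#include <linux/usb.h>
	#include <linux/scatterlist.h>

	/* Illustrative sketch:  hand usbcore an already-built
	 * scatterlist for a bulk OUT endpoint, with no copying.
	 */
	static int send_sglist(struct usb_device *udev, int epnum,
			struct scatterlist *sg, int nents, size_t length)
	{
		struct usb_sg_request io;
		int status;

		status = usb_sg_init(&io, udev,
				usb_sndbulkpipe(udev, epnum),
				0,	/* period: unused for bulk */
				sg, nents, length, GFP_KERNEL);
		if (status)
			return status;

		/* Blocks until the whole URB queue completes;
		 * the HCD sees one IRQ, on the last URB.
		 */
		usb_sg_wait(&io);
		return io.status;	/* zero on success */
	}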
The USB stack is set up to facilitate zerocopy I/O, at least so far as
the buffers provided to usbcore and HCDs by device drivers.  If those
drivers are smart, they won't copy data either ... that gets tricky
when the data comes straight from userspace, but it's doable.  Or,
worst/lazy case, a single copy.

- Dave