Re: [RFC] CDC NCM USB host driver

David Brownell <david-b@xxxxxxxxxxx> · Sun, 20 Jun 2010 22:46:09 -0700 (PDT)

--- On Sun, 6/20/10, Oliver Neukum <oliver@xxxxxxxxxx> wrote:

> > > On Thursday, 17 June 2010 05:41:24, David Brownell wrote:
>  
> > Not recently, except in the loose sense that
> > if there's going to  be such batching code
> > for NCM, it ought to be reusable.
> 
> OK, so we need to discuss
> 
> 1) do we want batching

To support NCM, EEM, and RNDIS we do.

But recall my point was that this is just
a way to work around USB stacks that have
weak support for transfer queues ...

So we're stuck with "wanting" to work around
USB stacks that don't work very well.  That's
what I'd call "stuck with", not "want".

> 2) if so, how do we do it
> 3) how do we make it reusable
> 
> Right?

Exactly.

> > 
> > Right NOW for TX, isn't that the only solution?
> 
> Pretty much. But we may be able to innovate.
>  
> > ... Unless you do like the RNDIS code and stick
> > to one network packet per batch, which lets you
> > use much smaller buffers (TX only), and
> > thus avoid a lot of data copies for TX.
> > 
> > And for RX ... isn't the solution the converse,
> > but sharing the same packet buffer between all
> > the single-packet SKBs extracted from that
> > huge URB transfer buffer?
> 
> I think so. But it seems to me that for RX the situation
> is worse.

Given batching, it's much worse because the
buffer sizes are huge ("jumbograms") and
all those weird alignment restrictions exist.
(Notice how normal network packets don't have
such restrictions.  That's a cue that strange
stuff is going on ...)
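
To put a number on what such alignment rules cost, here is a rough
userspace sketch in Python (not driver code).  It assumes the rule has
the usual form: each datagram must start at an offset whose remainder
modulo some divisor is a fixed value.  The names aligned_offset, layout,
header_len, divisor and remainder are made up for this example; real
values would come from the device's reported NTB parameters.

```python
def aligned_offset(cursor, divisor, remainder):
    """Return the smallest offset >= cursor with offset % divisor == remainder."""
    if divisor <= 0 or not 0 <= remainder < divisor:
        raise ValueError("need divisor > 0 and 0 <= remainder < divisor")
    # Python's % is non-negative for a positive divisor, so this is the pad size.
    pad = (remainder - cursor) % divisor
    return cursor + pad


def layout(lengths, header_len, divisor, remainder):
    """Place packets back to back after a header, padding each to the rule.

    Returns (placements, padding): a list of (start, length) pairs and the
    total number of pad bytes that had to be inserted.
    """
    cursor = header_len
    placements = []
    padding = 0
    for length in lengths:
        start = aligned_offset(cursor, divisor, remainder)
        padding += start - cursor
        placements.append((start, length))
        cursor = start + length
    return placements, padding
```

With a single packet per transfer the loop runs once, and the only
padding left is whatever the header itself forces.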

> For TX we might use scatter/gather. For RX that
> is not possible, as we cannot predict where the datagrams
> will start or stop.
> 
> > > The problem here, as David pointed out, is that we must
> > > copy each datagram.
> > 
> > Copy on TX.  You'll observe that for example
> > the RNDIS RX code shares the underlying packet
> > buffer (big) between the various packets which
> > get extracted; ... to avoid copying, but at a
> > cost in terms of memory fragmentation...
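
The buffer-sharing idea can be modelled in userspace.  This is only a
sketch in Python, not the RNDIS code: it uses a made-up framing (a
2-byte little-endian length before each packet) and memoryview slices
to stand in for SKBs that point into one big transfer buffer.  The
names make_bundle and split_bundle are hypothetical.

```python
import struct


def make_bundle(payloads):
    """Build a toy bundle: each packet is preceded by a 2-byte LE length."""
    out = bytearray()
    for payload in payloads:
        out += struct.pack("<H", len(payload)) + payload
    return out


def split_bundle(buf):
    """Return zero-copy views of the packets inside a toy bundle.

    Every returned memoryview references the same underlying buffer, so
    no packet data is copied.  A zero length or a truncated packet ends
    the scan.
    """
    view = memoryview(buf)
    packets = []
    pos = 0
    while pos + 2 <= len(view):
        (length,) = struct.unpack_from("<H", view, pos)
        pos += 2
        if length == 0 or pos + length > len(view):
            break
        packets.append(view[pos:pos + length])
        pos += length
    return packets
```

The cost shows up in the model too: as long as any one small view is
alive, the whole big buffer stays allocated.
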
> 
> Do you have an alternative?

Don't bundle packets ... just use queues
effectively to avoid wasted bandwidth
between USB transfers, and to have the
network packets go directly into and out
of their buffers. :)
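
A back-of-the-envelope model of that trade-off, as a Python sketch.
The assumptions are all mine: every transfer keeps the bus busy for a
fixed time, the host needs a fixed turnaround time to resubmit a
completed transfer, and the bus handles one transfer at a time.  The
function and parameter names are invented for illustration, and the
numbers in the usage note are not measurements.

```python
def bus_utilization(transfer_us, turnaround_us, queue_depth,
                    packets_per_transfer=1):
    """Fraction of time the bus is busy in steady state.

    Each queued transfer cycles through "on the bus" (busy) and "being
    resubmitted" (turnaround).  With queue_depth transfers in flight the
    bus is idle only if the turnaround cannot be hidden behind the other
    queued transfers.
    """
    if queue_depth < 1 or packets_per_transfer < 1:
        raise ValueError("queue_depth and packets_per_transfer must be >= 1")
    busy = transfer_us * packets_per_transfer
    cycle = busy + turnaround_us
    return min(1.0, queue_depth * busy / cycle)
```

In this model, with 100 us per packet and a 50 us turnaround, a queue
depth of 1 leaves the bus busy two thirds of the time.  A depth of 2
with single packets already reaches 1.0, while bundling 8 packets at
depth 1 only gets to 800/850.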

> > There's a pragmatic issue with that:  allocating
> > big SKBs fragments the relevant memory pools, and
> > isn't even guaranteed to work.
> > 
> > That's another reason to prefer solutions that
> > stick to queuing single network packets in the
> > USB transfer queues.
> 
> Why? I can't see the reason for TX. We could
> always fall back to smaller buffers, couldn't we?

Both TX and RX benefit from smaller buffers that
are consistently sized, and never require memcpy().

> > > Thus I thought about what we can do to avoid
> > > a copy. We can avoid a copy if and only if we can fit the
> > > NTH, NDH and padding for alignment into the buffer associated
> > > with the skb we are given.

And that extra data isn't actually needed;
consider that CDC Ethernet works just fine
without it, and approaches (closely!) the
peak USB bandwidth without memcpy() costs.
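
For a sense of scale, a small Python sketch of the framing overhead.
It assumes the 16-bit NTB layout as commonly described: a 12-byte
NTH16, an 8-byte NDP16 header, and one 4-byte entry per datagram plus a
terminating null entry.  Treat those sizes as assumptions to check
against the spec.  Alignment padding is ignored, and the function names
are hypothetical.

```python
NTH16_LEN = 12         # assumed size of the 16-bit transfer header
NDP16_HEADER_LEN = 8   # assumed size of the datagram-pointer table header
NDP16_ENTRY_LEN = 4    # assumed size of one (index, length) entry


def ncm16_overhead(n_datagrams):
    """Framing bytes added to a transfer carrying n_datagrams packets."""
    if n_datagrams < 1:
        raise ValueError("need at least one datagram")
    # One entry per datagram, plus the terminating null entry.
    entries = n_datagrams + 1
    return NTH16_LEN + NDP16_HEADER_LEN + NDP16_ENTRY_LEN * entries


def payload_fraction(packet_len, n_datagrams):
    """Share of the transfer that is packet data, for equal-sized packets."""
    payload = packet_len * n_datagrams
    return payload / (payload + ncm16_overhead(n_datagrams))
```

Under these assumptions a one-packet transfer carries 28 bytes of
framing that plain CDC Ethernet does not send at all.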

> > 
> > That can be assured given MTU tricks; upper
> > layers of the network stack can be made to
> > pre-allocate that memory, at least for TX paths.
> 
> How does that work?

alloc_skb() or whatever, just
returns bigger buffers.
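
The effect of having that extra room can be shown with a toy model in
Python.  It is a sketch of the headroom idea only, not of the real
sk_buff API: the PacketBuffer class and its methods are hypothetical
stand-ins for a buffer with reserved space in front of the payload.

```python
class PacketBuffer:
    """Toy packet buffer with optional spare room before the payload."""

    def __init__(self, payload, headroom=0):
        self.buf = bytearray(headroom) + bytearray(payload)
        self.data = headroom  # offset of the first valid byte
        self.copies = 0       # number of times the payload was copied

    @property
    def headroom(self):
        return self.data

    def push(self, header):
        """Prepend header; return True if it fit without copying the payload."""
        n = len(header)
        if n <= self.data:
            # Enough room: write the header in place, payload untouched.
            self.data -= n
            self.buf[self.data:self.data + n] = header
            return True
        # Not enough room: build a new buffer, which copies the payload.
        self.buf = bytearray(header) + self.buf[self.data:]
        self.data = 0
        self.copies += 1
        return False

    def frame(self):
        """Return the bytes that would go on the wire."""
        return bytes(self.buf[self.data:])
```

A buffer allocated with 16 bytes of headroom takes a 4-byte header in
place; one allocated with none has to be rebuilt, which is the copy the
pre-allocation is meant to avoid.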

> > for RX it's less certain, unless the other end
> > can be made to adopt the same "only one packet
> > per bundle" policy.
> 
> Probably it can. The question is how hard that would
> affect performance.

That's implementation-specific.  On Linux the
effect is hardly observable.  On MS-Windows I
suspect it'd hurt ... they wouldn't go through
so much work to infect protocols with costly
mechanisms if their implementations didn't need
them badly ... right?

>  
> > The trivial case is "one packet per bundle".  It's
> > only larger bundles that get complicated and slow.
> 
> Do they? I mean they obviously complicate stuff on
> the host side, but is that outweighed by gains on the
> device side?

When I looked at it ... no, not outweighed in
most cases.  Peripherals don't have DMA chaining
in most cases, but they do have DMA, and sane
implementation strategies (not a given!!) can use
that to good effect without a need to bundle.

- Dave
