Re: [RFC] CDC NCM USB host driver

David Brownell <david-b@xxxxxxxxxxx> · Fri, 18 Jun 2010 13:34:28 -0700 (PDT)

--- On Fri, 6/18/10, Oliver Neukum <oliver@xxxxxxxxxx> wrote:

> > > On Thursday, 17 June 2010 05:41:24, David Brownell wrote:
> > > > Oliver, I still don't understand what you're
> > > > trying to say, or how it relates to the
> > > > structural point I was making:  that the
> > > > batching isn't really needed (or helpful)
> > > > given sane USB DMA/transfer queues
> > > > (as on Linux).
> > > 
> We were talking about an implementation of batching network
> packets for transfer used by NCM and possibly other
> drivers.

Not recently, except in the loose sense that
if there's going to be such batching code
for NCM, it ought to be reusable.

The messages to which you responded were on
a different topic:  namely, that batching
was just a workaround for poor transfer queue
support ... and given sane transfer queueing
support, implementations could just as easily
use that (while avoiding the need to memcpy
every data packet).

If you wanted to change the topic, it would
really have helped to change $SUBJECT.  For
example "how to implement NCM style batching".

> Thus the requirements of NCM at least have to be met.
> 
> Looking at chapter 3.1 of the NCM specification, it seems
> to me that
> 
> a) the host must transfer all data associated with an NTH
> without short packets or ZLPs until the end
> b) after each short packet or ZLP a new NTH must be sent
> 
> In addition, if you look at chapter 3.3.4 of the NCM
> specification, it is clear that the host must meet fairly
> arbitrary alignment requirements, which the device
> specifies at runtime.
> 
> Now we are within the spec if we send out our data with an
> NTH, an NDH and a properly aligned datagram.

I'd have to get a new copy of the NCM spec and
read it again, in order to comment
at that level of detail...

However, I think of the issue in a slightly
different way:  namely, that the batching
requires drivers to construct USB transfers
(1..N full-size packets, then a short one to
terminate) out of multiple SKBs.  For the
sake of argument, I'll assume the various NCM
headers are packaged in SKBs too ... possibly
discrete, possibly prepended to other SKBs.

We don't currently have a way to describe USB
transfers except as the single buffer associated
with an URB.  Specifically, we can't build one
transfer out of two or more URBs.
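
The framing rule behind this can be modelled in a few lines. This is only a sketch: the function names are hypothetical, `maxpacket` stands for the bulk endpoint's maximum packet size (512 is used below as an example value), and protocol-specific exceptions to the terminating short packet are ignored.

```python
def packets_for_transfer(length, maxpacket):
    """Split one bulk transfer into (full_packets, terminator_size).

    terminator_size is the size of the final short packet. A value of 0
    means the payload is an exact multiple of maxpacket, so a
    zero-length packet (ZLP) would be needed to terminate it.
    """
    if length < 0 or maxpacket <= 0:
        raise ValueError("length must be >= 0 and maxpacket > 0")
    full, rest = divmod(length, maxpacket)
    return full, rest


def urbs_form_one_transfer(urb_lengths, maxpacket):
    """True if no URB before the last one would end in a short packet.

    Models the constraint discussed in this thread: every buffer except
    the last must be a non-zero multiple of maxpacket, otherwise the
    device sees the end of a transfer in the middle of the sequence.
    """
    return all(n > 0 and n % maxpacket == 0 for n in urb_lengths[:-1])
```

With a 1514-byte Ethernet frame and `maxpacket` of 512, `packets_for_transfer(1514, 512)` gives two full packets and a 490-byte short one, which is why a frame-sized buffer placed in front of another buffer ends the transfer early.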

> It seems to me that we can trivially meet the requirements
> of the NCM specification by allocating a large buffer and
> copying the datagrams (most likely Ethernet frames) with
> the proper alignment into the buffer and transferring it
> by means of one URB.
> 

Right NOW for TX, isn't that the only solution?

... Unless you do like the RNDIS code and stick
to one network packet per batch, which lets you
use much smaller buffers (TX only), and
thus avoid a lot of data copies for TX.

And for RX ... isn't the solution the converse,
but sharing the same packet buffer between all
the single-packet SKBs extracted from that
huge URB transfer buffer?
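
To make the TX cost concrete, here is a rough model of the "one large buffer" approach. It is a sketch under assumptions, not driver code: `pad_to` and `pack_frames` are hypothetical names, the divisor/remainder pair stands in for whatever alignment the device reports at runtime, and the rule is applied to the start of each frame for simplicity. The header area is left as zero bytes rather than filled with real NCM headers.

```python
def pad_to(offset, divisor, remainder):
    """Smallest position >= offset with position % divisor == remainder."""
    return offset + (remainder - offset) % divisor


def pack_frames(frames, header_len, divisor, remainder, max_size):
    """Copy frames into one buffer, aligning the start of each frame.

    Returns (buffer, index), where index lists (offset, length) for each
    frame that fit. Packing stops at the first frame that would push the
    buffer past max_size; the caller would start a new batch there.
    """
    buf = bytearray(header_len)          # zeroed placeholder for headers
    index = []
    for frame in frames:
        start = pad_to(len(buf), divisor, remainder)
        if start + len(frame) > max_size:
            break
        buf.extend(bytes(start - len(buf)))  # zero padding up to 'start'
        buf.extend(frame)                    # the per-packet copy
        index.append((start, len(frame)))
    return bytes(buf), index
```

Every frame goes through `buf.extend(frame)`, which is the memcpy being discussed. The RX converse needs no such copy, since each extracted packet can be described by an (offset, length) pair into the one received buffer.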

> The problem here, as David pointed out, is that we must
> copy each datagram.

Copy on TX.  You'll observe that, for example,
the RNDIS RX code shares the underlying (big)
packet buffer between the various packets which
get extracted ... to avoid copying, but at a
cost in terms of memory fragmentation.

There's a pragmatic issue with that:  allocating
big SKBs fragments the relevant memory pools, and
isn't even guaranteed to work.

That's another reason to prefer solutions that
stick to queuing single network packets in the
USB transfer queues.

> Thus I thought about what we can do to avoid
> a copy. We can avoid a copy if and only if we can fit the
> NTH, NDH and padding for alignment into the buffer
> associated with the skb we are given.

That can be assured given MTU tricks; upper
layers of the network stack can be made to
pre-allocate that memory, at least for TX paths.
For RX it's less certain, unless the other end
can be made to adopt the same "only one packet
per bundle" policy.
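
The single-packet TX case reduces to a headroom check, roughly as below. This is a sketch with hypothetical names; the header sizes are passed in rather than assumed, and the model assumes the transfer starts at offset 0 with both headers placed directly in front of the frame.

```python
def headroom_needed(nth_len, ndp_len, divisor, remainder):
    """Bytes required in front of a frame for headers plus alignment.

    The frame must start at the first offset >= nth_len + ndp_len that
    satisfies offset % divisor == remainder.
    """
    hdr = nth_len + ndp_len
    return hdr + (remainder - hdr) % divisor


def can_send_in_place(headroom, nth_len, ndp_len, divisor, remainder):
    """True if the packet's existing headroom fits headers and padding,
    so the frame would not have to be copied into a new buffer."""
    return headroom >= headroom_needed(nth_len, ndp_len, divisor, remainder)
```

For example, with a 12-byte and a 16-byte header (illustrative values) and a divisor of 4 with remainder 2, the frame has to start at offset 30, so 28 bytes of headroom are not enough and 30 are.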

> ... with multiple
> URBs the way we do it for storage.

Storage carefully keeps each URB's buffer
distinct, and doesn't try to map/combine
them into a single ginormous transfer,
holding the equivalent of multiple Ethernet
packets.

> That however, as we may not send short packets, is
> possible only if we give each URB a buffer that is
> a multiple of the device's maximum packet size. And I
> wondered whether we can meet this requirement without a copy.
> 
> What do you think?

I think you need to consider the RX side too,
and see what parameter negotiations each of the
protocols supports:  packets-per-bundle, etc.

The trivial case is "one packet per bundle".  It's
only larger bundles that get complicated and slow.
