Jeff Garzik <jgarzik@pobox.com> wrote:

> Unfortunately I don't see anyone ever being interested in even working
> with companies on this.  TCP offloading limits users in a large number
> of ways.
>
> * Any time a TCP security issue arises, it cannot be fixed.  The offload
>   logic is either downloaded from firmware or direct-coded into
>   hardware, neither of which is fixable by the Linux vendor, nor
>   analyzable by security experts.
> * A large number of Linux kernel net stack features are not present.
>   iptables is a big one, but there are many smaller missing features as
>   well.  You seem to recognize this with your "murky" references.
> * There are hardware limits which are not present in software.  One
>   common scenario that stumps offload-all-TCP vendors is slow
>   connections: many Web sites are saddled with _thousands_ of users
>   simultaneously connected via slow links (modem users).  This scenario
>   and similar Real World(tm) scenarios like it hit socket or RAM limits
>   on offload-all-TCP cards very quickly.
> * At 1Gb/10Gb speeds, you must overcome problems like PCI bus
>   throughput.  This problem exists completely independent of where the
>   TCP stack is.
>
> To sum, it's a dumb idea :)

Hmm.  There are clearly issues and limitations, hence my "murky"
comment, but I think they're interface problems more than anything else.

The security and possible net stack feature issues come from the fact
that there is currently no clean way to separate them from the rest of
the processing and assumptions of the code.

Hardware limitations, certainly the comment about PCI bus speeds, are
generally a problem for any hardware, hardly unique to a "full" TCP
offload NIC.  Obviously, it's the bet of any engineering organization
that it knows how to implement well the chosen features/capabilities of
the product at hand.
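The slow-connection point is easy to quantify with back-of-the-envelope
arithmetic.  A quick sketch (every figure below is an illustrative
assumption, not the spec of any real card):

```python
# Sketch: why thousands of slow connections exhaust RAM on an
# offload-all-TCP card.  All figures are illustrative assumptions.

CARD_RAM = 16 * 1024 * 1024       # assume 16 MB of on-card memory
PER_CONN_BUFFERS = 2 * 16 * 1024  # assume 16 KB each for send + receive
PER_CONN_STATE = 512              # assume 512 bytes of TCP control state

per_conn = PER_CONN_BUFFERS + PER_CONN_STATE
max_conns = CARD_RAM // per_conn

print(f"per-connection footprint: {per_conn} bytes")
print(f"connections before on-card RAM is exhausted: {max_conns}")
# -> roughly 500 connections.  A web server saddled with modem users
# can hold tens of thousands of concurrent slow connections, so the
# card's limit is hit long before the host's.
```

A host stack sized in main memory degrades gracefully here; a card with
a fixed socket table or fixed buffer pool simply stops accepting.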
> Now, to be more productive, there are several things vendors can offload
> onto cards for acceleration:
> * Rx/Tx checksumming
> * TCP segmentation offloading, for Tx
> * SMP-friendly packet buffering and reassembly, for Rx
> * other stuff
>
> "offload everything" is just the easy thing marketing departments come
> up with.  You need real engineers with intimate knowledge of TCP to come
> up with good solutions for offloading specific portions of the TCP and
> UDP work into hardware, while still retaining the flexibility offered by
> the features present in the Linux TCP stack.

I agree that it is preferable to implement things in cooperation with
the Linux TCP stack, but the current set of "acceleration" features is
kinda slim for the performance targets we'd need, certainly for 10Gbit,
much less for 4 1Gbit ports.

My current rough idea of a counter-proposal to make instead of "full"
TCP offload is:

  -- A mechanism (not sure what yet) for storing the TCP window on the
     card, so you essentially just copy it there from user space.

  -- A mechanism for auto-sizing the TCP window.  Very fast pipes need
     big TCP windows, but in the presence of lots of long-haul
     connections you easily eat huge amounts of memory.  I.e. static
     allocation sucks.  There seems to be a way, using the TCP
     slow-start and congestion algorithms, to do it dynamically.

  -- Always using something like TCP segmentation on send.

  -- A way to essentially join multiple packets on reception into large
     packets (get rid of the whole necessity for Jumbo packets, but get
     the benefits).  Kind of a "TCP unsegmentation" feature.

  -- A similar way to store the reception buffers/windows on the card
     until TCP "unsegmentation" happens into user space or final buffers
     of whatever sort.

All these seem to be the majority of what the Linux stack spends its
time on, but I still need to measure the stack more carefully with a
simulation harness in place to determine if those would be enough.
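The auto-sizing point can be made concrete: the window a connection
actually needs is its bandwidth-delay product (BDP), and that varies by
five orders of magnitude across realistic paths, so any static size is
wrong for almost everyone.  A sketch with illustrative link speeds and
RTTs (my numbers, not measurements):

```python
# Sketch of the window auto-sizing argument: required window = BDP,
# which varies enormously per connection.  Speeds/RTTs are assumptions.

def bdp_bytes(bits_per_sec, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to keep the
    pipe full (integer math to avoid float rounding)."""
    return bits_per_sec * rtt_ms // (8 * 1000)

modem   = bdp_bytes(56_000, 300)           # 56 kbit modem, 300 ms path
gig_lan = bdp_bytes(1_000_000_000, 1)      # 1 Gbit LAN, 1 ms
ten_gig = bdp_bytes(10_000_000_000, 100)   # 10 Gbit WAN, 100 ms

print(modem)    # ~2 KB is plenty for the modem user
print(gig_lan)  # ~122 KB for the gigabit LAN peer
print(ten_gig)  # ~119 MB for the long-haul 10 Gbit path
# Statically granting every socket the long-haul window burns ~119 MB
# per connection; sizing from the rate/RTT the slow-start and
# congestion machinery already observes does not.
```

This is exactly why a fixed per-socket allocation on an offload card
fails both ends of the spectrum at once: it starves the fast long-haul
flows and wastes memory on the thousands of slow ones.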
--
    Erich Stefan Boleyn     <erich@uruk.org>     http://www.uruk.org/
  "Reality is truly stranger than fiction; Probably why fiction is so
   popular"
-
To unsubscribe: send the line "unsubscribe linux-net" in the body of a
message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html