Jeff Garzik <jgarzik@pobox.com> wrote:

> Unfortunately I don't see anyone ever being interested in even working
> with companies on this.  TCP offloading limits users in a large number
> of ways.
>
> * Any time a TCP security issue arises, it cannot be fixed.  The offload
>   logic is either downloaded from firmware or direct-coded into
>   hardware, neither of which is fixable by the Linux vendor, nor
>   analyzable by security experts.
> * A large number of Linux kernel net stack features are not present.
>   iptables is a big one, but there are many smaller missing features as
>   well.  You seem to recognize this with your "murky" references.
> * There are hardware limits which are not present in software.  One
>   common scenario that stumps offload-all-TCP vendors is slow
>   connections: many Web sites are saddled with _thousands_ of users
>   simultaneously connected via slow links (modem users).  This scenario
>   and similar Real World(tm) scenarios like it hit socket or RAM limits
>   on offload-all-TCP cards very quickly.
> * At 1Gb/10Gb speeds, you must overcome problems like PCI bus
>   throughput.  This problem exists completely independent of where the
>   TCP stack is.
>
> To sum, it's a dumb idea :)

Hmm.  There are clearly issues and limitations, hence my "murky"
comment, but I think they're interface problems more than anything else.

The security and possible net stack feature issues come from the fact
that there is currently no clean way to separate them from the rest of
the processing and assumptions of the code.

Hardware limitations, certainly the comment about PCI bus speeds, are
generally a problem for any hardware, hardly unique to a "full" TCP
offload NIC.  Obviously, it's the bet of any engineering organization
that it knows how to implement well the chosen features/capabilities of
the product at hand.
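The slow-connection point is easy to quantify with back-of-the-envelope
arithmetic.  A quick sketch (every figure below is an illustrative
assumption, not the spec of any real card):

```python
# Sketch: why thousands of slow connections exhaust RAM on an
# offload-all-TCP card.  All figures are illustrative assumptions.

CARD_RAM = 16 * 1024 * 1024       # assume 16 MB of on-card memory
PER_CONN_BUFFERS = 2 * 16 * 1024  # assume 16 KB each for send + receive
PER_CONN_STATE = 512              # assume 512 bytes of TCP control state

per_conn = PER_CONN_BUFFERS + PER_CONN_STATE
max_conns = CARD_RAM // per_conn

print(f"per-connection footprint: {per_conn} bytes")
print(f"connections before on-card RAM is exhausted: {max_conns}")
# -> roughly 500 connections.  A web server saddled with modem users
# can hold tens of thousands of concurrent slow connections, so the
# card's limit is hit long before the host's.
```

A host stack sized in main memory degrades gracefully here; a card with
a fixed socket table or fixed buffer pool simply stops accepting.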
> Now, to be more productive, there are several things vendors can offload
> onto cards for acceleration:
> * Rx/Tx checksumming
> * TCP segmentation offloading, for Tx
> * SMP-friendly packet buffering and reassembly, for Rx
> * other stuff
>
> "offload everything" is just the easy thing marketing departments come
> up with.  You need real engineers with intimate knowledge of TCP to come
> up with good solutions for offloading specific portions of the TCP and
> UDP work into hardware, while still retaining the flexibility offered by
> the features present in the Linux TCP stack.

I agree that it is preferable to implement things in cooperation with
the Linux TCP stack, but the current set of "acceleration" features is
kinda slim for the performance targets we'd need, certainly for 10Gbit,
much less for 4 1Gbit ports.

My current rough idea of a counter-proposal to make instead of "full"
TCP offload is:

  -- A mechanism (not sure what yet) for storing the TCP window on the
     card, so you essentially just copy it there from user space.

  -- A mechanism for auto-sizing the TCP window.  Very fast pipes need
     big TCP windows, but in the presence of lots of long-haul
     connections you easily eat huge amounts of memory.  I.e. static
     allocation sucks.  There seems to be a way, using the TCP
     slow-start and congestion algorithms, to do it dynamically.

  -- Always using something like TCP segmentation on send.

  -- A way to essentially join multiple packets on reception into large
     packets (get rid of the whole necessity for Jumbo packets, but get
     the benefits).  Kind of a "TCP unsegmentation" feature.

  -- A similar way to store the reception buffers/windows on the card
     until TCP "unsegmentation" happens into user space or final buffers
     of whatever sort.

All these seem to be the majority of what the Linux stack spends its
time on, but I still need to measure the stack more carefully with a
simulation harness in place to determine if those would be enough.
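The auto-sizing point can be made concrete: the window a connection
actually needs is its bandwidth-delay product (BDP), and that varies by
five orders of magnitude across realistic paths, so any static size is
wrong for almost everyone.  A sketch with illustrative link speeds and
RTTs (my numbers, not measurements):

```python
# Sketch of the window auto-sizing argument: required window = BDP,
# which varies enormously per connection.  Speeds/RTTs are assumptions.

def bdp_bytes(bits_per_sec, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to keep the
    pipe full (integer math to avoid float rounding)."""
    return bits_per_sec * rtt_ms // (8 * 1000)

modem   = bdp_bytes(56_000, 300)           # 56 kbit modem, 300 ms path
gig_lan = bdp_bytes(1_000_000_000, 1)      # 1 Gbit LAN, 1 ms
ten_gig = bdp_bytes(10_000_000_000, 100)   # 10 Gbit WAN, 100 ms

print(modem)    # ~2 KB is plenty for the modem user
print(gig_lan)  # ~122 KB for the gigabit LAN peer
print(ten_gig)  # ~119 MB for the long-haul 10 Gbit path
# Statically granting every socket the long-haul window burns ~119 MB
# per connection; sizing from the rate/RTT the slow-start and
# congestion machinery already observes does not.
```

This is exactly why a fixed per-socket allocation on an offload card
fails both ends of the spectrum at once: it starves the fast long-haul
flows and wastes memory on the thousands of slow ones.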
--
    Erich Stefan Boleyn     <erich@uruk.org>     http://www.uruk.org/
  "Reality is truly stranger than fiction; Probably why fiction is so
   popular"
-
To unsubscribe: send the line "unsubscribe linux-net" in the body of a
message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html