I am a neutral party trying to give some input on the matter. I will try to be objective here; I don't want to get into a battle over what the trend should be (either way, I already have an opinion of my own). So, for the sake of enriching the debate...

> Unfortunately I don't see anyone ever being interested in even working
> with companies on this.

One of the biggest high-tech manufacturers described the "TCP end-node bottleneck" to me as a $1B business. That by itself may be worth spending some research time on. In fact, the technology already has a name: TOE (TCP Offload Engine). Companies working on it? Many, many.

It comes down to one fact: the core of the Internet is going optical. Where is the edge going? Nowhere for now. Unless we have a major shift in technology, the edge has to interface with humans, and parsing the contents of a packet requires electronics (light doesn't work here). This will probably not change for a while, at least until quantum computers reach our desks. Therefore the ratio of throughput to clock rate goes up, very fast.

Before continuing, a comment on the metric. The measure here is bits per cycle, computed as [throughput] / [system clock rate]. It was introduced about ten years ago. Since then, throughput and CPU clock rates have been advancing separately, but now the two together are becoming really important because of the new electronic bottlenecks. Research says that with the current computer architecture one can achieve about 1 bit per cycle, which means that to handle 10 Gbps Ethernet you need a 10 GHz CPU. That does not scale well as Ethernet moves to optics. The whole point is that unless we change the model, we will not be able to achieve better bits per cycle; it is believed that TOEs can reach on the order of tens of bits per cycle. (A short back-of-envelope sketch follows a few paragraphs below.)

> * At 1Gb/10Gb speeds, you must overcome problems like PCI bus
> throughput. This problem exists completely independent of where the TCP
> stack is.

Actually, that is one of the strong arguments for the offload approach. With TCP running in the kernel, there is a PCI bus access on a per-packet basis. In an offloaded TCP architecture, there is a PCI bus access on a per-buffer-of-data basis. Many applications write to sockets in chunks about ten times the size of the MTU, so ten is at least the factor you save in PCI bus accesses. I say at least because it is much more: ACKs and other non-data packets never cross the PCI bus at all in an offloaded architecture. (A second sketch below puts numbers on this.)

> * Any time a TCP security issue arises, it cannot be fixed. The offload
> logic is either downloaded from firmware or direct-coded into
> hardware, neither of which is fixable by the Linux vendor, not
> analyze-able by security experts.

What is the difference between a piece of code running on a board and a kernel running on a CPU that is only one PCI bus away from the board? When a new security issue comes up, one can always change the software, regardless of where the software sits; that is the beauty of software, so let's use it to build better things. The trick is to implement in hardware only those small parts that do not jeopardize the security of the system.

> * A large number of Linux kernel net stack features are not present.
> iptables is a big one, but there are many smaller missing features as
> well. You seem to recognize this with your "murky" references.

That is the reason some people's approach is full offload, so that you keep all the functionality. iptables should be in.
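Before moving on to the remaining points, here is the back-of-envelope sketch I promised for the bits-per-cycle metric. It is just the arithmetic from above written out in C; the 10 Gbps and 10 GHz figures are the ones quoted in the discussion, not measurements of any particular machine:

    #include <stdio.h>

    int main(void)
    {
        /* Metric from the discussion above:
         * bits per cycle = throughput / system clock rate. */
        double throughput_bps = 10e9;  /* 10 Gbps Ethernet */
        double clock_hz       = 10e9;  /* 10 GHz CPU       */

        printf("bits per cycle: %.1f\n", throughput_bps / clock_hz);
        /* Prints 1.0: at ~1 bit per cycle, a conventional stack
         * needs a 10 GHz CPU just to keep up with 10 Gbps. */
        return 0;
    }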
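And the second sketch, for the per-packet versus per-buffer PCI argument. The 15,000-byte write size is an assumed example, chosen only to match the roughly-ten-times-the-MTU ratio mentioned above:

    #include <stdio.h>

    int main(void)
    {
        int write_bytes = 15000;  /* assumed application write size */
        int mtu_bytes   = 1500;   /* standard Ethernet MTU          */

        /* Number of MTU-sized packets, rounding up. */
        int pkts = (write_bytes + mtu_bytes - 1) / mtu_bytes;

        /* In-kernel TCP: one PCI bus access per packet (and ACKs
         * cross the bus too, so reality is worse than this). */
        printf("in-kernel TCP:  %d PCI accesses\n", pkts);

        /* Offloaded TCP: the whole buffer crosses the bus once. */
        printf("offloaded TCP:  1 PCI access\n");
        return 0;
    }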
> * There are hardware limits which are not present in software. One
> common scenario that stumps offload-all-TCP vendors is slow connections:
> many Web sites are saddled with _thousands_ of users simultaneously
> connected via slow links (modem users). This scenario and similar Real
> World(tm) scenarios like it hit socket or RAM limits on offload-all-TCP
> cards very quickly.

Let's see, how big is a socket? I forget; let me check the code, be right back... In FreeBSD the socket structure is 208 bytes. Now, how many connections are we looking at? Say 10,000 (in practice it is fewer). That is about 2 MB of RAM, not an issue with today's technology. For TCP one should also add the retransmit queue. Worst case, the retransmit queue is full: 32 KB (the default). Then you need 320 MB; not a problem either. We are talking about 10,000 connections, and a server of those dimensions should have at least a few GB of RAM. Why not put 500 MB on the board? (The P.S. below works the numbers out.)

> Unfortunately I don't see anyone ever being interested in even working
> with companies on this. TCP offloading limits users in a large number
> of ways.

In fact, many companies and researchers are interested; networking people are trying to solve this problem together with the storage people. But I think you know about this, so sorry, I am not sure what to make of the comment above.

> To sum, it's a dumb idea :)

Not so sure, from a technical standpoint.

> "offload everything" is just the easy thing marketing departments come
> up with. You need real engineers with intimate knowledge of TCP to come
> up with good solutions for offloading specific portions of the TCP and
> UDP work into hardware, while still retaining the flexibility offered by
> the features present in the Linux TCP stack.

Some people who are very intimate with TCP and who are engineers, not marketing people (by which I mean people driven by engineering passions, not market ones), have come to the conclusion that a general-purpose environment is not the best place for TCP to live. As the IETF moves forward with the hourglass layering vision (the layers in the middle, i.e. TCP/IP, become more static and are candidates to be moved into silicon), offloading TCP may be a solution for the edge-node problem. Just as math coprocessors were designed in the past, and graphics acceleration cards were brought closer to the main CPU, you can now think of the concept of a network coprocessor. The goal: improve the number of bits per cycle. Why? To get a more cost-effective system.

As for the OS interoperability issues, it is a matter of designing a challenging API (a purely hypothetical sketch is in the P.P.S. below). This is where the beauty of Linux comes into the picture. In fact, some people believe that TOEs could be another opportunity for Linux to show its power. Implementing TOEs in a non-open-source environment like Windows is a real challenge. These are the technologies that can actually prove the value of open source; they may be really challenging technologies, but that is the spirit of the open source community.

Feedback is welcome.

Jordi
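P.S. Working out the memory numbers from the socket-size argument above. The 208-byte socket structure, 10,000 connections, and 32 KB retransmit queue are the figures used in the text:

    #include <stdio.h>

    int main(void)
    {
        long conns      = 10000;      /* simultaneous connections */
        long sock_bytes = 208;        /* FreeBSD socket structure */
        long retx_bytes = 32 * 1024;  /* default retransmit queue */

        /* Per-connection bookkeeping vs. worst-case retransmit data. */
        printf("socket structures: %.1f MB\n", conns * sock_bytes / 1e6);
        printf("full retx queues:  %.1f MB\n", conns * retx_bytes / 1e6);
        /* ~2 MB and ~328 MB: both fit comfortably in 500 MB on board. */
        return 0;
    }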
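P.P.S. To make the API point a bit more concrete, here is a purely hypothetical sketch of what a user-visible offload interface could look like. None of these names (toe_open, toe_send, toe_close) exist in any real kernel or driver; they are illustration only, assuming the per-buffer host/card split described above:

    #include <stdint.h>
    #include <sys/types.h>

    /* Hypothetical interface, for illustration only. */
    typedef int toe_handle_t;  /* connection handle owned by the card */

    /* Card runs the full TCP handshake; the host gets a handle back. */
    toe_handle_t toe_open(uint32_t dst_ip, uint16_t dst_port);

    /* Host posts one large buffer; the card segments it into MTU-sized
     * frames and handles ACKs and retransmissions on board, so only
     * this single transfer crosses the PCI bus. */
    ssize_t toe_send(toe_handle_t h, const void *buf, size_t len);

    /* Card runs the FIN teardown; the host just releases the handle. */
    int toe_close(toe_handle_t h);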