> someone post some reproducable
> kick-ass specweb99 results that are due to TOE and then we can
> have a serious discussion.

I agree with you, let's wait a few months. If we don't see any true TOE yet, it is because people are working seriously in a few garages. Technologies don't show up out there overnight.

>> Actually that is one of the strong arguments of the offload approach. With a
>> tcp running in the kernel, there is a pci bus access in a per packet basis.
>> In an offload tcp architecture, there is a pci bus access in a per buffer of
>> data basis.

> What you might not understand is that with TCP segmentation offload,
> which we support, you effectively get EXACTLY this.
> Only one set of headers go out over the bus for a 64K chunk of data.
> This is old hat, nothing new, and nothing that requires TOE.

The burden is with the control packets. In the case of TCP, you still need one ACK for every two MTUs' worth of data in that 64 KB chunk. You could try to buffer them and coalesce them in the hardware, but then you are changing the dynamics of the system, because you have to add timers in the hardware for the packet-loss case. That would change your RTT computations, screwing up the flow control. Again, you can probably get around that too... but the picture is bigger than what we have been talking about so far. Let me give some insight into this bigger picture using one of the statements previously posted:

> Right now bus speeds and networking speeds limit networking processing
> throughput and latency. And if Moore's law is correct, the cpu will
> catch up when we hit 10gbit for the _EXTREMELY LIMITED_ amount of
> processing that is needed in the stack right. (...)
> Once you've offloaded the checksumming and the segmentation, as we do
> right now, there simply isn't much else to do except basic socket
> management and process wakeup.

In fact, Moore's law also shows that electronics cannot catch up with optics.
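To make the control-packet burden concrete, here is a quick back-of-the-envelope count. The numbers plugged in are my assumptions, not from the thread: a 1500-byte MTU and the common delayed-ACK policy of one ACK per two full-sized segments.

```python
# Sketch of the ACK overhead for one TSO chunk, under assumed values:
# a 1500-byte MTU and delayed ACKs (one ACK per two full segments).
MTU = 1500             # bytes per wire segment (assumption)
CHUNK = 64 * 1024      # one 64 KB TSO chunk, as in the discussion

segments = -(-CHUNK // MTU)   # ceiling division: 44 segments on the wire
acks = -(-segments // 2)      # delayed ACKs: 22 control packets back

print(f"{segments} segments out, {acks} ACKs back")
```

So even though the 64 KB of payload crosses the bus with a single set of headers, a couple of dozen ACKs still come back as individual per-packet PCI events.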
In the core of the Internet, you will have routers with MEMS capable of switching packets without even entering the electronic domain. That can be done because routing can be achieved just by looking at the color of the wavelength; it will be an all-optical network. That shift alone enlarges the communication pipes by three orders of magnitude. So the network is going optical, yet the server stays electronic. We are not talking about small percentages of performance improvement; we are talking about the need for a communication system that is orders of magnitude faster.

With TCP segmentation and checksum offloading you can send 64 KB (if you are lucky with the congestion window) at about 2000 cycles per chunk. We are talking about building those 64 KB of data with tens or hundreds of cycles instead. Then you can have a network card that processes your stack with a system clock rate one or two orders of magnitude lower than the host CPU's. These are just a few technical arguments that have been proved in the labs (and will soon be disclosed in technical papers and at scientific conferences), but I would also like to talk about the protocol meaning of offloading TCP, which to me is the real need:

The reason to terminate TCP connections in a network processor is not only to speed up TCP (which is all we have been talking about so far) but to have a scalable protocol architecture. Let me give some insight here, which is something that is publicly well known but has not yet been properly communicated. TCP termination is needed if you want to further offload other things. An example? iSCSI. And please trust me, there are a lot of people on the storage side asking networking people to provide that solution. Otherwise, how do you want to serve 10 Gbps iSCSI with all the data running through the PCI bus and being copied multiple times? Let's think about it for a sec: it is bulk data, so why should it be going through the general-purpose kernel if that data is never touched?
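On the host side, the closest existing analogue to "bulk data the CPU never touches" is sendfile(): the kernel moves file pages straight to a socket with no user-space copy. A hedged sketch (the payload size and socketpair setup are mine, purely for illustration); a TOE/iSCSI engine would push the same idea one step further and keep the data off the PCI bus entirely:

```python
import os
import socket
import tempfile

# Stand-in for a bulk iSCSI-style payload (size chosen arbitrarily).
payload = b"x" * 8192
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

# A connected socket pair stands in for a real network connection.
left, right = socket.socketpair()

with open(path, "rb") as src:
    sent = 0
    while sent < len(payload):
        # No user-space buffer here: the kernel itself moves
        # file pages -> socket on our behalf.
        sent += os.sendfile(left.fileno(), src.fileno(), sent,
                            len(payload) - sent)

left.shutdown(socket.SHUT_WR)
received = b""
while chunk := right.recv(65536):
    received += chunk

os.unlink(path)
print(received == payload)  # the data arrived without an app-level copy
```

The application never reads the data into its own buffers; with connection termination in the NIC, even the kernel-side copy across the PCI bus would disappear.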
With the capability to terminate TCP connections in the hardware, you will be able to handle terabytes of data (which is what an optical network will transport in the future) without even going through the PCI bus (zero PCI accesses). We are talking about solutions that will be needed long term; TCP segmentation and checksum offload can only solve current issues, but will not scale.

> I welcome these companies to spend the money to look into this,
> and when facts show that their solution can go head to head with
> what's currently out there, then I'll be convinced.

> Nobody, and mean not one, of these TOE folks have approached me and
> said "and we'll GPL our TOE firmware etc. of course". All of them
> want to do binary-only firmware.

I hope that we can all understand here the needs of future communications. There are already garage solutions that provably work and improve the bits per cycle tenfold. Within a year, you will see products that can deliver up to 100 bits per cycle, much more than today's 1 bit per cycle. But we do need the help of the community for an optimal product (interoperability). If we don't get that help, people will do it anyway, because of the community's need and because of our engineering passion. The nice thing is that open source allows people to scratch their heads and come up with better solutions. In fact, I have worked with the Linux kernel, and I myself would like to see Linux as a leader in future communication architectures, since open source is the way to go to achieve optimal solutions.

My team of engineers and I are open to discussing further how we can all together make a better world around this particular bottleneck. But we all have to be convinced first of the vision of the world that we would like to craft. We have studied this very seriously for a long time, bringing together the best talents from the TCP/IP and storage sides, because we really believe something more is needed.
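The bits-per-cycle figures translate directly into the clock rate an offload engine would need. A quick sketch, assuming (my assumption, not stated in the thread) a 10 Gbit/s target line rate:

```python
# Clock rate needed to sustain a line rate at a given bits-per-cycle.
# The 10 Gbps target is my assumption; the bits-per-cycle figures
# (1 today, 10x and 100 claimed) come from the discussion above.
LINE_RATE = 10e9  # bits per second (assumed target)

needed_clock = {bpc: LINE_RATE / bpc for bpc in (1, 10, 100)}
for bpc, hz in needed_clock.items():
    print(f"{bpc:>3} bit/cycle -> {hz / 1e9:4.1f} GHz engine")
```

At 1 bit per cycle you would need a 10 GHz engine; at 100 bits per cycle, 100 MHz suffices, which is the "clock rate one or two orders of magnitude lower than the host CPU" point made earlier.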
We have been working in the labs, and now the time is coming for us to share our results with the community.

Jordi

-
To unsubscribe from this list: send the line "unsubscribe linux-net" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html