I am a neutral party trying to give some input on the matter. I will try to be objective here; I don't want to get into a battle over what the trend should be (either way, I already have an opinion of my own). So, for the sake of enriching the debate...

> Unfortunately I don't see anyone ever being interested in even working
> with companies on this.

One of the biggest high-tech manufacturers described the "TCP end-node bottleneck" to me as a $1B business. That by itself may be worth spending some research time on. In fact, the technology already has a name: TOE (TCP Offload Engine). Companies working on it? Many, many.

It comes down to one fact: the core of the Internet is going optical. Where is the edge going? Nowhere for now. Unless we have a major shift in technology, the edge has to interface with humans, and parsing the contents of a packet requires electronics (light doesn't work here). This will probably not change for a while, at least until quantum computers reach our desks. Therefore the ratio of throughput to clock rate goes up, very fast.

Before continuing, a comment on the metric. The measure here is bits per cycle, computed as [throughput] / [system clock rate]. It was introduced about ten years ago. Since then, throughput and CPU clock rates have been advancing separately, but now the two together are becoming really important because of the new electronic bottlenecks. Research says that with the current computer architecture one can achieve about 1 bit per cycle, which means that to handle 10 Gbps Ethernet you need a 10 GHz CPU. That does not scale well as Ethernet moves to optics. The whole point is that unless we change the model, we will not be able to achieve better bits per cycle; it is believed that TOEs can reach on the order of tens of bits per cycle. (A short back-of-envelope sketch follows a few paragraphs below.)

> * At 1Gb/10Gb speeds, you must overcome problems like PCI bus
> throughput. This problem exists completely independent of where the TCP
> stack is.

Actually, that is one of the strong arguments for the offload approach. With TCP running in the kernel, there is a PCI bus access on a per-packet basis. In an offloaded TCP architecture, there is a PCI bus access on a per-buffer-of-data basis. Many applications write to sockets in chunks about ten times the size of the MTU, so ten is at least the factor you save in PCI bus accesses. I say at least because it is much more: ACKs and other non-data packets never cross the PCI bus at all in an offloaded architecture. (A second sketch below puts numbers on this.)

> * Any time a TCP security issue arises, it cannot be fixed. The offload
> logic is either downloaded from firmware or direct-coded into
> hardware, neither of which is fixable by the Linux vendor, not
> analyze-able by security experts.

What is the difference between a piece of code running on a board and a kernel running on a CPU that is only one PCI bus away from the board? When a new security issue comes up, one can always change the software, regardless of where the software sits; that is the beauty of software, so let's use it to build better things. The trick is to implement in hardware only those small parts that do not jeopardize the security of the system.

> * A large number of Linux kernel net stack features are not present.
> iptables is a big one, but there are many smaller missing features as
> well. You seem to recognize this with your "murky" references.

That is the reason some people's approach is full offload, so that you keep all the functionality. iptables should be in.
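Before moving on to the remaining points, here is the back-of-envelope sketch I promised for the bits-per-cycle metric. It is just the arithmetic from above written out in C; the 10 Gbps and 10 GHz figures are the ones quoted in the discussion, not measurements of any particular machine:

    #include <stdio.h>

    int main(void)
    {
        /* Metric from the discussion above:
         * bits per cycle = throughput / system clock rate. */
        double throughput_bps = 10e9;  /* 10 Gbps Ethernet */
        double clock_hz       = 10e9;  /* 10 GHz CPU       */

        printf("bits per cycle: %.1f\n", throughput_bps / clock_hz);
        /* Prints 1.0: at ~1 bit per cycle, a conventional stack
         * needs a 10 GHz CPU just to keep up with 10 Gbps. */
        return 0;
    }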
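And the second sketch, for the per-packet versus per-buffer PCI argument. The 15,000-byte write size is an assumed example, chosen only to match the roughly-ten-times-the-MTU ratio mentioned above:

    #include <stdio.h>

    int main(void)
    {
        int write_bytes = 15000;  /* assumed application write size */
        int mtu_bytes   = 1500;   /* standard Ethernet MTU          */

        /* Number of MTU-sized packets, rounding up. */
        int pkts = (write_bytes + mtu_bytes - 1) / mtu_bytes;

        /* In-kernel TCP: one PCI bus access per packet (and ACKs
         * cross the bus too, so reality is worse than this). */
        printf("in-kernel TCP:  %d PCI accesses\n", pkts);

        /* Offloaded TCP: the whole buffer crosses the bus once. */
        printf("offloaded TCP:  1 PCI access\n");
        return 0;
    }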
> * There are hardware limits which are not present in software. One
> common scenario that stumps offload-all-TCP vendors is slow connections:
> many Web sites are saddled with _thousands_ of users simultaneously
> connected via slow links (modem users). This scenario and similar Real
> World(tm) scenarios like it hit socket or RAM limits on offload-all-TCP
> cards very quickly.

Let's see, how big is a socket? I forget; let me check the code, be right back... In FreeBSD the socket structure is 208 bytes. Now, how many connections are we looking at? Say 10,000 (in practice it is fewer). That is about 2 MB of RAM, not an issue with today's technology. For TCP one should also add the retransmit queue. Worst case, the retransmit queue is full: 32 KB (the default). Then you need 320 MB; not a problem either. We are talking about 10,000 connections, and a server of those dimensions should have at least a few GB of RAM. Why not put 500 MB on the board? (The P.S. below works the numbers out.)

> Unfortunately I don't see anyone ever being interested in even working
> with companies on this. TCP offloading limits users in a large number
> of ways.

In fact, many companies and researchers are interested; networking people are trying to solve this problem together with the storage people. But I think you know about this, so sorry, I am not sure what to make of the comment above.

> To sum, it's a dumb idea :)

Not so sure, from a technical standpoint.

> "offload everything" is just the easy thing marketing departments come
> up with. You need real engineers with intimate knowledge of TCP to come
> up with good solutions for offloading specific portions of the TCP and
> UDP work into hardware, while still retaining the flexibility offered by
> the features present in the Linux TCP stack.

Some people who are very intimate with TCP and who are engineers, not marketing people (by which I mean people driven by engineering passions, not market ones), have come to the conclusion that a general-purpose environment is not the best place for TCP to live. As the IETF moves forward with the hourglass layering vision (the layers in the middle, i.e. TCP/IP, become more static and are candidates to be moved into silicon), offloading TCP may be a solution for the edge-node problem. Just as math coprocessors were designed in the past, and graphics acceleration cards were brought closer to the main CPU, you can now think of the concept of a network coprocessor. The goal: improve the number of bits per cycle. Why? To get a more cost-effective system.

As for the OS interoperability issues, it is a matter of designing a challenging API (a purely hypothetical sketch is in the P.P.S. below). This is where the beauty of Linux comes into the picture. In fact, some people believe that TOEs could be another opportunity for Linux to show its power. Implementing TOEs in a non-open-source environment like Windows is a real challenge. These are the technologies that can actually prove the value of open source; they may be really challenging technologies, but that is the spirit of the open source community.

Feedback is welcome.

Jordi
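P.S. Working out the memory numbers from the socket-size argument above. The 208-byte socket structure, 10,000 connections, and 32 KB retransmit queue are the figures used in the text:

    #include <stdio.h>

    int main(void)
    {
        long conns      = 10000;      /* simultaneous connections */
        long sock_bytes = 208;        /* FreeBSD socket structure */
        long retx_bytes = 32 * 1024;  /* default retransmit queue */

        /* Per-connection bookkeeping vs. worst-case retransmit data. */
        printf("socket structures: %.1f MB\n", conns * sock_bytes / 1e6);
        printf("full retx queues:  %.1f MB\n", conns * retx_bytes / 1e6);
        /* ~2 MB and ~328 MB: both fit comfortably in 500 MB on board. */
        return 0;
    }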
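P.P.S. To make the API point a bit more concrete, here is a purely hypothetical sketch of what a user-visible offload interface could look like. None of these names (toe_open, toe_send, toe_close) exist in any real kernel or driver; they are illustration only, assuming the per-buffer host/card split described above:

    #include <stdint.h>
    #include <sys/types.h>

    /* Hypothetical interface, for illustration only. */
    typedef int toe_handle_t;  /* connection handle owned by the card */

    /* Card runs the full TCP handshake; the host gets a handle back. */
    toe_handle_t toe_open(uint32_t dst_ip, uint16_t dst_port);

    /* Host posts one large buffer; the card segments it into MTU-sized
     * frames and handles ACKs and retransmissions on board, so only
     * this single transfer crosses the PCI bus. */
    ssize_t toe_send(toe_handle_t h, const void *buf, size_t len);

    /* Card runs the FIN teardown; the host just releases the handle. */
    int toe_close(toe_handle_t h);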