Grant Taylor wrote:
On 10/05/07 05:05, John Default wrote:
I was told that layer 3 switches are faster because "routing" there
is done by some ASIC hardware. Is there any advantage in having
another copy of the routing code in bridging when everything is done in
software, which is just as slow as normal routing? The only speed gain
would be in keeping the routing code very simple with limited
functionality, but I think that the trend is to put more and more
functionality there, which would end up as two equally slow pieces of
code with the same function in two places.
Ah, therein lies the difference in what you are saying, which as a
norm is probably correct and something that I do not disagree with. I
guess I should say that my introduction to L3 switching is actually on
Cisco Catalyst 5000 / 5500 L2 switches where they depend on an
external Cisco L3 router to assist in the L3 switching. Rather, that is
to say that the L2 switch and the L3 router communicate with each
other to do L3 switching together. As I understand it, the L2 switch
will send initial packets to the L3 router along with some meta data.
The L3 router will route the packets and send them back to the L2
switch with updated meta data. Then the L2 switch will have learned
with the help of the L3 router that the packets can be altered on L2
to emulate L3 routing but this time in hardware. Thus the L2 switch
depends on the L3 router to do the initial routing and then the L2
switch will subsequently step up and L2 switch across L3 boundaries
based on what it learned from the L3 router.
So, I guess I should say that I'm not wanting to (re)implement the
routing code in the kernel, it does quite fine for me thank you very
much. ;) I'm looking for a way to alter source / destination MAC
addresses of packets on L2 to emulate what happens in routing. I
believe that I could SNAT / DNAT the MAC addresses of the packet via
EBTables on L2 to achieve the effect of an L3 route. I would do this
by having the bridging code in the kernel learn from cached (?)
results of a previous L3 route.
In other words if the packet is in a NEW connection state, send it on
up to L3 routing. If the packet is in an ESTABLISHED state and we can
pull information from the system's ARP cache to know the destination
MAC address for the next subnet as well as pull the correct source MAC
address for the interface on the next subnet, then we could just SNAT
/ DNAT the MAC addresses on L2 and send the packet back out on the
appropriate wire.
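The decision described above could be sketched roughly as follows. This is only a userspace illustration in Python, not kernel or EBTables code: the `Neighbor` type, the `arp_cache` and `iface_macs` dictionaries, the interface names `eth0` / `eth1` and the function name `fast_path_rewrite` are all hypothetical, and the connection state is assumed to be handed in as a string by some connection tracker.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Neighbor(NamedTuple):
    """One ARP cache entry (hypothetical layout for this sketch)."""
    mac: str    # MAC address of the next hop
    iface: str  # interface the entry was learned on


def fast_path_rewrite(
    state: str,
    dst_ip: str,
    arp_cache: Dict[str, Neighbor],
    iface_macs: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (new_src_mac, new_dst_mac), or None to send the packet up to L3."""
    if state != "ESTABLISHED":
        return None  # NEW (or anything else): let normal routing handle it
    neighbor = arp_cache.get(dst_ip)
    if neighbor is None:
        return None  # destination not resolved yet: L3 and ARP must do the work
    src_mac = iface_macs.get(neighbor.iface)
    if src_mac is None:
        return None  # unknown egress interface: fall back to L3
    return src_mac, neighbor.mac


# Addresses mirror the two-subnet example used in this thread.
# The abbreviated MAC strings are treated as opaque labels here.
arp_cache = {"5.0.0.9": Neighbor(mac="..33:3c", iface="eth1")}
iface_macs = {"eth0": "..11:1e", "eth1": "..22:2d"}
```

Note that the sketch only covers directly attached destinations, since it looks up the destination IP itself in the ARP cache; anything else falls through to normal routing.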
I'm wondering if this NATing of the source and destination MAC
addresses on L2 would be faster than passing the packet up to L3
routing. It is my belief that L3 will do more sanity checks on
packets than L2 will. These sanity checks will take time to perform
which could be avoided if we could just NAT the source and destination
MAC addresses on L2. Or at least that's what I think. I could be
very wrong about it.
(I was taught that packets are routed on L3 and frames are
switched (bridged) on L2. An L3 switch does L2 switching + L3 routing,
but in hardware. Routers are completely a software thing, switches are a
hardware thing, and a bridge is a switch in software.)
I can agree with that statement. However I'll spin what you said a
little bit and then I think you can see how I'm logically progressing
on down the line.
Switching is a L2 operation, no matter what that operation is.
Routing is a L3 operation, no matter what that operation is. Thus if
we perform some sort of L3 type operation on L2 then we are performing
some sort of switching operation. If that operation happens to be
routing, which is normally an L3 operation, then we are doing an L3-like
operation on L2, thus L3 switching. So now that I have circularly
argued that, how about an example?
Let's say that we have two end point hosts on separate subnets with an
intermediary router.
      +---------+     +-------------------+     +---------+
IP:   | 4.0.0.9 +-----+ 4.0.0.1 : 5.0.0.1 +-----+ 5.0.0.9 |
MAC:  | ..00:0f |     | ..11:1e : ..22:2d |     | ..33:3c |
      +---------+     +-------------------+     +---------+
If I want to send an ICMP ping from 4.0.0.9 to 5.0.0.9 the ethernet
frames will be sent from ..00:0f to ..11:1e and from ..22:2d to ..33:3c.
Note that the routing code on the intermediary router will see that
the packet needs to be routed from one subnet to the other and will do
so just fine without any problems at all. However this is a layer 3
operation.
What I'm wanting to do is educate L2 enough so that it can use cached
results from L3 to perform a similar operation on L2 in the future.
Thus when the frame from 4.0.0.9 with a MAC address of ..00:0f comes
in destined to 5.0.0.9 with the router's MAC address of ..11:1e I'm
wanting to alter the frame coming in to the switch such that the new
destination MAC address will be ..33:3c with a new source MAC address
of ..22:2d based on contents of the system's ARP cache with a little
bit of help.
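At the byte level, the rewrite described above only touches the first 12 bytes of the Ethernet header. A minimal sketch in Python follows, assuming plain Ethernet II frames; the helper names `mac_to_bytes` and `rewrite_macs` are hypothetical, and the MAC addresses are made up for illustration (they are not the abbreviated addresses from the diagram).

```python
ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to its 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return raw


def rewrite_macs(frame: bytes, new_src: str, new_dst: str) -> bytes:
    """Return a copy of an Ethernet frame with both MAC addresses replaced."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    # On the wire the destination address comes first, then the source.
    return mac_to_bytes(new_dst) + mac_to_bytes(new_src) + frame[12:]


# Made-up addresses for illustration (locally administered 02:... range).
HOST_A, ROUTER_A = "02:00:00:00:00:0a", "02:00:00:00:00:01"
ROUTER_B, HOST_B = "02:00:00:00:00:02", "02:00:00:00:00:0b"

payload = b"\x08\x00" + b"ip packet goes here"  # EtherType 0x0800 = IPv4
incoming = mac_to_bytes(ROUTER_A) + mac_to_bytes(HOST_A) + payload
outgoing = rewrite_macs(incoming, new_src=ROUTER_B, new_dst=HOST_B)
```

Everything after the two addresses, including the EtherType and the IP packet, is copied through unchanged, which is exactly why this is cheaper than routing and also why it skips whatever L3 would have done to the packet.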
It is my belief that this L2 operation of SNATing and DNATing the MAC
addresses without sending the data up to L3 will be faster than
sending the data up to L3 and doing its full processing. At least
that is what this entire discussion is based on. At the very least I
believe I'm going to do some controlled tests to see if this will even
work with manually entered static configurations.
If this does work, I think it would be possible to come up with a new
EBTables target that could alter the destination MAC address based on
the contents of the system's ARP cache (the system just spoke to the
target, thus the target MAC should be in the ARP cache, if not the ARP
code does a fine job at its job and can get us the MAC address). The
only hiccup that I don't have an answer for at the moment is picking
the correct source MAC address. However looking at the contents of
the ARP cache we see that the interface is listed as well. So we
could do a simple translation from interface to source MAC address.
Thus I believe we have the basis of a rough, crude logistical algorithm
to L3 switch (an L3 operation on L2) traffic through a Linux system.
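The ARP cache lookup and the interface-to-source-MAC translation can be tried out from userspace before writing any kernel code. The sketch below assumes a Linux system, where the ARP cache is exposed in `/proc/net/arp` and an interface's own address in `/sys/class/net/<iface>/address`. The function names and the sample table (including its made-up MAC addresses) are hypothetical; a real EBTables target would consult the kernel's neighbour table directly rather than parse text.

```python
from typing import Callable, Dict, Optional, Tuple

ATF_COM = 0x02  # flag bit set on completed (resolved) ARP entries


def parse_proc_net_arp(text: str) -> Dict[str, Tuple[str, str]]:
    """Map IP address -> (MAC address, device) for completed entries."""
    table = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) != 6:
            continue
        ip, _hw_type, flags, mac, _mask, device = fields
        if int(flags, 16) & ATF_COM:
            table[ip] = (mac, device)
    return table


def read_iface_mac(iface: str) -> str:
    """Read an interface's own MAC address from sysfs (Linux only)."""
    with open(f"/sys/class/net/{iface}/address") as handle:
        return handle.read().strip()


def lookup_rewrite(
    dst_ip: str,
    arp_table: Dict[str, Tuple[str, str]],
    iface_mac: Callable[[str], str] = read_iface_mac,
) -> Optional[Tuple[str, str]]:
    """Return (source MAC, destination MAC) for dst_ip, or None if unknown."""
    entry = arp_table.get(dst_ip)
    if entry is None:
        return None
    dst_mac, device = entry
    return iface_mac(device), dst_mac


# Sample in the /proc/net/arp layout; all addresses are made up.
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
5.0.0.9          0x1         0x2         02:00:00:00:00:0b     *        eth1
4.0.0.9          0x1         0x2         02:00:00:00:00:0a     *        eth0
4.0.0.7          0x1         0x0         00:00:00:00:00:00     *        eth0
"""
```

On a live box one would pass `open("/proc/net/arp").read()` instead of `SAMPLE`. Incomplete entries (flags `0x0`) are skipped on purpose, so those destinations still go up to L3 where ARP resolution happens.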
So, now I get it (after your first mail, it wasn't possible :)). I
think the idea is great, but...
What would we actually avoid? For correct operation we
will have to look at the destination IP anyway, skipping only the IP header
checks (header checksum, version, maybe a length check), which are
implemented in a very quick way (a sum over 20 bytes written in
assembly), probably a few tens of nanoseconds on a 1 GHz processor.
Given the low probability of a damaged packet header we can probably skip
the checking, but some security problems could arise from that.
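To make concrete how little work these checks are, here is a rough Python sketch of the version, length and checksum tests on an IPv4 header. It is an illustration of the kind of checks meant above, not a copy of what the kernel does; the function names are hypothetical and the sample header is a commonly used textbook example.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def header_looks_sane(packet: bytes) -> bool:
    """Version, length and checksum checks on an IPv4 header."""
    if len(packet) < 20:
        return False
    version, ihl = packet[0] >> 4, packet[0] & 0x0F
    header_len = ihl * 4
    if version != 4 or ihl < 5 or len(packet) < header_len:
        return False
    total_len = int.from_bytes(packet[2:4], "big")
    if total_len < header_len or total_len > len(packet):
        return False
    # A header carrying a correct checksum sums to zero.
    return ipv4_checksum(packet[:header_len]) == 0


# 20-byte sample header (total length 0x73, TTL 0x40, checksum 0xb861),
# padded with zero bytes to the advertised total length.
HEADER = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
PACKET = HEADER + bytes(0x73 - len(HEADER))
```

The whole check is a handful of comparisons plus a sum over ten 16-bit words, which supports the point that skipping it saves very little.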
Then we avoid the lookup in the routing table. But routing already has a
cache for routes (I don't know how effective) to avoid doing the lookup
for each packet. Will this be much faster than the route cache?
By bringing it down to a lower, dumber layer we risk somehow
messing up policy routing, multipath routing and probably some other
advanced things.
Another thing is that with L3 switching turned on, the router will start to
behave a little differently than usual, which could confuse the
administrator ...
What about NAT and other packet-changing things in iptables (and QoS
marking and the like)? By stealing the packet before layer 3 processing we
skip these things as well, I think. Hm, this could really become a problem.
There could be a mechanism for detecting whether a packet would be changed
in any way, and then we would not touch it, but if the box is meant for
changing packets, then we would have to implement that too or process no
packets at all ... (you are right, who would use an L3 switch for NAT : ) )
... and you should probably decrement and check the TTL too : )
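For what it is worth, the TTL step does not need a full checksum recalculation; the header checksum can be patched incrementally as described in RFC 1624 (eqn. 3). A small Python sketch of that idea, with a hypothetical function name and the assumption that packets about to expire are simply handed back to normal L3 processing:

```python
from typing import Optional


def decrement_ttl(header: bytes) -> Optional[bytes]:
    """Return the header with TTL-1 and a patched checksum, or None if it expires."""
    ttl = header[8]
    if ttl <= 1:
        return None  # hand over to L3, which can answer with ICMP time exceeded
    # TTL and protocol share one 16-bit word of the header.
    old_word = (ttl << 8) | header[9]
    new_word = ((ttl - 1) << 8) | header[9]
    old_sum = int.from_bytes(header[10:12], "big")
    # RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m')
    total = (~old_sum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    new_sum = ~total & 0xFFFF
    patched = bytearray(header)
    patched[8] = ttl - 1
    patched[10:12] = new_sum.to_bytes(2, "big")
    return bytes(patched)
```

So an L2 fast path that wants to stay honest about TTL has to modify the IP header after all, not just the MAC addresses.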
Please excuse me if I am missing your idea completely.
Please read and chew on what I've brain farted to the mailing list.
Poke holes in it and let's discuss this. If this truly will not work,
I have only wasted some bandwidth and bytes on drives, nothing else.
All the while we will have hopefully cleared a few cobwebs from our
collective brains. ;) At least for a few minutes while I try to make
a fool of myself. :}
I just mentioned a few things that came to my mind that might need to be
considered. But otherwise I think the idea is very nice. I will try to
find out more, just need to find time to read the source ; )
(disclaimer: I am just a beginner, with my stupid questions I am just
trying to help your thinking process)
Grant. . . .
_______________________________________________
LARTC mailing list
LARTC@xxxxxxxxxxxxxxx
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc
--
___________________________________
S pozdravom / Best regards
John Default