RE: "Bug" in howto 4.2.1 Split access and other advice

"Laurens van Alphen" <laurens.van.alphen@xxxxxxxxxxxxxx> · Fri, 5 Jul 2002 20:25:49 +0200

Hi,

What is still unclear to me is when the
http://www.linuxvirtualserver.org/~julian/#routes patches are needed.
What do they do exactly?

Thanks in advance,

--
Laurens van Alphen

-----Original Message-----
From: Ard van Breemen [mailto:ard@telegraafnet.nl] 
Sent: vrijdag 5 juli 2002 19:32
To: lartc@mailman.ds9a.nl
Cc: HOWTO@ds9a.nl
Subject:  "Bug" in howto 4.2.1 Split access and other advice

Hi,
http://lartc.org/HOWTO//cvs/2.4routing/html/lartc.rpdb.multiple-links.html
I am not sure who wrote this part or what it was based on, but since I
have been working with ip rules a lot longer now, I would like to add a
few things. Example 4.2.1 refers to the picture above it and does a
plain ip rule add from .... table .... The problem with the example is
that if you connect from the inside (local network) to your if1 IP or
if2 IP, the replies to the local network go out via if1 or if2. That is
not what you want.

If we study the output of ip rule list carefully, we see that each line
starts with a number. This number is the priority. If you pass a
priority in your ip rule add command, the new rule is inserted into the
rule list after any existing rules with the same or a lower priority
number.
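
To make the insertion behaviour described above concrete, here is a
small sketch. It is only a toy model that sorts text and never touches
the kernel; the table name rt-provider1 is the placeholder used later
in this mail, and the helper name insert_rule is made up for the
illustration.

```shell
#!/bin/sh
# Toy model of rule ordering; it does not touch the kernel.
# The table name rt-provider1 is just a placeholder.

# The three stock rules, written as "priority table".
RULES='0 local
32766 main
32767 default'

insert_rule() {
    # $1 = priority, $2 = table name (hypothetical helper)
    # The new rule is appended to the list, then a stable numeric
    # sort (-s) keeps existing rules with the same priority in front.
    printf '%s\n%s %s\n' "$RULES" "$1" "$2" | sort -s -n -k1,1
}

insert_rule 32766 rt-provider1
```

In this model the new rule lands after main and before default, which
is the position used in the rest of this mail.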

I am using this to differentiate between known routing, and default
routing.

So first we set up the link-local routing (the things you can reach
directly). Actually you don't have to do a thing for that, except
setting up the interfaces. List your local routing to understand:
ip route show table main

Then we put the default routing into table default.
And between the main and the default rule we put rules that
differentiate the default routing per provider.

Example:
############################################################
# set up table main by upping the interfaces
ifup eth0   # local net
ifup eth1   # provider 1
ifup eth2   # provider 2

# Now set up the default routes in a failover fashion, with the
# most important route first (e.g. gw-provider1):
ip route add default via gw-provider1    table default
# Secondary route when the first gw fails:
ip route append default via gw-provider2 table default
# (The append is needed because otherwise the routes will clash)
# So we have a table with two default routes which will fail over
# for each other (takes about 10 minutes in the default config)

# We now have a simple failover, which for most of us will not
# work, since most providers will have src-ip filtering.
# We are going to fix that now:
ip route add default via gw-provider1 table rt-provider1
ip route add default via gw-provider2 table rt-provider2
# We now have 2 tables each with a single different default gw.
# They are not used, that is what we are going to solve now:

ip rule add from ip-eth1 table rt-provider1 prio 32766
ip rule add from ip-eth2 table rt-provider2 prio 32766

# That's it.
############################################################
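
One detail the example above takes for granted: the ip command can only
use a table name such as rt-provider1 if that name is mapped to a
number, traditionally in /etc/iproute2/rt_tables. A minimal sketch of
that step follows. The numbers 201 and 202 are arbitrary assumptions
(any unused ID will do), register_table is a made-up helper, and the
script writes to a scratch file unless RT_TABLES is pointed at the real
file.

```shell
#!/bin/sh
# Register table names so that "table rt-provider1" can be resolved.
# By default this writes to a scratch file; set
# RT_TABLES=/etc/iproute2/rt_tables (as root) to apply it for real.
RT_TABLES="${RT_TABLES:-${TMPDIR:-/tmp}/rt_tables.example}"

register_table() {
    # $1 = numeric table id (assumed to be unused), $2 = table name
    touch "$RT_TABLES"
    # Append the mapping only if the name is not registered yet.
    grep -q "[[:space:]]$2\$" "$RT_TABLES" ||
        printf '%s\t%s\n' "$1" "$2" >> "$RT_TABLES"
}

register_table 201 rt-provider1
register_table 202 rt-provider2
cat "$RT_TABLES"
```

Running it twice does not add duplicate lines, so it is safe to call
from an interface-up script.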

So let's think about it, and look at it:
ard@erwin(slave):~$ /sbin/ip rule list
0:      from all lookup local 
32766:  from all lookup main 
32766:  from <someip> lookup <sometable>
32767:  from all lookup default 

This is not what we did above, but it is a rule list from a working
environment (does www.telegraaf.nl ring a bell?).

What we should have seen was this:
/sbin/ip rule list
0:      from all lookup local
32766:  from all lookup main
32766:  from <ip-eth1> lookup rt-provider1
32766:  from <ip-eth2> lookup rt-provider2
32767:  from all lookup default

The main differences from the example in the document are:
- We do *not* have a default route in main
- We *have* default routes in the default table
- We have rules *after* main, not before main

So what is the catch?
The only catch is that if you do not have point-to-point connections
with your providers, but a /24 for example, then the replies to requests
coming in from provider2's network for ip-eth1 will go out through your
eth2 and not through your eth1. This might be a problem if your /24 is
filtered by your ISP. The solution to that is the essence of this story:
move the calling of your default route tables from the rules to the last
possible moment. So to fix the catch you get one more routing table per
provider (with a provider3 added for clarity):
############################################################
ip route add net-provider2/24 via gw-provider1 table use-gw-provider1
ip route add net-provider3/24 via gw-provider1 table use-gw-provider1
ip rule add from ip-eth1 table use-gw-provider1 prio 32765

ip route add net-provider1/24 via gw-provider2 table use-gw-provider2
ip route add net-provider3/24 via gw-provider2 table use-gw-provider2
ip rule add from ip-eth2 table use-gw-provider2 prio 32765

ip route add net-provider1/24 via gw-provider3 table use-gw-provider3
ip route add net-provider2/24 via gw-provider3 table use-gw-provider3
ip rule add from ip-eth3 table use-gw-provider3 prio 32765
############################################################
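
The pattern in the block above is regular enough to generate. The
following sketch only prints the commands, it does not run them. It
assumes the same placeholder names as above (net-providerN,
gw-providerN, ip-ethN) and a made-up helper name, gen_cross_routes.

```shell
#!/bin/sh
# Dry run: prints the commands, nothing is executed.
# Placeholders as in the text: net-providerN, gw-providerN, ip-ethN.
PROVIDERS="1 2 3"

gen_cross_routes() {
    for i in $PROVIDERS; do
        for j in $PROVIDERS; do
            # Skip provider i's own network; only the other
            # providers' networks need to be forced via gw-provider i.
            [ "$i" = "$j" ] && continue
            echo "ip route add net-provider$j/24 via gw-provider$i table use-gw-provider$i"
        done
        # One rule per provider, just before the main rule (32766).
        echo "ip rule add from ip-eth$i table use-gw-provider$i prio 32765"
    done
}

gen_cross_routes
```

Adding a fourth provider is then a matter of extending PROVIDERS. After
substituting real addresses, the output could be reviewed and piped to
sh as root.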

I hope this makes some sense, and I hope it is also clear that this is
needed only for the link-local network of your provider, and only if it
is filtered!

Next thing: I was talking about failover earlier.
If a gateway is not available (i.e., it does not reply to ARP requests),
Linux will consider it dead within a few minutes and use the other
gateway. But only if the lookup reaches the default table. It gets there
when the outgoing src-ip is not known yet. So if an application makes a
connection to a website, and the first gateway is considered dead, it
will connect to the website using the second gateway, and therefore bind
to ip-eth2.
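
A quick way to watch this happen is to look at the neighbour (ARP)
state of the gateways and ask the kernel which route it would pick
right now. The sketch below prints the commands by default and only
runs them with DRY_RUN=0. The address 192.0.2.1 is a documentation
address standing in for any remote host, and check_failover is a
made-up helper name.

```shell
#!/bin/sh
# Prints the commands by default; set DRY_RUN=0 to execute them.
# 192.0.2.1 is a placeholder for any remote destination.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

check_failover() {
    # $1 = a remote destination to test
    run ip neigh show                # ARP state of the gateways
    run ip route show table default  # both default routes, primary first
    run ip route get "$1"            # route and src the kernel picks now
}

check_failover 192.0.2.1
```

Because ip route get is called without a from address here, the lookup
falls through to the default table, which is the case described above.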

Last thing: this failover thingie can also be a "loadbalanced" thingie
as explained in "4.2.2 Load balancing".
However, due to bugs in the equalizing code, I recommend against it.
Somewhere inside the kernel it cannot clearly come up with a route,
which results in a lot of "cannot happen 777" messages. Besides that,
the usage counts of the devices are not correctly incremented and
decremented. You have to be very careful and craft an extra
non-multipath route first, then remove the existing multipath route,
before bringing down a network device. Otherwise it ends up in an
endless "device still in use, waiting", and you will not be able to use
the device anymore until you reset some sense into the machine...
-- 
_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/
