Re: dead router detection

Grant Taylor <gtaylor@xxxxxxxxxxxxxxxxx> · Tue, 06 Nov 2007 12:00:52 -0600

On 11/06/07 06:39, Guillermo Gómez (Gomix) wrote:
> I would like to know what happens with a dead router in a multipath
> configuration like the one presented at
> http://lartc.org/howto/lartc.rpdb.multiple-links.html
>
> Do I need to monitor dead routers and reconfigure?

Dead Gateway Detection (DGD), which is built into stock Linux kernels,
will detect the death of directly connected gateways.  DGD only works
with gateways on the same subnet, not with routers beyond them.  DGD
running on 'Client' below will detect the death of 'Router A' or 'Router
B', but not of 'Router C' or 'Router D'.  For 'Client' to learn of the
death of 'Router C' or 'Router D', a routing protocol is needed.

                 +----------+       +----------+
             +---+ Router A +---+---+ Router C +---
+--------+   |   +----------+   |   +----------+
| Client +---+                  |
+--------+   |   +----------+   |   +----------+
             +---+ Router B +---+---+ Router D +---
                 +----------+       +----------+

The Linux kernel uses DGD to detect when a given router is unreachable
and to fail over to the next available route.  For this to work,
'Client' needs the following two default routes in place.

route add default gw <Router A> metric <N>
route add default gw <Router B> metric <N>

DGD will detect the failure of one gateway (route) and fall back to the 
next available gateway (route).
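
As a rough sketch of the two routes above in iproute2 syntax: the
gateway addresses below are made-up documentation addresses, and the
metrics 10 and 20 are my assumption (the <N> above is left open; distinct
values simply make the preference order explicit).  The function only
prints the commands so nothing changes until you pipe the output to sh
as root.

```shell
#!/bin/sh
# Hypothetical gateway addresses (RFC 5737 documentation range).
ROUTER_A=192.0.2.1
ROUTER_B=192.0.2.2

# Print the two default routes; the lower metric is preferred.
# Metric values are illustrative, not taken from the original post.
default_routes() {
    echo "ip route add default via $ROUTER_A metric 10"
    echo "ip route add default via $ROUTER_B metric 20"
}

default_routes
```

To apply them, something like "default_routes | sh" as root would do;
"ip route show default" then lists both entries.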

One point of interest is that DGD purportedly works only with default
routes, not with routes to specific destinations.  I have not personally
used this, so I cannot say for sure.

I have tested the following scenario with stock Linux kernels and had 
success.

+-------------+                         +------------+
|  'A'   eth0 +---[Switch]---[Switch]---+ eth0   'B' |
| dummy0      |                         |      dummy0 |
|        eth1 +---[Switch]---[Switch]---+ eth1       |
+-------------+                         +------------+

On each system I set up two routes to the network bound to the opposing
system's dummy0: one via the opposing system's eth0 and one via its
eth1.  So each system had two routes to the opposing dummy0 network.
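
For illustration only, A's side of such a setup might look like the
sketch below.  Every address is hypothetical, and the single multipath
route with two nexthops is my assumption about how to get both paths in
use; the description above does not say whether one multipath route or
two separate entries were configured.  The function prints the commands
rather than running them.

```shell
#!/bin/sh
# Sketch of A's side of the test.  All addresses are hypothetical:
#   A dummy0 10.0.1.1/24, B's dummy0 network 10.0.2.0/24,
#   B eth0 192.0.2.2, B eth1 198.51.100.2.
# Assumes A's eth0 and eth1 already have addresses on B's subnets.
a_side_commands() {
    cat <<'EOF'
ip link add dummy0 type dummy
ip addr add 10.0.1.1/24 dev dummy0
ip link set dummy0 up
ip route add 10.0.2.0/24 \
    nexthop via 192.0.2.2 dev eth0 weight 1 \
    nexthop via 198.51.100.2 dev eth1 weight 1
EOF
}

a_side_commands
```

B would need the mirror image of this, pointing at A's addresses.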

I ran pings from one system's dummy0 interface to the other system's
dummy0 interface.  I then disconnected the Ethernet cable from one of
the systems' eth interfaces.  Within 60 seconds, the system whose cable
I had not disconnected would realize that the gateway was dead and fall
back to the one remaining gateway.

If I plugged the Ethernet cable back in and manually restored the
configuration on the system I had unplugged (the kernel removed the
interface's configuration when it went down), that system would again
send traffic to the other system using both interfaces.

So, say I unplugged the cable from eth0 on B.  A would realize that the
route using B:eth0 as the gateway was dead and would stop using that
route.  B would know immediately that replies needed to go back to A
over eth1, because its own eth0 was down and so it already knew that it
could not reach eth0 on A.

Once I plugged the cable back into eth0 on B and reconfigured the IP
address and routes back to A (again, the kernel removed the interface
configuration and routes when it saw that the physical link was dead), B
immediately started using both routes again.  A accepted the traffic
coming in on eth0 while still sending its own traffic out eth1.  After
about 45 to 60 seconds of live traffic on eth0, the kernel on A decided
that the gateway was alive again and resumed using the route.

When I ran this test I wanted to make sure that A would work without
any regard to B.  B was under someone else's control, so how it behaved
was not my concern.  I found that A would detect that it could no longer
reach B via one route or the other and would start using the remaining
route(s), as it should.

I did not need to run any sort of traffic monitoring on either eth0 or
eth1 on A, because I was able to rely on incoming traffic from both
routes to increment the kernel's packet counters that the DGD algorithm
uses.  However, if I had been implementing both sides of this setup, I
would have needed to periodically run something like arping against
both eth0 and eth1 on B to make sure that they were both alive.  More
specifically, I would need arping to see whether a dead route had been
resurrected.  The kernel watches the packet counters to see when a route
dies.  However, once the route is dead, there is no normal traffic to
start incrementing the counters when the route comes back to life.  Thus
I would need to create that traffic with arping.
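
A minimal sketch of that kind of periodic probe, assuming the iputils
version of arping (-q quiet, -c count, -w deadline in seconds, -I
interface).  The interface names are the ones from the diagram; the
addresses and the probe_once name are hypothetical.

```shell
#!/bin/sh
# Pairs of local interface and the address of B reached through it.
# Addresses are hypothetical (RFC 5737 documentation ranges).
PROBES="eth0:192.0.2.2 eth1:198.51.100.2"

# Send one ARP request per pair and report whether a reply came back.
probe_once() {
    for probe in $PROBES; do
        dev=${probe%%:*}     # text before the first colon
        addr=${probe#*:}     # text after the first colon
        # One request, give up after 2 seconds (iputils arping flags).
        if arping -q -c 1 -w 2 -I "$dev" "$addr"; then
            echo "$dev $addr alive"
        else
            echo "$dev $addr dead"
        fi
    done
}
```

Run as root, a loop such as "while sleep 10; do probe_once; done" would
keep a trickle of traffic on both links; the 10 second interval is
arbitrary.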

There is another issue that you need to be aware of with "Routing for
multiple uplinks/providers".  Namely, when you use "Routing for multiple
uplinks/providers" you have multiple external IP addresses that remote
systems see you coming from.  When you are coming from multiple external
IP addresses, you cannot shift an established connection from one route
to the other(s) without changing the address it appears to come from,
which breaks the connection.

Grant. . . .
_______________________________________________
LARTC mailing list
LARTC@xxxxxxxxxxxxxxx
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc
