Re: dead router detection

Linux Advanced Routing and Traffic Control


On 11/06/07 06:39, Guillermo Gómez (Gomix) wrote:
I would like to know what happens with a dead router in a multipath configuration like the one presented at http://lartc.org/howto/lartc.rpdb.multiple-links.html

Do I need to monitor dead routers and reconfigure?

Dead Gateway Detection (DGD), which is built into stock Linux kernels, detects the death of directly connected gateways. DGD only works with gateways on the same subnet as the host, not with routers beyond those gateways. DGD running on 'Client' below will detect the death of 'Router A' or 'Router B', but not of 'Router C' or 'Router D'. For 'Client' to learn of the death of 'Router C' or 'Router D', a routing protocol is needed.

                 +----------+       +----------+
             +---+ Router A +---+---+ Router C +---
+--------+   |   +----------+   |   +----------+
| Client +---+                  |
+--------+   |   +----------+   |   +----------+
             +---+ Router B +---+---+ Router D +---
                 +----------+       +----------+

DGD lets the Linux kernel detect when a given router is unreachable and fail over to the next available route. For this to work, 'Client' needs the following two routes in place.

route add default gw <Router A> metric <N>
route add default gw <Router B> metric <N>

DGD will detect the failure of one gateway (route) and fall back to the next available gateway (route).
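
As a concrete illustration of the two routes above, here is a minimal sketch that uses the iproute2 `ip` tool rather than `route`. The gateway addresses (taken from the documentation range) and the metric values are placeholders, not values from this thread, and the `run` helper and `DRY_RUN` variable are hypothetical conveniences so that the script only prints the commands by default. The sketch assumes you want one preferred gateway and one fallback, hence the different metrics.

```shell
#!/bin/sh
# Sketch only: addresses and metrics are placeholders, not values from the thread.
GW_A="${GW_A:-192.0.2.1}"   # stands in for <Router A>
GW_B="${GW_B:-192.0.2.2}"   # stands in for <Router B>
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands (default), 0 = execute them

# Hypothetical helper: print the command, or run it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Install two default routes; the route with the lower metric is preferred.
add_default_routes() {
    run ip route add default via "$GW_A" metric 10
    run ip route add default via "$GW_B" metric 20
}

add_default_routes
```

Running it with `DRY_RUN=0` as root would actually install the routes; `ip route show` then lists both defaults with their metrics.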

One point of interest is that DGD purportedly only works with default routes, not with routes to specific destinations. I have not personally used this, so I cannot say for sure.

I have tested the following scenario with stock Linux kernels and had success.

+-------------+                         +------------+
|  'A'   eth0 +---[Switch]---[Switch]---+ eth0   'B' |
| dummy0      |                         |     dummy0 |
|        eth1 +---[Switch]---[Switch]---+ eth1       |
+-------------+                         +------------+

On each system I set up two routes to the network bound to the opposing system's dummy0 interface, one via the opposing system's eth0 and one via its eth1. So each system had two routes to the opposing dummy0 network.
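
The post does not show the commands that were used for these routes. One way such a setup could be expressed with iproute2 is a single route with two next hops, sketched below. Every address is a placeholder from the documentation ranges, `add_peer_routes` is a hypothetical name, and the `run` helper with its `DRY_RUN` default only prints the command instead of executing it.

```shell
#!/bin/sh
# Sketch only: every address below is a placeholder, not taken from the post.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = execute them

# Hypothetical helper: print the command, or run it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add one route to the peer's dummy0 network with two next hops,
# one through each of the peer's Ethernet addresses.
# Arguments: <peer dummy0 network> <peer eth0 address> <peer eth1 address>
add_peer_routes() {
    run ip route add "$1" \
        nexthop via "$2" dev eth0 weight 1 \
        nexthop via "$3" dev eth1 weight 1
}

# As it might be run on system 'A', pointing at system 'B'.
add_peer_routes 10.0.2.0/24 192.0.2.2 198.51.100.2
```

System 'B' would need the mirror image of this, pointing at the network on A's dummy0 via A's eth0 and eth1 addresses.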

I ran pings from one system's dummy0 interface to the other system's dummy0 interface. I then disconnected the Ethernet cable from one eth interface on one of the systems. Within 60 seconds, the system on which I had not disconnected the cable would realize that the gateway was dead and fall back to the one remaining gateway.

If I plugged the Ethernet cable back in and manually restored the configuration on the system I had unplugged it from (when the interface went down, the kernel removed its configuration), that system would again send traffic to the other system using both interfaces.

So, say I unplugged the cable from eth0 on B. A would realize that the route using B:eth0 as the gateway was dead and would stop using that route. B would know immediately that replies needed to go back to A over eth1, because its own eth0 interface was down and it therefore already knew it could not reach eth0 on A.

Once I plugged the cable back into eth0 on B and reconfigured the IP address and the routes back to A (again, the kernel had removed the interface configuration and routes when it saw that the physical link was dead), B immediately started using both routes again. A accepted the traffic coming back in on eth0 while still sending its own traffic out eth1. After about 45 to 60 seconds of live traffic on eth0, the kernel on A decided that the gateway was alive again and started using the route again.

When I ran this test I was trying to make sure that A would work without any regard to B. B was under someone else's control, so how it behaved was not my worry. I found that A would detect that it could no longer reach B via one route or the other and start using the remaining route(s), as it should.

I did not need to monitor traffic on either eth0 or eth1 on A, because I could rely on incoming traffic from both routes to increment the kernel's packet counters that the DGD algorithm uses. However, if I were implementing both sides of this setup, I would have needed to periodically run something like arping against both eth0 and eth1 on B to make sure that they were both alive. More specifically, I would need arping to see whether a route had been resurrected. The kernel watches packet counters to see when a route dies, but once a route has died there is no normal traffic to start incrementing the counters when it comes back to life. Thus I would need to create that traffic myself with arping.
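
The periodic probing described above could be sketched roughly as follows. This assumes the iputils version of `arping` (its `-f`, `-w`, `-q` and `-I` options); the interface names follow the diagram, the addresses are placeholders, and `gateway_alive` and `probe_all` are hypothetical names. The sketch only reports what it sees; what to do when a gateway answers again is left open.

```shell
#!/bin/sh
# Sketch only: interface names match the diagram, addresses are placeholders.
ARPING="${ARPING:-arping}"   # assumed to be the iputils arping

# Return 0 if the neighbour answers ARP on the given interface, non-zero otherwise.
# -f: stop at the first reply, -w 3: give up after three seconds, -q: no output.
# Arguments: <interface> <address>
gateway_alive() {
    "$ARPING" -q -f -w 3 -I "$1" "$2" >/dev/null 2>&1
}

# Probe each "interface=address" pair once and report its state.
probe_all() {
    for pair in "$@"; do
        iface="${pair%%=*}"
        addr="${pair#*=}"
        if gateway_alive "$iface" "$addr"; then
            echo "$iface $addr alive"
        else
            echo "$iface $addr dead"
        fi
    done
}

# Example: probe both of B's addresses every 10 seconds (uncomment to run).
# while :; do probe_all eth0=192.0.2.2 eth1=198.51.100.2; sleep 10; done
```

Note that arping normally needs root (or the matching capability) to send raw ARP requests, so the loop would have to run with suitable privileges.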

There is another issue that you need to be aware of with "Routing for multiple uplinks/providers". Namely, when you use it, you have multiple external IP addresses that remote systems see you coming from. Because of that, you cannot shift an existing connection from one route to another without changing the address the connection appears to come from, which breaks it.



Grant. . . .