Re: IP Failover

Linux Advanced Routing and Traffic Control

If anyone is interested: in my search for a networking solution that provides IP Failover on heterogeneous redundant networks, I found the solutions listed below. I would welcome comments from anyone who is familiar with them.
  1. faild - A simple daemon, described below, that monitors the Ethernet connections and changes the routing tables when a failure is detected. IP Failover is all it does. Being simple, however, makes it small and easy to port.
  2. High Availability Linux Project (HAL) (http://linux-ha.org/) has code available for FreeBSD and Solaris (and is probably reasonably portable to other UNIX platforms). It supports virtual (redundant) servers, so it could probably be configured to support redundant LANs as well.
  3. Advanced Network Services (ANS 2.3.x) for Linux* Operating Systems, which is available from Intel for both PCs and UNIX OS's. ANS provides IP Failover and much more, such as switch failover, load leveling, etc. See: http://www.intel.com/support/network/adapter/onlineguide/PRO1000/DOCS/SERVER/index.htm.
  4. Linux Virtual Server Project (LVS) - VRRPD, Virtual Router Redundancy Protocol (http://off.net/~jme/vrrpd/), which also provides IP Failover. It implements RFC 2338 and is currently available only on Linux, though it may be portable. As with HAL, it is probably configurable to provide redundant LANs.
It seems the days of industry-wide standards and interoperability are becoming casualties of war.


John Klingler
Automatic IP Failover: faild

Figure 1 shows a typical redundant network configuration in which all nodes are connected to two separate Ethernet LANs (here referred to as Ethernet A and Ethernet B). Each node must have two Ethernet interfaces, one for each LAN. Distinct IP addresses are assigned to all Ethernet interfaces.
  ________________________________ . . .
        |                |
     Host 1           Host 2
  ______|________________|________ . . .
Figure 1: Typical Redundant Network Configuration

A route monitor daemon is started on all nodes. Each daemon is configured to be either a responder or both a requestor and responder. Typically the host daemons are requestor/responders.

Requestor daemons broadcast inquiry (INQ) packets on all available networks at a specified interval. Upon receiving an INQ, each responder daemon sends back an acknowledgment (ACK) via the same route. These packets are all sent using UDP (User Datagram Protocol) so the daemons can quickly detect whether a route is active.
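The INQ/ACK exchange described above can be sketched roughly as follows. This is only an illustration, not faild's code: the payloads `INQ` and `ACK`, the function names, and the 64-byte receive size are assumptions, since the real wire format is not given here. It also sends to a single address for simplicity; a broadcasting requestor would additionally need the SO_BROADCAST socket option.

```python
import socket

# Hypothetical payloads; the actual faild packet format is not documented here.
INQ = b"INQ"
ACK = b"ACK"

def send_inq(sock, target):
    """Requestor side: send one inquiry datagram to (host, port)."""
    sock.sendto(INQ, target)

def respond_once(sock, timeout=1.0):
    """Responder side: wait for one datagram and ACK it if it is an INQ."""
    sock.settimeout(timeout)
    try:
        data, peer = sock.recvfrom(64)
    except OSError:              # includes socket.timeout
        return False
    if data != INQ:
        return False
    sock.sendto(ACK, peer)       # reply to the address the INQ came from
    return True

def wait_ack(sock, timeout=1.0):
    """Requestor side: report whether an ACK arrived within the timeout."""
    sock.settimeout(timeout)
    try:
        data, _ = sock.recvfrom(64)
    except OSError:              # no reply in time counts as a missed packet
        return False
    return data == ACK
```

A requestor would call send_inq() and then wait_ack() once per timer interval on each interface, treating a False result as one missed packet.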

When the requestor daemon stops getting ACKs from a given node, and the responder daemon stops getting INQs as expected, each daemon independently determines that the particular route has become unreliable or, more likely, has gone dead. Each daemon then changes its local system routing tables so that future traffic will be routed over the alternate (and presumably healthy) LAN. This detection and failover occur very quickly, in a matter of a few seconds, depending on how the daemon's timing parameters are set.

When a route fails, network traffic carried by reliable protocols (such as X Window traffic over TCP, the Transmission Control Protocol) is held in abeyance until the IP stack recognizes that packets are not getting through. When the IP stack times out, packets waiting for delivery will be retransmitted. Since the daemon has changed the routing tables, the retransmitted packets will go via the new route.

The IP time-out is the critical parameter determining how long it takes from the initial route failure to the establishment of communication over the new route. This parameter may or may not be user-settable on your system. Field experience so far indicates lag times of 20-40 seconds before communication resumes.
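One way to see why the lag can be much longer than the daemon's own detection time: TCP retransmission timers normally back off exponentially, so the sender may be sitting in a long wait when the new route becomes usable. The sketch below only does that arithmetic. The initial time-out of 1 second and the 60 second cap are assumptions chosen for illustration; real stacks use their own values, and the function names are made up for this example.

```python
def retransmit_times(initial_rto, count, max_rto=60.0):
    """Offsets (in seconds after the first send) of successive retransmissions,
    assuming the time-out doubles after each attempt up to max_rto."""
    times = []
    now = 0.0
    rto = initial_rto
    for _ in range(count):
        now += rto
        times.append(now)
        rto = min(rto * 2, max_rto)
    return times

def resume_time(times, route_ready_at):
    """First retransmission at or after the moment the new route is in place,
    or None if every listed attempt happens before that."""
    for t in times:
        if t >= route_ready_at:
            return t
    return None

schedule = retransmit_times(1.0, 6)   # [1.0, 3.0, 7.0, 15.0, 31.0, 63.0]
```

With this schedule, a route that is switched 5 seconds after the failure is first used at 7 seconds, and one switched at 16 seconds is not used until 31 seconds.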

As soon as the original route becomes reliable again, the daemon will restore the routing tables and communication resumes over the original interface. There should be no noticeable delay on the switchback. Request packet interval, failover interval, and switchback interval are all configurable.
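The failover and switchback behaviour described above amounts to a small state machine per route. The sketch below is one possible reading, not the daemon's actual logic: it assumes the missed and good packet counts are consecutive, that Ethernet A is the original route, and the class and function names are invented for the example.

```python
class RouteMonitor:
    """Tracks whether one route is considered valid, based on probe results."""

    def __init__(self, fail_after, restore_after):
        self.fail_after = fail_after        # missed packets before invalidation
        self.restore_after = restore_after  # good packets before revalidation
        self.valid = True
        self.missed = 0
        self.good = 0

    def record(self, got_reply):
        """Feed one probe result; return True if the route's validity changed."""
        if got_reply:
            self.missed = 0
            self.good += 1
            if not self.valid and self.good >= self.restore_after:
                self.valid = True           # switchback to the original route
                return True
        else:
            self.good = 0
            self.missed += 1
            if self.valid and self.missed >= self.fail_after:
                self.valid = False          # failover to the alternate route
                return True
        return False

def active_lan(primary):
    """Use Ethernet A while its monitor is valid, otherwise Ethernet B."""
    return "A" if primary.valid else "B"
```

A return value of True from record() is the point at which a daemon would rewrite its routing tables. With a send interval of t seconds, failover under these assumptions takes roughly t times fail_after seconds to trigger.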

To initiate a failover daemon on your host system, use the following convention:
faild [-r] [-t <n>] [-f <n>] [-s <n>] [-p <n>] [-l <p>]
-r     : launch as requestor
-t <n> : timer interval (in secs) for sending of packets
-f <n> : number of missed packets before interface is invalidated
-s <n> : number of good packets before interface is revalidated
-p <n> : port number to use
-l <p> : full path to message log file
  • Note: This daemon currently runs on VxWorks, Digital UNIX and Solaris, and is being ported to OpenVMS. Any other platforms would require porting the daemon to the target OS.
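For anyone scripting around the daemon, the option set above can be mirrored in a few lines. This is a hypothetical stand-in, not faild's own parser: it only reproduces the flags listed in the usage line, sets no defaults because none are documented, and the argument values in the example are made up.

```python
import argparse

def build_parser():
    """Parser that mirrors the faild usage line shown above."""
    p = argparse.ArgumentParser(prog="faild")
    p.add_argument("-r", action="store_true", help="launch as requestor")
    p.add_argument("-t", type=int, metavar="n",
                   help="timer interval (secs) for sending packets")
    p.add_argument("-f", type=int, metavar="n",
                   help="missed packets before interface is invalidated")
    p.add_argument("-s", type=int, metavar="n",
                   help="good packets before interface is revalidated")
    p.add_argument("-p", type=int, metavar="n", help="port number to use")
    p.add_argument("-l", metavar="p", help="full path to message log file")
    return p

# Example values only; none of these numbers or paths come from faild itself.
args = build_parser().parse_args(
    ["-r", "-t", "1", "-f", "3", "-s", "5", "-p", "5000", "-l", "/var/log/faild.log"]
)
```

The equivalent command line would be: faild -r -t 1 -f 3 -s 5 -p 5000 -l /var/log/faild.log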
