If anyone is interested: in my quest for a networking solution that
provides IP failover on heterogeneous redundant networks, I have listed
the solutions I found below. I would welcome comments from anyone who
is familiar with these.
John Klingler Automatic IP Failover: faild
Figure 1 shows a typical redundant network configuration where all nodes are connected to two separate Ethernet LANs (here referred to as Ethernet A and Ethernet B). Each node must have two Ethernet interfaces, one for each LAN. Distinct IP addresses are assigned to all Ethernet interfaces.

[Figure 1: redundant network configuration, Ethernet A and Ethernet B]

A route monitor daemon is started on all nodes. Each daemon is configured to be either a responder or both a requestor and responder. Typically the host daemons are requestor/responders. Requestor daemons broadcast inquiry (INQ) packets on all available networks at a specified interval. Upon receiving an INQ, each responder daemon sends back an acknowledgment (ACK) via the same route. These packets are all sent using UDP (User Datagram Protocol), which has no connection setup or retransmission, so the daemons can quickly detect whether a route is active.

If the requestor daemon does not get ACKs from a given node, or the responder daemon does not get INQs as expected, each daemon independently determines that the particular route has become unreliable or, more likely, has gone dead. Each daemon then changes its local system routing tables so that future traffic is routed over the alternate (and presumably healthy) LAN. This detection and failover occurs within a few seconds, depending on how the daemon's timing parameters are set.

When a route fails, network traffic carried by reliable protocols (such as X Window traffic over TCP, the Transmission Control Protocol) is held in abeyance until the IP stack recognizes that packets are not getting through. When the IP stack times out, packets waiting for delivery are retransmitted. Since the daemon has changed the routing tables, the retransmitted packets go via the new route. The IP time-out is the critical parameter determining how long it takes from initial route failure to establishing communication over the new route. This parameter may or may not be user-settable on your system.
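To make the INQ/ACK exchange concrete, here is a minimal Python sketch of one probe round over UDP. It is not faild's code: the payloads `INQ`/`ACK`, the function names, and the use of unicast on the loopback address (instead of a broadcast on each LAN) are all assumptions made so the example can run on a single machine.

```python
import socket

# Hypothetical payloads; the actual faild wire format is not documented here.
INQ = b"INQ"
ACK = b"ACK"


def send_inq(sock, target):
    """Requestor side: send one inquiry packet to the target address."""
    sock.sendto(INQ, target)


def respond_once(sock):
    """Responder side: read one packet and ACK it back the way it came."""
    data, peer = sock.recvfrom(64)
    if data == INQ:
        sock.sendto(ACK, peer)
        return True
    return False


def wait_ack(sock, timeout):
    """Requestor side: True if an ACK arrives within `timeout` seconds."""
    sock.settimeout(timeout)
    try:
        data, _ = sock.recvfrom(64)
    except socket.timeout:
        return False  # no answer: the route is suspect
    return data == ACK


def demo_loopback():
    """Run one answered probe and one unanswered probe on 127.0.0.1."""
    responder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    requestor = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        responder.bind(("127.0.0.1", 0))  # port 0: let the OS pick one
        responder.settimeout(1.0)
        requestor.bind(("127.0.0.1", 0))
        target = responder.getsockname()

        send_inq(requestor, target)
        answered = respond_once(responder)
        got_ack = wait_ack(requestor, 1.0)

        # Simulate a dead route: the INQ is sent but nobody answers it.
        send_inq(requestor, target)
        no_ack = wait_ack(requestor, 0.2)
    finally:
        responder.close()
        requestor.close()
    return answered, got_ack, no_ack
```

In a real deployment each requestor socket would presumably be bound to the interface address on one LAN, so that a missing ACK can be attributed to that specific route.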
Field experience so far indicates lag times of 20-40 seconds before communication resumes.

As soon as the original route becomes reliable again, the daemon restores the routing tables and communication resumes over the original interface. There should be no noticeable delay on the switchback. The request packet interval, failover interval, and switchback interval are all configurable.

To start a failover daemon on your host system, use the following syntax:

    faild [-r] [-t <n>] [-f <n>] [-s <n>] [-p <n>] [-l <p>]
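The way the failover and switchback intervals might interact can be sketched as a small state machine. This is an illustration only: the class name `RouteSelector`, the route labels "A" and "B", and the exact meaning given to the two intervals (seconds of silence before failing over, seconds of renewed ACKs before switching back) are assumptions, not documented faild behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouteSelector:
    """Tracks the primary route (A) and decides which route is active."""
    failover_interval: float    # assumed: seconds without ACK before failover
    switchback_interval: float  # assumed: seconds of ACKs before switchback
    active: str = "A"
    last_ack: float = 0.0
    healthy_since: Optional[float] = 0.0  # None while the primary is down

    def on_ack(self, now):
        """Record an ACK received over the primary route at time `now`."""
        if self.healthy_since is None:
            self.healthy_since = now  # primary just came back
        self.last_ack = now

    def tick(self, now):
        """Re-evaluate the route at time `now` and return the active one."""
        if now - self.last_ack >= self.failover_interval:
            # Primary has been silent too long: treat it as dead.
            self.healthy_since = None
            self.active = "B"
        elif (self.active == "B"
              and self.healthy_since is not None
              and now - self.healthy_since >= self.switchback_interval):
            # Primary has answered long enough: switch back.
            self.active = "A"
        # A real daemon would update the system routing table here
        # whenever `active` changes; that step is platform-specific
        # and deliberately left out of this sketch.
        return self.active
```

For example, with a failover interval of 3 and a switchback interval of 2, ACKs at times 0, 1 and 2 followed by silence lead to a failover at time 5; if ACKs resume at time 8 and keep arriving, the selector returns to route A at time 10.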