Re: [LARTC] Solved: Using more than 1 Internet Line

Julian Anastasov <ja@xxxxxx> · Tue, 4 Dec 2001 12:57:26 +0200 (EET)

	Hello,

On Tue, 4 Dec 2001, Arthur van Leeuwen wrote:

> Okay, I've read both the nanohowto and the docs on Julian's patches by now.
> A few things to note: the nanohowto's information is good even without
> Julian's patches, although things will become trickier. One has to do one's
> own link-probing and rerouting from userland. That is very doable however,

	What do you mean by "rerouting from userland"?

> provided you have machines somewhere at the ISP's site that will answer to
> either pings or traceroutes or somesuch, as you will need answers.

	BTW, only ARP answers are needed, and they can't be filtered
with the excuse of the ISP's policy. The ICMP replies are used only
from userland (for monitoring). Of course, this is one solution;
as you said, one can do the same from user space by changing the
routes after a failover. From user space it can even be faster in
some cases, as the kernel does passive dead gateway (neighbour)
detection, not active.
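
	A minimal sketch of the user-space variant discussed above: probe
each gateway, then rebuild the default route from the ones that answer.
The addresses, device names and weights are hypothetical, and the ICMP
probe via ping is only one possible check. It assumes iproute2 and an
iputils-style ping are installed.

```python
import subprocess

# Hypothetical uplinks: (gateway IP, output device, weight). Not from the thread.
UPLINKS = [("192.0.2.1", "eth1", 1), ("198.51.100.1", "eth2", 1)]

def gateway_alive(gw, dev, count=1, timeout=2):
    """Return True if the gateway answers an ICMP echo sent via dev."""
    cmd = ["ping", "-c", str(count), "-W", str(timeout), "-I", dev, gw]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0

def default_route_cmd(alive):
    """Build the `ip route replace default ...` command for the alive uplinks."""
    if not alive:
        return None  # nothing answers: leave the routing table alone
    if len(alive) == 1:
        gw, dev, _ = alive[0]
        return ["ip", "route", "replace", "default", "via", gw, "dev", dev]
    cmd = ["ip", "route", "replace", "default", "scope", "global"]
    for gw, dev, weight in alive:
        cmd += ["nexthop", "via", gw, "dev", dev, "weight", str(weight)]
    return cmd

def reroute(uplinks, probe=gateway_alive, run=subprocess.run):
    """Probe every uplink and install a default route over the live ones."""
    alive = [u for u in uplinks if probe(u[0], u[1])]
    cmd = default_route_cmd(alive)
    if cmd is not None:
        run(cmd, check=True)  # needs root when run for real
    return cmd
```

Calling reroute(UPLINKS) periodically (from cron or a loop) gives a
crude failover. Failback and flapping links are not handled here.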

>
> The patches Julian provided fix a bunch of nastiness. For one, dead gateway
> detection is done on the ARP level in kernelspace. Very neat when you have
> ARP, thus on ethernet, but not very useful without. Furthermore, they

	Right. But there are non-ARP device drivers that can alter
the device link state. For others, it is in the hands of a user program.
At least, the patches handle situations with such point-to-point devices.

> provide true alternative routes, not only multipath default routes. This is
> once more extremely neat, but not directly necessary for the usual case.
>
> Thirdly, Julian's patches add gateways as a routing key. This will not help
> pure routing boxes, such as would be standard issue in an office full of
> Windows toasters, as the gateway will be determined at the routing stage, so
> it cannot be used as a key.

	The gateway as key serves only one purpose: it lets the NAT
select the best source address using the data provided at routing time
about the path that was selected for this packet from the multipath
route. No other hosts depend on this. It is a way to select a source
IP for the link already selected at routing time when you are using
paths with the same output device in the multipath route. We can
distinguish them only by outdev and gateway. The plain kernel
matches them only by device, on the assumption that the other variant
is insecure.
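
	A toy model of why the gateway is needed as a key when two paths
share one output device. This is not kernel code; the path table, the
addresses and the function names are hypothetical and only illustrate
the matching rule.

```python
# Hypothetical paths of one multipath route; both leave through eth0.
PATHS = [
    {"dev": "eth0", "gw": "192.0.2.1", "src": "192.0.2.10"},
    {"dev": "eth0", "gw": "198.51.100.1", "src": "198.51.100.10"},
]

def src_by_dev(paths, dev):
    """Match by output device only: every path on that device qualifies."""
    return [p["src"] for p in paths if p["dev"] == dev]

def src_by_dev_and_gw(paths, dev, gw):
    """Match by output device and gateway: identifies one path."""
    return [p["src"] for p in paths if p["dev"] == dev and p["gw"] == gw]

# Device-only matching cannot tell the two paths apart.
ambiguous = src_by_dev(PATHS, "eth0")
# Adding the gateway selects the source that belongs to the chosen path.
exact = src_by_dev_and_gw(PATHS, "eth0", "198.51.100.1")
```

Here `ambiguous` holds both source addresses, while `exact` holds only
the one that belongs to the path through 198.51.100.1.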

>
> The *main* reason to use Julian's patches is the masquerading connection
> rerouting. This will fix the big bugs in your setup by just redirecting a
> masqueraded connection out to a different interface when the old one is
> dead. This is *very* cool on UDP, and will make UDP failover to another
> route fully transparent.  However, it will not fix stateful protocols in

	It is not protocol-specific. As for the established connections,
they are already bound to a specific source, and the number of variants
is equal to the number of links that serve this public IP (1 in most
setups), so there is no room for failover of the masqueraded
connections in the usual setups. As for plain usage of alternative
routes, even TCP can start to use them, except when the socket is
bound to a device. In all other cases you can set up a subnet over 2
devices, and when one fails the TCP connections (with a short
gc_interval) will switch to the other line (slooowly sometimes).

> which the server on the other side keeps state on the IP address it was
> talking to, such as SSH. It will fix the TOS nastiness OpenSSH brings to the

	Nobody changes IP addresses. Once the connection is created,
its traffic can go only through the devices allowed for its addresses.
The multipath route is used only by new connections, to select a
different line for them. So the remote end will not detect packets
with a changed source. We simply can't send them through the wrong device.
The routes say so.
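
	A small simulation of this behaviour, assuming the usual setup
where each public IP is served by exactly one link: a flow is pinned to
the link chosen when it was created, and the weighted multipath choice
is made only for new flows. The class and link names are hypothetical.

```python
import random

class Router:
    """Toy model: flows stay on the link picked at creation time."""

    def __init__(self, links, rng=None):
        self.links = dict(links)      # link name -> weight
        self.alive = set(self.links)  # links currently usable
        self.flows = {}               # flow id -> pinned link
        self.rng = rng or random.Random()

    def send(self, flow_id):
        """Return the link for a packet of flow_id, or None if it cannot go out."""
        link = self.flows.get(flow_id)
        if link is None:
            # New connection: weighted choice among the live links only.
            candidates = sorted(self.alive)
            if not candidates:
                return None
            weights = [self.links[name] for name in candidates]
            link = self.rng.choices(candidates, weights=weights)[0]
            self.flows[flow_id] = link
        # An established flow is never moved to a link with another source IP.
        return link if link in self.alive else None
```

With r = Router({"adsl": 1, "cable": 1}), repeated r.send("a") calls
always return the same link. After that link is removed from r.alive,
r.send("a") returns None, while a new flow gets the remaining link.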

> fore, as it will *reroute* after masquerading. Bit of a hack, that. I
> simply nixed the TOS bits in the firewalling code. :)

	The changes in 2.2 are a big hack. But the masquerading there
will not be slower than in the plain 2.2 kernel, because the
ip_route_output call is still permanently there to select maddr, which
is already known.

> Summarizing: Yes, you can do equal cost multipath. Yes it is cool. Yes it
> can be made nicer and friendlier to set up using Julian's patches. However,
> it will not be an ideal solution. Things *will* break. Load will just be

	These patches fix a small number of things. You can do more
of the work (even the same work) with an unpatched kernel and some
scripts in user space. The problems come when the route cache entries
expire.

> approximately balanced. Failover is in most cases definitely not transparent
> to the user: new connections have to be set up. If the links stay up though,

	The failover simply depends on gc_interval (used by
new connections only). As for the broken established connections:
nobody can resurrect them if they have only one line available for
the public IP addresses they use. If you can send traffic from one
public IP through 2 external devices, then when the 1st fails the other
will be used. Only the local connections bound to a device will die; the
NAT and the unbound TCP sockets can use another live device if there
are routes that say so.

> equal cost multipath is a *good* thing.
>
> Oh, and it does work on >2 uplinks. I've set up a system for a client using

	Yes, it is tested, for example, with 2 WANs and one ethernet
to route packets to another router with a WAN: 2 routers, 3 WANs total.
In short, it is a solution for end hosts and routers that use border
gateways. All other solutions for failover (VRRP, etc.) are for
intermediate routers. In some cases you can't move the physical link
from the failed box to another.

> 1 ISDN line, 2 ADSL links (with the Dutch MXStream cruftiness, but I
> digress) and 1 cable modem using masquerading only on the last three, using
> the standard kernel (Julian's patches didn't exist a year ago, when I did
> this). Worked splendidly (and still does, I'm told). Needed some manual
> supervision though, as link failover and especially failback is *not*
> trivial.

	There are some things in these patches that are not for the
innocent Linux user: the need for active monitoring from user space
of the gateways in the multipath and the alternative routes. I have
an idea to do active monitoring in the kernel, but this is a TODO. The
problem is in fact with the multipath routes: we need valid neighbour
state for these gateways (provided by ARP pings). If the ARP info
is not known, we can end up using only part of the paths in the
multipath route, and this is bad, or at least unexpected, for the users
that do not feed the kernel with fresh ARP info. But in such a situation
it seems they don't demand failover detection.
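
	A sketch of how a user-space monitor might check whether the
gateways have usable neighbour state, by parsing text in the format
printed by `ip neigh show`. Treating the states the kernel groups as
NUD_VALID as "usable" is an assumption about what the patches need; the
addresses and function names are hypothetical.

```python
# Assumption: these neighbour states count as valid for a gateway.
VALID_STATES = {"PERMANENT", "NOARP", "REACHABLE", "STALE", "DELAY", "PROBE"}

def parse_neigh(text):
    """Map (ip, dev) -> state from `ip neigh show` style output."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: IP dev DEV [lladdr MAC] STATE
        if len(fields) < 4 or fields[1] != "dev":
            continue
        table[(fields[0], fields[2])] = fields[-1]
    return table

def gateways_needing_probe(text, gateways):
    """Return the (ip, dev) gateways with missing or non-valid neighbour state."""
    table = parse_neigh(text)
    return [gw for gw in gateways if table.get(gw) not in VALID_STATES]

SAMPLE = """\
192.0.2.1 dev eth1 lladdr 00:11:22:33:44:55 REACHABLE
198.51.100.1 dev eth2 FAILED
"""
GATEWAYS = [("192.0.2.1", "eth1"), ("198.51.100.1", "eth2"),
            ("203.0.113.1", "eth3")]
to_probe = gateways_needing_probe(SAMPLE, GATEWAYS)
```

The gateways in `to_probe` are the ones a monitor would ARP-ping (for
example with arping) to feed the kernel fresh neighbour info.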

> Doei, Arthur.

Regards

--
Julian Anastasov <ja@xxxxxx>

Linux Advanced Routing and Traffic Control