Bugs in Linux 2.4 masq + nexthop? (was Re: A question about multipath routing...)

zblaxell@furryterror.org · Fri, 9 Nov 2001 12:21:15 -0500

CC'd to linux-net in case anyone there has any bright ideas ;-)

On Fri, Nov 09, 2001 at 11:29:36AM +0100, Ludek Hejrovsky wrote:
> In the posting you mention another problem with outgoing masqueraded
> packets.
> I wonder how it can work at all - one of my ISPs filters out packets with
> a foreign source address. And even if it did not, I believe it is bad for
> performance if packets belonging to a single connection travel by
> different routes, because they can arrive out of order.

In general, what you want from a gateway with a nexthop route is for each
masqueraded connection to keep the same IP address and the same interface
for as long as the connection is established and the outgoing route is
available.

Linux 2.2 masquerade almost worked--it got things right most of the time.
The only thing missing was some way to explicitly clear the masquerading
table from user-space, which would be really nice to do if you know for
a fact that your external IP changed, and you're not going to get the
old one back.

Linux 2.4 masquerade doesn't work properly at all with nexthop routes:
either external IP may be used as the source address on either interface,
so about 50% of outgoing connections leave with the wrong source address
and go directly to the ISP's anti-smurf bit bucket.
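A toy Python model of the failure described above. It assumes, for illustration only, two interfaces that each own one external address, and a picker that chooses the outgoing interface and the source address independently. This is a simplification of what the 2.4 kernel actually does, and the addresses are documentation placeholders. The only point is that an unconstrained interface/address pairing gets filtered about half the time, while a consistent pairing never does.

```python
import random

# Hypothetical setup: each interface owns exactly one external address.
IFACE_ADDR = {"eth1": "192.0.2.10", "eth2": "198.51.100.7"}


def pick_independently(rng):
    """Buggy pairing: interface and source address are chosen independently."""
    iface = rng.choice(sorted(IFACE_ADDR))
    src = rng.choice(sorted(IFACE_ADDR.values()))
    return iface, src


def pick_consistently(rng):
    """Desired pairing: the source address always belongs to the chosen interface."""
    iface = rng.choice(sorted(IFACE_ADDR))
    return iface, IFACE_ADDR[iface]


def filtered_by_isp(iface, src):
    """ISP anti-spoofing filter: drop packets whose source isn't the link's own address."""
    return IFACE_ADDR[iface] != src


def drop_fraction(picker, n=10_000, seed=0):
    """Fraction of n simulated connections that the ISP filter would drop."""
    rng = random.Random(seed)
    dropped = sum(filtered_by_isp(*picker(rng)) for _ in range(n))
    return dropped / n
```

drop_fraction(pick_independently) comes out near 0.5. drop_fraction(pick_consistently) is exactly 0.0.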

The newer iptables SNAT target doesn't solve any of these problems
either--you can specify the external IP address explicitly, or a pool
of consecutive IP addresses, but most dual-ISP setups don't have
consecutive IPs, and if you don't explicitly specify an address then
you get the same bugs as with masquerade.

> Anyway, if packets disappear because of the ISP's filtering, doesn't that
> introduce a big delay before a new connection is established? And how
> likely is such a change of route? I would expect every second packet to
> go by a different interface, which obviously does not happen.

With nexthop routes, route cache entries usually do not change during
the lifetime of a socket. Exceptions include explicit cache flushes,
extremely heavy routing load, and protocols that go a long time (days?)
between sending packets.

Once you connect to a given IP, you will keep using the same route for
that IP until the cache entry is flushed. You get load balancing because,
with equally weighted routes, about 50% of destination IP addresses will
use one interface and 50% will use the other. If you only connect to one
IP address at a time, you don't get load balancing, only dead gateway
failover.
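A toy Python model of the caching behaviour described in the last two paragraphs. It assumes that on a cache miss a nexthop is picked at random with equal weight. The kernel's actual selection logic differs in detail, so treat this as an illustration of "choose once per destination, then stick until flushed" rather than as the kernel's algorithm.

```python
import random


class RouteCache:
    """Toy per-destination route cache in front of a multipath (nexthop) route.

    Assumption: on a cache miss a nexthop is picked at random with equal weight.
    """

    def __init__(self, nexthops, seed=None):
        self.nexthops = list(nexthops)
        self.rng = random.Random(seed)
        self.entries = {}  # destination IP -> chosen nexthop

    def lookup(self, dst):
        # Only a miss makes a new choice; hits reuse the cached nexthop.
        if dst not in self.entries:
            self.entries[dst] = self.rng.choice(self.nexthops)
        return self.entries[dst]

    def flush(self):
        # Comparable in spirit to an explicit route cache flush.
        self.entries.clear()
```

With cache = RouteCache(["eth1", "eth2"]), cache.lookup("203.0.113.5") returns the same nexthop on every call until cache.flush(). After a flush, the next lookup may pick the other nexthop.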

To give a concrete example: if you go to a web page at www.foo.com
and download a bunch of images, all of the resulting traffic might go
through eth1. If you then visit www.bar.com, all of the traffic might
go through eth2. If you visit 1000 different web sites, roughly 500 of
them will use eth1 and 500 will use eth2.
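A small simulation of the web-browsing example. It assumes that each site is one distinct destination IP and that a nexthop is chosen at random, with equal weight, on the first packet to that IP. The site names are hypothetical stand-ins. The simulation shows two things: every fetch from one site stays on one interface, and across many sites the split lands near 50/50.

```python
import random
from collections import Counter


def browse(n_sites, fetches_per_site=10, nexthops=("eth1", "eth2"), seed=0):
    """Simulate visiting n_sites sites, each fetched several times (page + images)."""
    rng = random.Random(seed)
    route_cache = {}  # destination -> nexthop, chosen on first use
    used = {}  # destination -> set of interfaces its traffic actually used
    for i in range(n_sites):
        dst = f"site{i}"  # hypothetical stand-in for one destination IP
        for _ in range(fetches_per_site):
            if dst not in route_cache:
                route_cache[dst] = rng.choice(nexthops)
            used.setdefault(dst, set()).add(route_cache[dst])
    # Each set holds exactly one interface, so take its single element.
    sites_per_iface = Counter(next(iter(ifaces)) for ifaces in used.values())
    return used, sites_per_iface
```

browse(1000) returns per-site interface sets that each contain exactly one interface, plus a Counter whose two counts sum to 1000 and are each somewhere around 500.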

Now the dead gateway detection: if the ISP on eth1 fails (defined as
"the gateway on eth1 does not respond to ARP"), all connections through
that ISP will be lost--you can't receive packets because the ISP has
failed, and you can't send them through eth1 (its gateway is dead) or
through eth2 (the other ISP will filter them). New connections will
avoid the dead gateway and go through eth2.
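The failure case can be sketched as two small Python functions. This is a model under stated assumptions, not kernel code. Connection ids and interface names are hypothetical, and "dead" simply means the gateway stopped answering ARP. Established connections pinned to the dead interface are lost outright. New connections choose only among live gateways.

```python
import random


def gateway_failed(connections, dead_iface):
    """Split established connections when the gateway on dead_iface dies.

    connections maps a connection id to the interface it is pinned to. Those
    pinned to dead_iface can't be rescued: the dead link can't carry them, and
    the other ISP would filter their source address.
    """
    survivors = {c: i for c, i in connections.items() if i != dead_iface}
    lost = sorted(c for c, i in connections.items() if i == dead_iface)
    return survivors, lost


def route_new_connection(nexthops, dead, rng):
    """New connections only choose among nexthops whose gateway is still alive."""
    alive = [n for n in nexthops if n not in dead]
    if not alive:
        raise RuntimeError("no live gateway")
    return rng.choice(alive)
```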

I no longer use nexthop routing due to the bugs.  Instead I use a script
that runs every minute, pings the ISP's gateways, and changes the default
route whenever the currently selected gateway doesn't respond.  This is
mostly the same thing the kernel does, except there is no load balancing.
This doesn't really work with masquerade either--all connections are lost
whenever the gateway changes--but losing 100% of established connections
on rare occasions is better than not being able to connect to 50% of
the IP address space at all.
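The once-a-minute script itself isn't included in the post. The following is a hypothetical Python reconstruction of the idea. It assumes a Linux host with iputils `ping` and iproute2 `ip`, placeholder gateway addresses, and root privileges to change routes. choose_gateway is kept free of side effects so the switching logic can be checked without touching the network.

```python
import subprocess
import time


def gateway_alive(gw):
    """One ICMP echo with a 1-second timeout (-c and -W as in Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", gw],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def choose_gateway(current, gateways, alive):
    """Keep the current gateway while it answers; otherwise take the first live one."""
    if alive(current):
        return current
    for gw in gateways:
        if gw != current and alive(gw):
            return gw
    return current  # nothing answers: leave the route alone


def set_default_route(gw):
    # iproute2: replace (or create) the default route via gw.
    subprocess.run(["ip", "route", "replace", "default", "via", gw], check=True)


def watch(gateways, interval=60):
    """Hypothetical main loop: check every interval seconds, switch when the gateway dies."""
    current = gateways[0]
    set_default_route(current)
    while True:
        time.sleep(interval)
        chosen = choose_gateway(current, gateways, gateway_alive)
        if chosen != current:
            set_default_route(chosen)
            current = chosen
```

Calling watch(["192.0.2.1", "198.51.100.1"]) would start the loop. Like the script described above, it does nothing to preserve masqueraded connections across a switch.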

> Could MARKing packets in iptables help with this in some way?
> ?? still confused

Not really...you'd have to find a way to assign marks to connections,
but iptables only assigns marks to packets.

Perhaps what iptables needs is a 'mark with ((random integer mod A) + B)'
plugin, which would then tell the connection tracking system to put the
entire connection through one of the SNAT IP address pools and leave it
there until the connection terminates. Then you just put in one SNAT
rule for every interface, and you have a load-balancing multi-ISP SNAT
gateway.
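To make the proposal concrete, here is a toy Python model of what such a plugin combined with connection tracking would do. No such iptables module is assumed to exist. ConnMarker, the pool table, and the addresses are all hypothetical, and a 5-tuple stands in for a conntrack entry. The mark is drawn once per connection as (random integer mod A) + B and then reused for every packet of that connection, so each connection stays on one interface/source pair.

```python
import random


class ConnMarker:
    """Toy model of a per-connection 'mark with ((random integer mod A) + B)' plugin."""

    def __init__(self, a, b, pools, seed=None):
        self.a, self.b = a, b
        self.pools = pools  # mark -> (interface, source address), hypothetical
        self.rng = random.Random(seed)
        self.conntrack = {}  # connection tuple -> mark

    def mark_for(self, conn):
        # Drawn only when the connection is first seen; mark is in B..B+A-1.
        if conn not in self.conntrack:
            self.conntrack[conn] = self.rng.getrandbits(32) % self.a + self.b
        return self.conntrack[conn]

    def translate(self, conn):
        # Every packet of a connection gets the same interface/source pair.
        return self.pools[self.mark_for(conn)]

    def close(self, conn):
        # Connection terminated: forget its mark.
        self.conntrack.pop(conn, None)
```

With A=2, B=1 and one pool per ISP, new connections spread across both links, but no connection ever switches links mid-stream.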
