Answer inlined:

Salim S I wrote:
> iptables -t mangle -A PREROUTING -j ISP2
>
> Doesn't it need to check for state NEW? Or packets will not reach the
> restore-mark rule.

Of course, and the real script does check. I typed this line in manually,
because the copy-paste cut it off, and missed the obvious check.

> You may have to manually populate the routing tables when an interface
> comes up, after being down for some time. (Kernel would have removed
> the routing entries for this interface after it found the interface
> down. This happens only if its nexthop is down)

This is what I can't really understand (and it applies to dead gateway
detection (DGD) as well): how often in real life does someone yank a
cable out, so that an interface actually goes down? In over 7 years of
dealing with various ISPs I have never seen a link go so dead that the
kernel downed the interface and removed all associated routing
information. What I have seen, on the other hand, is the link dying at
the 2nd or 3rd hop, which (if I understand correctly) DGD simply cannot
detect. Correct me if my assumption is wrong.

> I tend to favor this approach, because it is more flexible in selecting
> the interface. You can use different weights/probability depending on
> different factors. I have seen a variation of this method, used with
> the 'recent' (-m recent) match, instead of CONNMARK.

I see. But 'recent' would have a "caching effect", and from what I
understand it is heavier on the kernel than CONNMARK, which merely hooks
into conntrack - and conntrack has to track connections either way.
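If I were to try the 'recent' variation myself, I imagine it would look
roughly like this (an untested sketch: the isp1_hosts list name and the
600 second timeout are made up, and the ISP1/ISP2 chains would
presumably only need to set MARK, not CONNMARK):

# sources recently assigned to ISP1 stay there, packet by packet
iptables -t mangle -A PREROUTING -m recent --name isp1_hosts \
  --rcheck --seconds 600 -j ISP1
# new sources get ISP1 with probability $X and are added to the list
iptables -t mangle -A PREROUTING -m state --state NEW \
  -m statistic --mode random --probability $X \
  -m recent --name isp1_hosts --set -j ISP1
# everything else gets the ISP2 mark
iptables -t mangle -A PREROUTING -j ISP2

Which is exactly the caching effect I mean: every source on the
isp1_hosts list sticks to ISP1 until its entry expires - and an entry
expiring in mid-connection would even reroute live flows, something the
conntrack-based marking does not suffer from.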
> The only downside in using this method, as far as I can see, is the
> need to reconfigure rules and routing tables, in case of a
> failure/coming-up. But lately, I have found that even with the
> multipath method, there IS a need for reconfiguration.

Got you. This pretty much answers my original question. Thank you for
your time.

> -----Original Message-----
> From: lartc-bounces@xxxxxxxxxxxxxxx
> [mailto:lartc-bounces@xxxxxxxxxxxxxxx] On Behalf Of Peter Rabbitson
> Sent: Monday, May 14, 2007 3:16 PM
> To: lartc@xxxxxxxxxxxxxxx
> Subject: Re: Multihome load balancing - kernel vs netfilter
>
> Salim S I wrote:
>>> -----Original Message-----
>>> From: lartc-bounces@xxxxxxxxxxxxxxx
>>> [mailto:lartc-bounces@xxxxxxxxxxxxxxx] On Behalf Of Peter Rabbitson
>>> Sent: Monday, May 14, 2007 1:57 PM
>>> To: lartc@xxxxxxxxxxxxxxx
>>> Subject: Multihome load balancing - kernel vs netfilter
>>>
>>> Hi,
>>> I have searched the archives on the topic, and it seems that the list
>>> gurus favor load balancing done in the kernel, as opposed to other
>>> means. I have been using a home-grown approach, which splits traffic
>>> based on `-m statistic --mode random --probability X`, then CONNMARKs
>>> the individual connections, and the kernel happily routes them. I
>>> understand that with more than 2 links it will become impractical to
>>> calculate a correct X. But if we only have 2 gateways to the internet
>>> - are there any advantages in letting the kernel multipath scheduler
>>> do the balancing (with all the downsides of route caching), as
>>> opposed to the pure random approach described above?
>>
>> I have thought about this approach, but, I think, it does not handle
>> failover/dead-gateway-detection well, because you need to alter all
>> your netfilter routing rules if you find a link down, and then
>> reconfigure again when the link comes up. I am interested to know how
>> you handle that.
>>
>
> Certainly. What I am doing is NATing a large company network, which
> gets load balanced and receives failover protection. I also run a
> number of services on the router which must not be balanced nor failed
> over, as they are expected to respond on a specific IP only. All
> remaining traffic originating on the server itself is not balanced,
> but fails over when the designated primary link goes down.
>
> I start with a simple pinger app that pings several well-known remote
> sites once a minute, using a large ICMP packet (1k of payload). The
> rtt times are averaged out and used to calculate the current "quality"
> of each link (the large packet makes congestion a visible factor). If
> one of the interface responses is 0 (meaning not a single one of the
> pinged hosts has responded) - the link is dead.
>
> In iproute I have two separate tables, each using one of the links as
> default gw, matched by a certain mark. The main default route is set
> to a single gateway (not a multipath one), either by hardcoding, or by
> using the first input of the pinger (it can run without a default gw
> set, explanation follows).
>
> In iptables I have two user-defined chains:
> iptables -t mangle -A ISP1 -j CONNMARK --set-mark 11
> iptables -t mangle -A ISP1 -j MARK --set-mark 11
> iptables -t mangle -A ISP1 -j ACCEPT
>
> iptables -t mangle -A ISP2 -j CONNMARK --set-mark 12
> iptables -t mangle -A ISP2 -j MARK --set-mark 12
> iptables -t mangle -A ISP2 -j ACCEPT
>
> The rules that reference those chains are:
>
> For all locally originating traffic:
> iptables -t mangle -A OUTPUT -o $I1 -j ISP1
> iptables -t mangle -A OUTPUT -o $I2 -j ISP2
>
> For all incoming traffic from the internet:
> iptables -t mangle -A PREROUTING -i $I1 -m state --state NEW -j ISP1
> iptables -t mangle -A PREROUTING -i $I2 -m state --state NEW -j ISP2
>
> For all other traffic (nat):
> iptables -t mangle -A PREROUTING -m state --state NEW -m statistic --mode random --probability $X -j ISP1
> iptables -t mangle -A PREROUTING -j ISP2
>
> At the end of the PREROUTING chain I have:
> iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
>
> The NATing is trivially solved by:
> iptables -t nat -A POSTROUTING -s 10.0.58.0/24 -j SOURCE_NAT
> iptables -t nat -A POSTROUTING -s 192.168.58.0/24 -j SOURCE_NAT
> iptables -t nat -A POSTROUTING -s 192.168.8.0/24 -j SOURCE_NAT
>
> iptables -t nat -A SOURCE_NAT -o $I1 -j SNAT --to $I1_IP
> iptables -t nat -A SOURCE_NAT -o $I2 -j SNAT --to $I2_IP
>
> What does this achieve:
> * Local applications that have explicitly requested a specific IP to
>   bind to will be routed over the corresponding interface and will
>   stay that way. Only applications binding to 0.0.0.0 will be routed
>   by consulting the default route.
> * Responses to connections from the internet are guaranteed to leave
>   from the same interface they came in on.
> * All new connections not coming from the external interfaces are load
>   balanced by the weight of $X, and are again guaranteed to stay on
>   one link for the life of the connection; but another connection to
>   the same host is not guaranteed to go over the same link. This is
>   important in a company environment, since most employees use the
>   same online resources.
>
> On every run of the pinger I do the following:
> * If both gateways are alive, I replace the -m statistic rule,
>   adjusting the value of $X.
> * If one is detected dead, I adjust the probability accordingly (or
>   alternatively remove the statistic match altogether), and change the
>   default gateway if it is the one that failed.
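To put that reconfiguration step in concrete terms: it boils down to a
couple of commands along these lines (a simplified sketch, not the real
script; the rule index and the $GW2 variable are illustrative):

# both gateways alive: swap in the freshly calculated probability
# (the statistic rule sits at position 3 of mangle/PREROUTING in the
# layout above; the real index depends on the actual chain layout)
iptables -t mangle -R PREROUTING 3 -m state --state NEW \
  -m statistic --mode random --probability $X -j ISP1

# ISP1 dead: drop the statistic rule, so the catch-all rule sends
# all new balanced connections to ISP2 ...
iptables -t mangle -D PREROUTING 3
# ... and repoint the default route ($GW2 being ISP2's gateway)
ip route replace default via $GW2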
>
> So really the whole exercise revolves around changing a single rule
> (or two rules, if you want to control the probability in a more
> fine-grained way).
>
> Last but not least, this setup allowed me to program exception tables
> for certain IP blocks. For instance, Yahoo has a braindead two-tier
> authentication system for commercial solutions. It remembers the IP
> you first logged in with, and it must match the IP used to log in to
> the more secure area (using another password). Or users within the LAN
> might want to use one of the ISPs' SMTP servers, which keeps a close
> eye on who is talking to it. So I have a $PREFERRED variable, which is
> adjusted to either ISP1 or ISP2, depending on the current state of
> affairs, and rules like:
> iptables -t mangle -A PREROUTING -d 66.218.64.0/19 -m state --state NEW -j $PREFERRED
> iptables -t mangle -A PREROUTING -d 68.142.192.0/18 -m state --state NEW -j $PREFERRED
>
> This pretty much sums it up. The only downside I can think of is that
> loss of service can be observed between two runs of the pinger. Let me
> know if I missed something, be it critical or minor.
>
> Thanks
>
> Peter

_______________________________________________
LARTC mailing list
LARTC@xxxxxxxxxxxxxxx
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc