The problem is, everything is routing fine, and the data is being split evenly over eth0 and eth1, but as soon as I pull the cable out of eth0 (pulling it out of eth1 doesn't seem to matter) the connection goes out and the routes never recover until I plug the cable back in (at which point things start flowing perfectly again without any prompting from me). On the other hand, if I ifdown eth0, the routes switch over silently. As soon as I bring eth0 back up, data's going over both eth0 and eth1 again.
In other words, things are working almost exactly as they should, but when the cat5 comes out, things just die. Someone suggested that I use mii-tool to watch the link and just ifdown eth0 if it's out, and that might work, but I'd really rather have a solution done solely within the routing tables if possible.
The other reason I want to do this from the routing tables is because I expect any problems to be further down the line than the cable into the firewall.
The network will be set up like this:
intranet eth2 --- firewall --- eth0 --- router1 --- internet
                           \-- eth1 --- router2 --- internet
When the connection from router1 to the internet goes down, I need the firewall to stop sending data over eth0 and commit fully to eth1. When that link comes back up, I need the routes restored. Same for the other way around.
The way I was thinking of doing this was by sending out an ICMP packet (say, to google.com) over each interface with a TTL of 3, and if it didn't come back, change the route.
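A rough sketch of that probe idea, in bash like the script later in this thread. The names `probe_link`, `check_links`, `PING` and `TARGET` are made up for illustration and do not come from either HOWTO. One caveat: as far as I know, a ping sent with a TTL of 3 is answered by a "Time to live exceeded" message from the third hop rather than by an echo reply, and ping's exit status does not count that as success, so this sketch uses a normal ping bound to each interface and leaves the TTL alone.

```shell
#!/bin/bash
# Sketch only: probe a target through each uplink and report which are usable.
# PING, TARGET, probe_link and check_links are illustrative names.

PING=${PING:-ping}
TARGET=${TARGET:-google.com}

# probe_link IFACE: succeeds if one echo reply comes back via IFACE within 2 s.
probe_link() {
    "${PING}" -I "$1" -c 1 -W 2 "${TARGET}" >/dev/null 2>&1
}

# check_links IFACE...: print "IFACE up" or "IFACE down" for each interface.
check_links() {
    local ifname
    for ifname in "$@"; do
        if probe_link "${ifname}"; then
            echo "${ifname} up"
        else
            echo "${ifname} down"
        fi
    done
}
```

Calling `check_links eth0 eth1` once a minute would give the place to hang the route change: a "down" line is the point at which the default route would be rewritten to use only the other interface.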
But both the Nano-HOWTO and the dead gateway detection HOWTO seem to say that the routes as I have them (and as you put them) should already handle this problem. Mine obviously don't: if they did, pulling the cable out of eth0 wouldn't cause such an issue.
So I guess what I'm asking is, does anyone have any suggestions about how to troubleshoot this problem?
Thanks so much everyone,
Seth
Robert Kurjata wrote:
Hi Seth,
I can't offer much more than my working script for load balancing over two links (it was written for three links, and I hope I didn't remove too much). It follows the Nano-HOWTO strictly, and it works. The main part is the PING section at the end, which ensures that the kernel notices dead gateways and recovers. But of course it WILL NOT work without some kernel patching (dead gateway detection, static routes; just use a Jumbo Patch from http://www.ssi.bg/~ja/ ).
A final word: the routers don't even have to respond to pings. They only need to respond to ARP requests. This approach doesn't work properly for PPP or PPPoE connections, as they are usually NOARP.
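Since ARP is what matters here, one way to see what the kernel currently thinks of a gateway is to look at its neighbour table entry. The sketch below is only an illustration: `gw_arp_state` and `gw_resolved` are hypothetical helper names, and it assumes that `ip neigh show` prints the neighbour state (REACHABLE, STALE, FAILED and so on) as the last field of the line, which may differ between iproute2 versions.

```shell
#!/bin/bash
# Sketch only: report the kernel's ARP/neighbour state for a gateway.
# gw_arp_state and gw_resolved are illustrative names.

IP=${IP:-/sbin/ip}

# gw_arp_state GATEWAY IFACE: print the neighbour state, or NONE if no entry.
gw_arp_state() {
    local state
    # Assumes the state is the last field of the first output line.
    state=$("${IP}" neigh show to "$1" dev "$2" | awk 'NR==1 {print $NF}')
    echo "${state:-NONE}"
}

# gw_resolved GATEWAY IFACE: fails if the entry is missing or unresolved.
gw_resolved() {
    case "$(gw_arp_state "$1" "$2")" in
        FAILED|INCOMPLETE|NONE) return 1 ;;
        *) return 0 ;;
    esac
}
```

Run right after one of the pings in the script, `gw_arp_state` with a gateway address and its interface would be expected to show REACHABLE for a router that answers ARP and FAILED for one that does not.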
I also have some shaping done with TC/CBQ on both links.
VERY IMPORTANT: all the testing is USELESS if you have fewer than 40-50 users making lots of requests to different sites, because routes are cached in the kernel. In my system, even with 10-20 users the balancing is usually poor; it improves greatly with the number of users, and the difference between the links drops to 10%.
Hopefully I will get some free time to write a step-by-step howto because it took me some time to understand the thing.
Hope this helps someone.
Greetings to the list,
---------------------------cut here------------------------------------------
#!/bin/bash
# This script is done by : Robert Kurjata Sep, 2003.
# feel free to use it in any useful way

# CONFIGURATION
IP=/sbin/ip
PING=/bin/ping

#--------------- LINK PART -----------------
# EXTIFn - interface name
# EXTIPn - outgoing IP
# EXTMn - netmask length (bits)
# EXTGWn - outgoing gateway
#-------------------------------------------

# LINK 1
EXTIF1=eth2
EXTIP1=
EXTM1=
EXTGW1=

# LINK 2
EXTIF2=eth1
EXTIP2=
EXTM2=
EXTGW2=

# ROUTING PART
# removing old rules and routes

echo "removing old rules"
${IP} rule del prio 50 table main
${IP} rule del prio 201 from ${EXTIP1}/${EXTM1} table 201
${IP} rule del prio 202 from ${EXTIP2}/${EXTM2} table 202
${IP} rule del prio 221 table 221
echo "flushing tables"
${IP} route flush table 201
${IP} route flush table 202
${IP} route flush table 221
echo "removing tables"
${IP} route del table 201
${IP} route del table 202
${IP} route del table 221

# setting new rules
echo "Setting new routing rules"

# main table w/o default gateway here
${IP} rule add prio 50 table main
${IP} route del default table main

# identified routes here
${IP} rule add prio 201 from ${EXTIP1}/${EXTM1} table 201
${IP} rule add prio 202 from ${EXTIP2}/${EXTM2} table 202

${IP} route add default via ${EXTGW1} dev ${EXTIF1} src ${EXTIP1} proto static table 201
${IP} route append prohibit default table 201 metric 1 proto static

${IP} route add default via ${EXTGW2} dev ${EXTIF2} src ${EXTIP2} proto static table 202
${IP} route append prohibit default table 202 metric 1 proto static

# multipath
${IP} rule add prio 221 table 221

${IP} route add default table 221 proto static \
    nexthop via ${EXTGW1} dev ${EXTIF1} weight 2 \
    nexthop via ${EXTGW2} dev ${EXTIF2} weight 3

${IP} route flush cache

# PING section: keep pinging both gateways so the kernel notices dead ones
while : ; do
    ${PING} -c 1 ${EXTGW1}
    ${PING} -c 1 ${EXTGW2}
    sleep 60
done
---------------------------cut here------------------------------------------
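To check whether the script above has the effect described, it can help to look at which nexthops of the multipath route in table 221 the kernel still considers alive. The following is a sketch under assumptions: `live_nexthops` is a made-up helper name, and it relies on `ip route show` printing one `nexthop via ... dev ...` line per gateway with the word `dead` appended for a gateway marked dead, which is how iproute2 displays it on the systems I know of but may vary by version.

```shell
#!/bin/bash
# Sketch only: list the nexthops of a multipath route that are not marked dead.
# live_nexthops is an illustrative name.

IP=${IP:-/sbin/ip}

# live_nexthops TABLE: print "GATEWAY IFACE" for every nexthop not flagged dead.
live_nexthops() {
    "${IP}" route show table "$1" | awk '
        $1 == "nexthop" {
            dead = 0; gw = ""; dev = ""
            for (i = 2; i <= NF; i++) {
                if ($i == "via")  gw  = $(i + 1)
                if ($i == "dev")  dev = $(i + 1)
                if ($i == "dead") dead = 1
            }
            if (!dead) print gw, dev
        }'
}
```

With both links healthy, `live_nexthops 221` should list both gateways; after a cable is pulled and the next ping has failed, only the surviving one should remain.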
_______________________________________________
LARTC mailing list / LARTC@xxxxxxxxxxxxxxx
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/