Hello,

On Fri, 22 Mar 2002, Thomas Vander Stichele wrote:

> Hm, OK. I checked the value on my system and it's set to 300, which IIRC
> would mean 5 minutes (it's in seconds, right ?). The ssh connection set
> up takes only about 2 seconds. Isn't it highly unlikely that the cache
> would have a timeout in that interval, *reliably*, every time ?

With the patched kernel it is 0 seconds: the check happens on the first
packet. With an unpatched kernel the selected multipath route stays in
use much longer, for the whole life of the connection.

> OK, I used both iptraf and tcpdump to check what is going on, and I find
> something very odd. Traffic does indeed go out on two interfaces, as
> traceroutes and output of last on the boxes I ssh to show. But using both
> tcpdump and iptraf, I only see non-tcp data going over eth2. None of the
> tcp-connections show up on eth2. They do on eth1.

I don't remember whether you have a script that does active gateway
monitoring; the patched kernel does only passive detection at route
resolution time, as explained in nano.txt. I'm not sure whether only the
first alive nexthop from the multipath route is used in your case. Check
the status of both gateways with "ip neigh"; they should be in the
REACHABLE state. Maybe I'll soon tune the detection to treat more states
as valid; currently, only gateways in the REACHABLE state are considered
valid.

> The only difference I can see is that ifconfig shows eth2 to be
> "UP BROADCAST NOTRAILERS RUNNING", while eth1 is "UP BROADCAST RUNNING
> MULTICAST". Other than that, the total received traffic is roughly the
> same on both interfaces (which is weird, since I still cannot connect from
> the outside over eth2), while the transmitted data for eth1 is 10 times
> larger than for eth2. I'm not sure if this is because somehow traffic
> going out over eth2 is not "registered" right (as seen by tcpdump and
> iptraf) or because of something else. I'm not that experienced using
> tcpdump, so I can't tell. Also, I don't know enough about the lesser-used
> output from ifconfig since I never needed it before ;)

The old 2.4 kernels don't show the IP addresses of NAT-ed packets
correctly in tcpdump (a copy-on-write problem), but the device should be
correct. I'm not sure, maybe 2.4.17 already has these fixes.

> Is there something that could explain this weird behaviour ? Or are there
> some hints or guides to use tcpdump correctly in debugging this particular
> problem ?

It is enough to see the addresses. With a healthchecking script you
should not see such problems (a rough sketch is appended after my
signature).

> Thanks in advance,
> Thomas

Regards

--
Julian Anastasov <ja@ssi.bg>
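
P.S. In case it helps, here is a minimal sketch of such an active gateway
monitoring script. The gateway addresses, device names and weights below
are only examples, adjust them to your setup:

    #!/bin/sh
    # Example only: two uplinks, gateway 10.0.1.1 on eth1 and 10.0.2.1 on eth2.
    GW1=10.0.1.1; DEV1=eth1
    GW2=10.0.2.1; DEV2=eth2

    NEXTHOPS=""
    for pair in "$GW1 $DEV1" "$GW2 $DEV2"; do
        set -- $pair
        # Ping the gateway over its own device; this also refreshes the
        # neighbour entry, so "ip neigh" shows it as REACHABLE.
        if ping -c 1 -w 2 -I $2 $1 >/dev/null 2>&1; then
            NEXTHOPS="$NEXTHOPS nexthop via $1 dev $2 weight 1"
        fi
    done

    # Rebuild the default route from the gateways that answered and flush
    # the route cache so new lookups see the change. If no gateway
    # answered, keep the current route untouched.
    if [ -n "$NEXTHOPS" ]; then
        ip route replace default scope global $NEXTHOPS
        ip route flush cache
    fi

Something like this can be run from cron or in a loop every few seconds.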