Re: Problems in Dead Gateway Detection / Failover - Multiple ISP Links

Linux Advanced Routing and Traffic Control

gypsy wrote:
Manish Kathuria wrote:
--== snip ==--

 However, if there is a problem in the ISP connectivity at any of the
subsequent hops, there is no dead gateway detection and failover also
does not take place. I have tested this on various linux kernels from
2.4 as well as 2.6 series.

Somehow I have never faced a similar problem before and things have been
working perfectly. In the real-life situation here, the first hop gateway is
rarely going to be down so dead gateway detection and failover is going
to be required whenever there is some connectivity problem at any of the
later hops. So that's where dead gateway detection needs to work.

What could be the reason ? How can this be resolved ? I would appreciate
any pointers or suggestions.

Thanks,

Manish Kathuria


Manish,

Same here (a long time ago.  I no longer have multiple ISPs).

I don't have any answers for you, but here are a few pointers:

Thanks for your mail. I will try out the suggestions given by you.


Use arping in a script, pinging the farthest hop that arping can reach
that is of interest.  Whenever arping returns a bad status, run 'ip
route flush cache'.  Put a nice long sleep in the script and run it all
the time.

Perhaps in that same script, 'ping -c 1 -I' each WAN interface in turn to
some destination that must always be up but reachable only by/on that
interface.  Run 'ip route flush cache' whenever that ping fails.
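
For illustration only, the ping half of that suggestion could be sketched roughly as below. This is not a tested production script: the interface names, the probe addresses (taken from the documentation ranges), the 2 second timeout and the 30 second sleep are all assumptions, and it presumes each probe target really is reachable only through its own interface.

```python
import subprocess

# Hypothetical interface -> probe target map; the addresses are placeholders
# from the documentation ranges and must be replaced with real ones.
PROBES = {"eth1": "192.0.2.1", "eth2": "198.51.100.1"}


def ping_command(iface, target):
    """Build a single-probe ping bound to one interface (iputils syntax)."""
    return ["ping", "-c", "1", "-W", "2", "-I", iface, target]


def run_quiet(cmd):
    """Run a command and return its exit status, discarding its output."""
    return subprocess.call(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)


def check_links(probes, run=run_quiet):
    """Probe each link once; flush the route cache if any probe fails.

    Returns the list of interfaces whose probe failed.
    """
    failed = [iface for iface, target in probes.items()
              if run(ping_command(iface, target)) != 0]
    if failed:
        run(["ip", "route", "flush", "cache"])
    return failed


if __name__ == "__main__":
    import time
    while True:
        check_links(PROBES)
        time.sleep(30)  # space the probes out; do not flood the link
```

The command runner is passed in as a parameter so the logic can be exercised without touching the network; it needs root to flush the cache when run for real.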

The only question is whether, by doing this, the kernel would be able to mark the gateway with the bad status as down. If it does not need any other intervention, then it's really superb.


You are just trying to detect the up or down status of the link, so
don't flood the connection with arping and ping packets.  Using sleep,
space those pings apart to something sensible.

I was thinking of writing a daemon which will ping a remote host through each of the WAN interfaces every 5 seconds. If one of them gives a bad status response continuously for 8-10 times, the default route will be changed to the other ISP's gateway, and if the status changes again, it will be restored to the load balanced multipath state.

Will have to actually try and see which method fits in better here and is more elegant. If your suggestion works, it's perhaps the best way out.
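
A rough sketch of how such a daemon might look is given below, just to make the idea concrete. Everything specific in it is an assumption: the interface names, gateway and probe addresses (documentation ranges), a threshold of 8 consecutive failures, equal weights, and restoring a link after a single successful probe. It also assumes each probe target has its own static host route via the matching gateway, otherwise a link that has been taken out of the default route could never be seen as up again.

```python
import subprocess
import time

# Hypothetical links: interface -> (gateway, probe target). All addresses are
# placeholders from the documentation ranges.
LINKS = {
    "eth1": ("192.0.2.254", "192.0.2.1"),
    "eth2": ("198.51.100.254", "198.51.100.1"),
}
FAIL_THRESHOLD = 8   # consecutive failures before a link is treated as dead
INTERVAL = 5         # seconds between probe rounds


def update_failures(failures, results):
    """Return new consecutive-failure counts given {iface: probe_ok}."""
    return {iface: 0 if results[iface] else failures.get(iface, 0) + 1
            for iface in results}


def alive_links(failures, threshold=FAIL_THRESHOLD):
    """Interfaces whose consecutive-failure count is below the threshold."""
    return sorted(i for i, n in failures.items() if n < threshold)


def route_command(alive, links):
    """Build the 'ip route' command for the current set of alive links."""
    if not alive:
        return None  # nothing usable; leave the routing table alone
    cmd = ["ip", "route", "replace", "default"]
    if len(alive) == 1:
        return cmd + ["via", links[alive[0]][0], "dev", alive[0]]
    for iface in alive:
        cmd += ["nexthop", "via", links[iface][0], "dev", iface, "weight", "1"]
    return cmd


def probe(iface, target):
    """One ping through the given interface; True if a reply came back."""
    return subprocess.call(["ping", "-c", "1", "-W", "2", "-I", iface, target],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0


if __name__ == "__main__":
    failures, current = {}, None
    while True:
        results = {i: probe(i, t) for i, (_, t) in LINKS.items()}
        failures = update_failures(failures, results)
        wanted = route_command(alive_links(failures), LINKS)
        if wanted and wanted != current:
            subprocess.call(wanted)  # needs root
            current = wanted
        time.sleep(INTERVAL)
```

The counting and route-building functions are kept free of side effects so the failover decisions can be checked on their own before the loop is allowed to change any routes.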


Although Julian has never confirmed (or denied) this, it was my
experience that only the **__FIRST__** nexthop affected the up or down
status of the connection.  If that succeeded, nothing would flag the
connection as dead.  If you know C, perhaps you can examine Julian's
kernel patch to see if there is any useful information there.  In my
opinion, Julian should document exactly how DGD works.  Perhaps he has
and I just can't find it on his web site, but (when I cared), I was not
able to find anything useful there.

There are excellent documents at http://www.ssi.bg/~ja/dgd-usage.txt and http://www.ssi.bg/~ja/nano.txt which have explained it very well. Quoting from the dgd-usage.txt document here ...


---Begin Quote---

* the alternative routes check the neighbour state not only for gateways
but  for hosts, i.e. for any kind of neighbours. Note that in some cases
the  neighbour  can remain  in reachable  state  while its  nexthops are
failed.   For example, it is even possible the gateway to be a proxy ARP
server  and the gateway IP to remain  always in reachable state. In such
case we can not notice the real state of the gateway's IP.

* the alternative routes can be a list from unipath or multipath routes,
using  NOARP  and  ARP devices.  As  result,  the first  alive  or first
suspected  (but not dead)  route is selected by  inspecting the state of
the gateways in each path or the neighbours through the used device from
the path.

* as  result we take care of the state of each path in a multipath route
and  we  try to  use  only the  alive  paths considering  their relative
weights

---End Quote---

In the current situation I am dealing with, the first-hop gateway is always reachable. It is only the subsequent hops which can go down. And when that happens, the dead gateway detection doesn't work, and the outgoing traffic keeps going out through the dead ISP's WAN interface. But what confuses me is that DGD does work for one of the ISPs, which is also identically connected.

Could running routed / gated play a role here in resolving this problem ?


Have you tried to engage Julian in a conversation to resolve this?  He
posts here occasionally but I do not know if he answers questions about
DGD off this list.

I have not done it so far.

--
gypsy


Thanks once again for your suggestions.

--
Manish Kathuria
_______________________________________________
LARTC mailing list
LARTC@xxxxxxxxxxxxxxx
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc
