Re: Route fallback issue

Linux Advanced Routing and Traffic Control


On 06/21/2018 01:57 PM, Julian Anastasov wrote:
Hello,

Hi.

I think so

Okay.

I'll do some more digging.

You can search on net. I have some old docs on these issues, they should be actual:

http://ja.ssi.bg/dgd-usage.txt

"DGD" or "Dead Gateway Detection" sounds very familiar. I referenced it in an earlier reply.

I distinctly remember DGD not behaving satisfactorily years ago, where "unsatisfactorily" meant something like 90 seconds (or more) to recover. That actually matches what I was getting without the ignore_routes_with_linkdown=1 setting that David A. mentioned.

With ignore_routes_with_linkdown=1 things behaved much better.
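For anyone following along, ignore_routes_with_linkdown is a per-interface IPv4 sysctl. Below is a minimal sketch of switching it on, not the exact commands from the test. The helper name set_ignore_linkdown and the DRY_RUN switch are made up for illustration, and the interface names in the example are simply the ones from the NS2 side of this setup.

```shell
#!/bin/sh
# Hypothetical helper: enable ignore_routes_with_linkdown on each
# interface named on the command line.
# With DRY_RUN=1 the sysctl commands are only printed, not executed.
set_ignore_linkdown() {
    for dev in "$@"; do
        key="net.ipv4.conf.${dev}.ignore_routes_with_linkdown"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "sysctl -w ${key}=1"
        else
            sysctl -w "${key}=1"
        fi
    done
}

# Example (as root, inside the namespace):
#   set_ignore_linkdown ns1a ns1b
```

Running it with DRY_RUN=1 first shows which keys would be touched before anything is changed.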

Not true. net/ipv4/fib_semantics.c:fib_select_path() calls fib_select_default() only when prefixlen = 0 (default route).

Okay.... My testing last night disagrees with you. Specifically, I was able to add alternate routes to the same prefix, 192.0.2.128/26. There was no default gateway configured in any of the NetNSs, so everything was using either routes for locally attached networks or the two added via "ip route append".

What am I misinterpreting? Or where are we otherwise talking past each other?

Otherwise, only the first route will be considered.

"only the first route" almost sounds like something akin to Equal Cost Multi Path.

I was not expecting "alternative routes" to use more than one route at a time, equally or otherwise. I wanted the kernel to fall back to an alternate route / gateway / path in the event that the one in use became unusable / unreachable.

So what should "Alternative Routes" do? How does this compare / contrast to E.C.M.P. or D.G.D.?

fib_select_default() is the function that decides which nexthop is reachable and whether to contact it. It uses the ARP state via fib_detect_death(). That is all code that is behind this feature called "alternative routes": the kernel selects one based on nexthop's ARP state.

Please confirm that you aren't entering / referring to E.C.M.P. territory when you say "nexthop". I think that you are not, but I want to ask and be sure, particularly seeing as how things are very closely related.

It sounds like you're referring to literally the router that is the next hop in the path. I.e. the device on the other end of the wire.

I'll have to find, read, and try to grok the code to have a better idea. That being said, it looks like (based on the name) fib_select_default() deals with the default route. The testing I did last night, and its positive results, indicate that the kernel did what I wanted it to do. (See above about D.G.D. vs E.C.M.P.)

So, it seems as if something about alternative routes worked using non-default routes. I have no way of knowing if it was the code that we're talking about, or something else, that produced the results. The test as I ran it (specific, non-default prefixes, with routes appended and no other routes present) worked the way that I would have expected a feature that uses alternative routes (or historically D.G.D.) to work.

The following ping works just fine as I bounce interfaces on NS1.

ns2# ping -I 192.0.2.254 192.0.2.129

I can confirm that traffic is moving back and forth between the vEth links between the NetNSs. Granted, the traffic sticks to one vEth interface until it goes away.

I can shut down ns2a on NS1 so that ns1a sees loss of link but stays up on NS2, and traffic moves to vEth-B.

I can then open up ns2a on NS1 so that ns1a sees link on NS2, and re-append the route on NS1.

I can then shut down ns2b on NS1 so that ns1b sees loss of link but stays up on NS2, and traffic moves to vEth-A.

I can then open up ns2b on NS1 so that ns1b sees link on NS2, and re-append the route on NS1.

NS2 behaves exactly as I would hope. Traffic will move from the down interface to the remaining up interface. Back and forth, no problem.

I don't know where the disconnect is, but I feel like there is one.

Routes with different metric are considered only when the routes with lower metric are removed.

I agree with the statement. What I question is where metric came into play here. All of the routes had the same (default) metric. None of the routes I tested had different metrics.

ns1# ip route show
192.0.2.0/26 dev ns2a proto kernel scope link src 192.0.2.1
192.0.2.64/26 dev ns2b proto kernel scope link src 192.0.2.65
192.0.2.128/26 dev dummy0 proto kernel scope link src 192.0.2.129
192.0.2.192/26 via 192.0.2.62 dev ns2a
192.0.2.192/26 via 192.0.2.126 dev ns2b

ns2# ip route show
192.0.2.0/26 dev ns1a proto kernel scope link src 192.0.2.62
192.0.2.64/26 dev ns1b proto kernel scope link src 192.0.2.126
192.0.2.128/26 via 192.0.2.65 dev ns1b
192.0.2.128/26 via 192.0.2.1 dev ns1a
192.0.2.192/26 dev dummy0 proto kernel scope link src 192.0.2.254

IIRC, this flag invalidates nexthops depending on the link state. If your link is always UP it does not help much.

That's what I gathered. So things like DSL & cable modems or other L2 bridging devices might not drop the link when their circuit drops.

This is also why I asked the follow up questions to David's email.

I want to do some testing to see if fib_multipath_use_neigh alters this behavior at all. I'm hoping that it will invalidate an alternate route if the MAC is not resolvable even if the physical link stays up.

Sure, the ARP cache may have a 30 ~ 120 second timeout before triggering this behavior. But having that timeout and starting to use an alternative route is considerably better than not using an alternative route.
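To watch what the neighbor (ARP) table thinks of a gateway while testing this, something like the following could be used. It is only a sketch: the helper name neigh_state is made up, and it assumes the usual "ip neigh show" layout where the state (REACHABLE, STALE, FAILED, ...) is the last field on the line.

```shell
#!/bin/sh
# Hypothetical helper: read `ip neigh show` output on stdin and print the
# neighbor state (assumed to be the last field) for one gateway address.
# Prints nothing if the address has no neighbor entry.
neigh_state() {
    awk -v gw="$1" '$1 == gw { print $NF }'
}

# Example:
#   ip neigh show | neigh_state 192.0.2.62
```

Polling this once a second while pulling a link should show how long the entry takes to leave REACHABLE, which is the delay being discussed here.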

If you rely on user space tool, you can check the state of the desired hops: device link state, your gateway to ISP, one or more gateways in the ISP network which you consider permanent part of the path via this ISP.

This is what I have thought about doing previously.
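A rough sketch of what one step of such a user space check might look like, probing only the directly attached gateway (checking hops further into the ISP would follow the same pattern). Everything named here is an assumption for illustration: the function names run and check_gateway, the PROBE and DRY_RUN variables, and the choice of a single ping as the health test.

```shell
#!/bin/sh
# Hypothetical watchdog step: probe one gateway and add or remove the
# matching alternative route. PROBE and DRY_RUN exist only so the logic
# can be exercised without root or a live network.
PROBE=${PROBE:-"ping -c 1 -W 1"}

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# check_gateway PREFIX GATEWAY DEVICE
check_gateway() {
    prefix="$1"
    gw="$2"
    dev="$3"
    if $PROBE -I "$dev" "$gw" >/dev/null 2>&1; then
        # Gateway answers: try to (re)append its route.
        # Errors, such as the route already being present, are ignored.
        run ip route append "$prefix" via "$gw" dev "$dev" 2>/dev/null || true
    else
        # Gateway is silent: withdraw its route so the other one is used.
        # Errors, such as the route already being gone, are ignored.
        run ip route del "$prefix" via "$gw" dev "$dev" 2>/dev/null || true
    fi
}

# Example (as root on NS1), run from a loop or a timer:
#   check_gateway 192.0.2.192/26 192.0.2.62 ns2a
#   check_gateway 192.0.2.192/26 192.0.2.126 ns2b
```

A single lost ping is a twitchy trigger; a real script would probably want several consecutive failures before deleting anything.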

First route can be created with 'add' but all next alternative routes can be added only with "append". If you successfully add them with "add" it means they are not alternatives to the first one, they are not considered at all.

ACK
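Spelled out as commands, the add-then-append rule looks roughly like this. It is a sketch only: the helper name install_alternatives, the "gateway,device" argument format, and the DRY_RUN switch are invented for illustration, and the example values are the NS1 routes shown above.

```shell
#!/bin/sh
# Hypothetical helper: install one prefix with several alternative
# gateways, each given as "gateway,device".
# The first route uses "add"; every later one uses "append".
# With DRY_RUN=1 the ip commands are only printed, not executed.
install_alternatives() {
    prefix="$1"
    shift
    verb=add
    for hop in "$@"; do
        gw=${hop%,*}     # text before the comma
        dev=${hop#*,}    # text after the comma
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ip route $verb $prefix via $gw dev $dev"
        else
            ip route "$verb" "$prefix" via "$gw" dev "$dev"
        fi
        verb=append
    done
}

# Example (as root on NS1):
#   install_alternatives 192.0.2.192/26 192.0.2.62,ns2a 192.0.2.126,ns2b
```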



--
Grant. . . .
unix || die


