RE: iproute2 does not select 1st default route in table?

Linux Advanced Routing and Traffic Control

I'm getting confused now. I did some online reading, and I've found the following:

If several routes match the packet, the following pruning rules are used to select the best one:

   1. The longest matching prefix is selected; all shorter ones are dropped.
   2. If the TOS of some route with the longest prefix is equal to the TOS of the packet, routes with different TOS are dropped.
   3. If no exact TOS match is found and routes with TOS=0 exist, the rest of the routes are pruned. Otherwise the route lookup fails.
   4. If several routes remain after steps 1-3 have been tried, then routes with the best preference value are selected.
   5. If several routes still exist, then the first of them is selected.

http://www.softpanorama.org/Net/Internet_layer/Routing/policy_routing.shtml
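
To make those five rules concrete, here is a rough sketch of them as a shell function. It is only a reading of the list quoted above, not the kernel's actual lookup code. The function name select_route and the one-route-per-line input format (prefix length, TOS, metric, device) are made up for illustration.

```shell
# Sketch of the five pruning rules quoted above; NOT the kernel's real code.
# stdin: routes that already match the destination, in table order, one per
#        line as: <prefixlen> <tos> <metric> <dev>   (hypothetical format)
# $1:    the packet's TOS value
select_route() {
    awk -v pkt_tos="$1" '
        {
            plen[NR] = $1 + 0; tos[NR] = $2 + 0; metric[NR] = $3 + 0; dev[NR] = $4
            # Rule 1: remember the longest matching prefix length.
            if (NR == 1 || plen[NR] > best_len) best_len = plen[NR]
        }
        END {
            pkt = pkt_tos + 0
            want_tos = -1
            # Rule 2: an exact TOS match among the longest-prefix routes wins.
            for (i = 1; i <= NR; i++)
                if (plen[i] == best_len && tos[i] == pkt) want_tos = pkt
            # Rule 3: otherwise fall back to TOS=0 routes, or fail the lookup.
            if (want_tos < 0)
                for (i = 1; i <= NR; i++)
                    if (plen[i] == best_len && tos[i] == 0) want_tos = 0
            if (want_tos < 0) exit 1
            # Rules 4 and 5: lowest metric wins; on a tie the first listed stays.
            pick = 0
            for (i = 1; i <= NR; i++) {
                if (plen[i] != best_len || tos[i] != want_tos) continue
                if (pick == 0 || metric[i] < metric[pick]) pick = i
            }
            print dev[pick]
        }'
}

# The two equal default routes from this thread, in listed order:
# printf '0 0 0 eth1\n0 0 0 eth0\n' | select_route 0    # prints: eth1
```

If the rules really are applied as quoted, two default routes with the same prefix length, TOS and metric should resolve to whichever is listed first, which is eth1 in the table under discussion.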

I'm having a hard time finding anything more official that states this, and without looking at the kernel source I can't be 100% certain, but it seems that when multiple routes match, the first one listed should be selected.

Here's something else to try: run "ip route show cache" and paste the output here. The cache is queried before the routing tables are consulted, so I'm curious to see what it shows.
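
Since the cache listing can be long, a small filter may help narrow it to the destination in question. This is only a sketch: the helper name cache_entries_for is hypothetical, and it assumes each entry starts at column 0 with the destination as its first field, followed by indented continuation lines (the layout "ip route get" shows elsewhere in this thread).

```shell
# Print the entries for one destination from `ip route show cache` output
# on stdin: the line whose first field is the destination, plus the indented
# continuation lines that follow it. (Assumed layout; see note above.)
cache_entries_for() {
    awk -v dst="$1" '
        /^[^ \t]/ { keep = ($1 == dst) }   # a new entry starts at column 0
        keep'
}

# Usage on the box itself (needs the real `ip` tool, so left as comments):
#   ip route show cache | cache_entries_for 8.8.8.8
#   ip route flush cache        # then repeat the lookup with a clean cache
#   ip route get 8.8.8.8
```

If the answer changes after the flush, a stale cache entry was involved; if it does not, the routing tables themselves are producing it.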

Joel Gerber
Network Specialist
Network Operations
Eastlink
E: Joel.Gerber@xxxxxxxxxxxxxxxx T: 519.786.1241
-----Original Message-----
From: Ole Craig [mailto:olc@xxxxxxxxxxxxxxxxxxx]
Sent: May-09-14 4:14 PM
To: Joel Gerber
Cc: lartc@xxxxxxxxxxxxxxx
Subject: Re: iproute2 does not select 1st default route in table?

Joel, et al -
        TL;DR: behavior at issue differs between 2 boxes which are 1 IP
        address apart with identical hw/sw loads; same kernel; interface
        numbering unchanged after kernel probing. What gives? What am I
        missing?

(Apologies for the delayed response, life around here has been frantic
and this has been sitting around half-composed in my mail client.)

Details:
    Ok, I have two dual-interface boxes sitting next to each other on
this customer's network. The first box is the one we've been discussing,
I'll call him "Deviant" from now on. The second box (hereinafter "Norm")
is acting just like the rest of the appliances in our fleet, i.e.
routing out eth1 by default. Hoping that comparing these two (which are
identical WRT both hardware and software config*) can lead to a more
informed/precise conclusion as to what's causing them to behave
differently (and maybe some ideas for re-norming Deviant; arbitrary
routing is Not My Kink.:)

kernel insertion order on Deviant:
        [root@dvnt ~]# uname -a
        Linux dvnt.example.com 2.6.18-348.4.1.el5 #1 SMP Tue Apr 16 15:40:06 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
        [root@dvnt ~]# dmesg | egrep 'eth.: \('
        igb 0000:01:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:25:90:30:28:6c
        igb 0000:01:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:25:90:30:28:6d
device<-->address mapping on Deviant:
        [root@dvnt ~]# ip link show
        1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
            link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
            link/ether 00:25:90:30:28:6c brd ff:ff:ff:ff:ff:ff
        3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
            link/ether 00:25:90:30:28:6d brd ff:ff:ff:ff:ff:ff
Link status on Deviant:
        [root@dvnt ~]# for i in 0 1; do echo -n "eth$i: "; ethtool eth$i | grep Link; done
        eth0:   Link detected: yes
        eth1:   Link detected: yes


kernel insertion order on Norm:
        [root@norm ~]# uname -a
        Linux norm.example.com 2.6.18-348.4.1.el5 #1 SMP Tue Apr 16 15:40:06 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
        [root@norm ~]# dmesg | egrep 'eth.: \('
        igb 0000:01:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:25:90:30:2c:f8
        igb 0000:01:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:25:90:30:2c:f9
device<-->address map on Norm:
        [root@norm ~]# ip link show
        1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
            link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
            link/ether 00:25:90:30:2c:f8 brd ff:ff:ff:ff:ff:ff
        3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
            link/ether 00:25:90:30:2c:f9 brd ff:ff:ff:ff:ff:ff
Link status on Norm:
        [root@norm ~]# for i in 0 1; do echo -n "eth$i: "; ethtool eth$i | grep Link; done
        eth0:   Link detected: yes
        eth1:   Link detected: yes


So, we can see that in both cases the device ordering is unchanged from
that which was discovered at boot by the kernel, and that eth0 was
probed first.
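
For anyone wanting to repeat that comparison across more boxes, the index/name/MAC triples can be pulled out of "ip link show" mechanically and set against the dmesg lines. A minimal sketch, assuming the two-line-per-interface layout pasted above; the helper name link_macs is made up:

```shell
# Print "<ifindex> <name> <mac>" per interface from `ip link show` output
# on stdin. Leading whitespace is tolerated so indented pastes work too.
link_macs() {
    awk '
        /^[ \t]*[0-9]+: / {            # header line, e.g. "2: eth0: ..."
            idx = $1; name = $2
            sub(/:$/, "", idx); sub(/:$/, "", name)
        }
        $1 ~ /^link\// { print idx, name, $2 }   # "link/ether <mac> brd ..."'
}

# Usage on the box itself (left as a comment since it needs real interfaces):
#   ip link show | link_macs
```

The second column should then line up with the ethN names in the dmesg output, and the MACs with the addresses the driver reported at probe time.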

And here again for posterity is the active routing table and results of
"ip route get" on each machine:

Deviant:
        [root@dvnt ~]# ip route show table main
        10.1.1.1 dev tun0  proto kernel  scope link  src 10.1.1.7
        10.250.32.1 via 10.1.1.1 dev tun0
        10.250.10.0/24 via 10.1.1.1 dev tun0
        192.168.72.0/24 dev eth0  proto kernel  scope link  src 192.168.72.124
        192.168.72.0/24 dev eth1  proto kernel  scope link  src 192.168.72.100
        default via 192.168.72.1 dev eth1
        default via 192.168.72.1 dev eth0
        [root@dvnt ~]# ip route get 8.8.8.8 | sanize
        8.8.8.8 via 192.168.72.1 dev eth0  src 192.168.72.124
            cache  mtu 1500 advmss 1460 hoplimit 64

Norm:
        [root@norm ~]# ip route show table main
        10.1.1.1 dev tun0  proto kernel  scope link  src 10.1.1.45
        10.250.32.1 via 10.1.1.1 dev tun0
        10.250.10.0/24 via 10.1.1.1 dev tun0
        192.168.72.0/24 dev eth0  proto kernel  scope link  src 192.168.72.123
        192.168.72.0/24 dev eth1  proto kernel  scope link  src 192.168.72.99
        default via 192.168.72.1 dev eth1
        default via 192.168.72.1 dev eth0
        [root@norm ~]# ip route get 8.8.8.8
        8.8.8.8 via 192.168.72.1 dev eth1  src 192.168.72.99
            cache  mtu 1500 advmss 1460 hoplimit 64
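
The difference between the two lookups above comes down to the token after "dev". A small sketch for pulling that out, so the same check can be scripted across boxes; the helper name route_dev is hypothetical, and the ssh loop in the comment assumes root ssh access to both hosts:

```shell
# Print the egress device from `ip route get` output on stdin, i.e. the
# word following the first "dev" token.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Possible use (commented out; needs the real hosts):
#   for h in dvnt norm; do
#       printf '%s: ' "$h"; ssh "root@$h" ip route get 8.8.8.8 | route_dev
#   done
```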


Any thoughts as to what else might be going on?

Ole

*"identical[...]hw/sw config" -- granted there are differences in
application configuration (layers 5-7) but nothing that should come
close to affecting routing.
--
Ole Craig <olc@xxxxxxxxxxxxxxxxxxx>

On Wed, 2014-04-16 at 16:51 -0300, Joel Gerber wrote:
> Are you running different versions of the Linux kernel?
>
> I'm quite certain that the order in which the routes are added should have no bearing on this matter. There is also another possibility: you might be renaming the interfaces so that their names no longer match the order in which they were loaded into the kernel. Whichever interface was loaded first will have the smaller internal number, and that is the one that would get selected first.
>
> Joel Gerber
> Network Specialist
> Network Operations
> Eastlink
> E: Joel.Gerber@xxxxxxxxxxxxxxxx T: 519.786.1241
>
> -----Original Message-----
> From: Ole Craig [mailto:olc@xxxxxxxxxxxxxxxxxxx]
> Sent: April-16-14 11:08 AM
> To: Joel Gerber
> Cc: lartc@xxxxxxxxxxxxxxx
> Subject: RE: iproute2 does not select 1st default route in table?
>
> Hello Joel,
> There must be something else in play; we manage hundreds of other similar appliances and this is the only one that's exhibiting this behavior. ("similar" == "same hardware spec, same software spec, only the IPs are different" -- and yes, most of them have eth0 and eth1 sharing a subnet and gw, and eth1 is always the default route for packets not built with a specific source IP.)
>
> Thanks,
>     Ole
>
> Joel Gerber <Joel.Gerber@xxxxxxxxxxxxxxxx> wrote:
>
> >Hello Ole;
> >
> >If you have multiple routes set with the same metric to the same destination, by default, the Linux kernel will choose the lowest numbered interface to use. This would be why eth0 is being hit every time. It has nothing to do with your ip rule definitions. If you wanted to prefer eth1, add a larger metric flag to the eth0 route, which will cause eth1 to be used instead. Then, only in the event that eth1 is link-down, will eth0 be used.
> >
> >You could also set up ECMP (Equal Cost Multi Path), which would allow you to load-balance traffic across both interfaces. To do this, add the equalize option to your two default routes when adding them. You might need to verify that you have multipath support baked into your kernel first.
> >
> >Joel Gerber
> >Network Specialist
> >Network Operations
> >Eastlink
> >E: Joel.Gerber@xxxxxxxxxxxxxxxx T: 519.786.1241
> >
> >-----Original Message-----
> >From: lartc-owner@xxxxxxxxxxxxxxx [mailto:lartc-owner@xxxxxxxxxxxxxxx] On Behalf Of Ole Craig
> >Sent: April-16-14 7:34 AM
> >To: lartc@xxxxxxxxxxxxxxx
> >Subject: iproute2 does not select 1st default route in table?
> >
> >I am having the damnedest time with a dual-IP CentOS5-based appliance which seems to defy its iproute2 configuration, and I'm here hoping someone smarter than me (admittedly not a high bar) might be so kind as to tell me what I'm missing:
> >        # ip route show table main
> >        10.1.1.1 dev tun0  proto kernel  scope link  src 10.1.1.7
> >        10.250.32.1 via 10.1.1.1 dev tun0
> >        10.250.10.0/24 via 10.1.1.1 dev tun0
> >        192.168.25.0/24 dev eth0  proto kernel  scope link  src 192.168.25.124
> >        192.168.25.0/24 dev eth1  proto kernel  scope link  src 192.168.25.100
> >        default via 192.168.25.1 dev eth1
> >        default via 192.168.25.1 dev eth0
> >        # ip route get 8.8.8.8
> >        8.8.8.8 via 192.168.25.1 dev eth0  src 192.168.25.124
> >            cache  mtu 1500 advmss 1460 hoplimit 64
> >
> >I have been playing with this box for several days, and no matter what I do (including reboots) it wants to route almost everything out eth0 instead of eth1, despite the ordering of default routes shown above which should see eth1 taking precedence. No iptables PREROUTING/nat/mangle/raw stuff, this is all straight iproute2.
> >
> >I determined that the 'main' table was the one at issue by inserting and removing an overriding rule at various priorities to see when 'ip route get' would change behavior:
> >        # ip rule show | tail
> >        32758:       from all to 192.168.72.0/24 lookup defroutes
> >        32759:       from 192.168.72.0/24 lookup defroutes
> >        32760:       from 192.168.72.0/24 lookup mgtroutes
> >        32766:       from all lookup main
> >        32767:       from all lookup default
> >        # ip route add table custom to 8.8.8.0/24 via 192.168.72.1 dev eth1 src 192.168.72.100
> >        # for i in $(seq 32767 -1 32764); do ip rule add prio $i to 8.8.8.0/24 lookup custom; ip route flush cache; sleep 2; echo -en "$i:\t"; ip route get 8.8.8.8| head -1; ip rule del prio $i to 8.8.8.0/24 lookup custom; ip route flush cache; sleep 2; done
> >        32767:       8.8.8.8 via 192.168.72.1 dev eth0  src 192.168.72.124
> >        32766:       8.8.8.8 via 192.168.72.1 dev eth0  src 192.168.72.124
> >        32765:       8.8.8.8 via 192.168.72.1 dev eth1  src 192.168.72.100
> >        32764:       8.8.8.8 via 192.168.72.1 dev eth1  src 192.168.72.100
> >
> >Both interfaces are up, and are *able* to reach 8.8.8.8 via the upstream
> >gw:
> >        # ping -c 1 -I eth0 8.8.8.8
> >        PING 8.8.8.8 (8.8.8.8) from 192.168.72.124 eth0: 56(84) bytes of data.
> >        64 bytes from 8.8.8.8: icmp_seq=1 ttl=47 time=14.9 ms
> >
> >        --- 8.8.8.8 ping statistics ---
> >        1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >        rtt min/avg/max/mdev = 14.990/14.990/14.990/0.000 ms
> >        # ping -c 1 -I eth1 8.8.8.8
> >        PING 8.8.8.8 (8.8.8.8) from 192.168.72.100 eth1: 56(84) bytes of data.
> >        64 bytes from 8.8.8.8: icmp_seq=1 ttl=47 time=14.9 ms
> >
> >        --- 8.8.8.8 ping statistics ---
> >        1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >        rtt min/avg/max/mdev = 14.999/14.999/14.999/0.000 ms
> >
> >Can anyone help me figure out why this box (alone out of many, many appliances with similar configurations at various customer sites) is determined to reach out through eth0?
> >
> >The full RPDB ruleset:
> >        # ip rule show
> >        0:   from all lookup 255
> >        10:  from all lookup bootstrap
> >        32736:       from all to 128.119.40.1 lookup mgtroutes
> >        32737:       from all to 192.168.126.228 lookup mgtroutes
> >        32738:       from all to 192.168.76.232 lookup mgtroutes
> >        32739:       from all to 192.168.90.112 lookup mgtroutes
> >        32740:       from all to 192.168.61.112 lookup mgtroutes
> >        32741:       from all to 192.168.76.232 lookup mgtroutes
> >        32742:       from all to 192.168.61.112 lookup mgtroutes
> >        32743:       from all to 192.168.34.35 lookup mgtroutes
> >        32744:       from all to 192.168.61.112 lookup defroutes
> >        32745:       from all to 192.168.134.47 lookup mgtroutes
> >        32746:       from all to 192.168.127.68 lookup mgtroutes
> >        32747:       from all to 192.168.66.6 lookup mgtroutes
> >        32748:       from all to 192.168.126.228 lookup mgtroutes
> >        32749:       from all to 192.168.127.68 lookup mgtroutes
> >        32750:       from all to 192.168.134.41 lookup mgtroutes
> >        32751:       from all to 192.168.134.41 lookup mgtroutes
> >        32752:       from all to 192.168.76.232 lookup mgtroutes
> >        32753:       from all to 107.23.15.175 lookup mgtroutes
> >        32754:       from all to 216.87.69.94 lookup mgtroutes
> >        32755:       from 192.168.72.124 lookup mgtroutes
> >        32756:       from 192.168.72.100 lookup defroutes
> >        32757:       from 192.168.72.0/24 to 192.168.72.0/24 lookup mgtroutes
> >        32758:       from all to 192.168.72.0/24 lookup defroutes
> >        32759:       from 192.168.72.0/24 lookup defroutes
> >        32760:       from 192.168.72.0/24 lookup mgtroutes
> >        32766:       from all lookup main
> >        32767:       from all lookup default
> >
> >
> >
> >     Thank you for any clue you can spare,
> >             Ole
> >--
> >Ole Craig <olc@xxxxxxxxxxxxxxxxxxx>
> >
> >--
> >To unsubscribe from this list: send the line "unsubscribe lartc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at  http://vger.kernel.org/majordomo-info.html