[LARTC] need help with policy routing.

Linux Advanced Routing and Traffic Control

so, i have a pretty complex (for me, that is) setup on this one machine
that acts as a nameserver and mail server and some other stuff and answers
to a handful of ips.  it's also a "real server" behind an lvs director.
the machine in question is running a modified redhat 6.2 with a 2.2.17ext3
kernel (stock 2.2.17 + ext3 patches + nfs patches).

let me try to describe this as best i can.

our external network is 64.211.224.160/28.  161 is the router/gateway to
the rest of the world.  162 is an auth nameserver.  163 is an auth
nameserver.  164 is the ip used for outgoing connections from behind
masquerading.  165 is for web traffic.  166 is for incoming mail.  and i
just put 169 in as a standalone machine.
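
to make the address plan above easier to eyeball, here is a minimal python sketch (not something that runs on any of these boxes) that enumerates the /28 and tags each host with the role described above.  the `roles` dict and the `describe` helper are names made up for this illustration; 168 is not in the description above but shows up as an ssh vip in the lvs table further down, so it is tagged that way.

```python
import ipaddress

# the external network as described in the post
net = ipaddress.ip_network("64.211.224.160/28")

# roles keyed by last octet, copied from the description in the post
# (168 comes from the lvs table, not from the prose description)
roles = {
    161: "router/gateway",
    162: "auth nameserver",
    163: "auth nameserver",
    164: "masquerading source",
    165: "web",
    166: "incoming mail",
    168: "ssh vip (from the lvs table)",
    169: "standalone machine",
}

def describe(network, roles):
    """Return (address, role) pairs for every usable host in the network."""
    out = []
    for host in network.hosts():
        last_octet = int(host) & 0xFF
        out.append((str(host), roles.get(last_octet, "not mentioned in the post")))
    return out

if __name__ == "__main__":
    print("network:  ", net.network_address)
    print("broadcast:", net.broadcast_address)
    for addr, role in describe(net, roles):
        print(addr, "-", role)
```

the broadcast address it computes, 64.211.224.175, is the same one used on the lo aliases in rc.local below.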

the 164 masquerading server allows the nameserver/mailserver to send
requests to the outside world:
MASQ       all  ------  192.168.1.21         0.0.0.0/0             n/a

the lvs director basically handles all incoming traffic and forwards it to
the right place:
IP Virtual Server version 1.0.0-beta1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
TCP  64.211.224.165:443 lc persistent 360
  -> 192.168.1.101:443           Route   1      0          0         
  -> 192.168.1.102:443           Route   1      0          0         
UDP  64.211.224.162:53 lc
  -> 192.168.1.11:53             Route   1      0          349       
UDP  64.211.224.163:53 lc
  -> 192.168.1.12:53             Route   1      0          183       
TCP  64.211.224.163:53 lc
  -> 192.168.1.12:53             Route   1      0          0         
TCP  64.211.224.162:53 lc
  -> 192.168.1.11:53             Route   1      0          0         
TCP  64.211.224.166:22 lc
  -> 192.168.1.21:22             Route   1      0          0         
TCP  64.211.224.168:22 lc
  -> 192.168.1.21:22             Route   1      16         0         
TCP  64.211.224.166:25 lc
  -> 192.168.1.21:25             Route   1      0          0         
TCP  64.211.224.165:80 lc
  -> 192.168.1.101:80            Route   1      0          3         
  -> 192.168.1.102:80            Route   1      0          1         

then there's the "phl" machine which handles dns and mail:
[root@xxx /root]# /sbin/ifconfig 
eth0      Link encap:Ethernet  HWaddr 00:D0:B7:65:EC:48  
          inet addr:192.168.1.21  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:24535885 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24655159 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          Interrupt:11 Base address:0x2800 

eth0:0    Link encap:Ethernet  HWaddr 00:D0:B7:65:EC:48  
          inet addr:192.168.1.11  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:11 Base address:0x2800 

eth0:1    Link encap:Ethernet  HWaddr 00:D0:B7:65:EC:48  
          inet addr:192.168.1.12  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:11 Base address:0x2800 

eth0:2    Link encap:Ethernet  HWaddr 00:D0:B7:65:EC:48  
          inet addr:192.168.1.13  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:11 Base address:0x2800 

eth0:3    Link encap:Ethernet  HWaddr 00:D0:B7:65:EC:48  
          inet addr:192.168.1.14  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:11 Base address:0x2800 

eth0:4    Link encap:Ethernet  HWaddr 00:D0:B7:65:EC:48  
          inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:11 Base address:0x2800 

eth1      Link encap:Ethernet  HWaddr 00:C0:95:E2:85:40  
          inet addr:192.168.2.21  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:20102464 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19892838 errors:6 dropped:0 overruns:3 carrier:6
          collisions:0 txqueuelen:100 
          Interrupt:11 Base address:0x3000 

eth1:0    Link encap:Ethernet  HWaddr 00:C0:95:E2:85:40  
          inet addr:192.168.2.13  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:11 Base address:0x3000 

eth1:1    Link encap:Ethernet  HWaddr 00:C0:95:E2:85:40  
          inet addr:192.168.2.14  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:11 Base address:0x3000 

eth1:2    Link encap:Ethernet  HWaddr 00:C0:95:E2:85:40  
          inet addr:192.168.2.10  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:11 Base address:0x3000 

eth2      Link encap:Ethernet  HWaddr 00:C0:95:E2:85:41  
          inet addr:192.168.3.21  Bcast:192.168.3.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:74336 errors:0 dropped:0 overruns:0 frame:0
          TX packets:111705 errors:16 dropped:0 overruns:2 carrier:28
          collisions:0 txqueuelen:100 
          Interrupt:10 Base address:0x3080 

eth2:0    Link encap:Ethernet  HWaddr 00:C0:95:E2:85:41  
          inet addr:192.168.3.13  Bcast:192.168.3.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:10 Base address:0x3080 

eth2:1    Link encap:Ethernet  HWaddr 00:C0:95:E2:85:41  
          inet addr:192.168.3.14  Bcast:192.168.3.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:10 Base address:0x3080 

eth2:2    Link encap:Ethernet  HWaddr 00:C0:95:E2:85:41  
          inet addr:192.168.3.10  Bcast:192.168.3.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:10 Base address:0x3080 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:191349 errors:0 dropped:0 overruns:0 frame:0
          TX packets:191349 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 

lo:0      Link encap:Local Loopback  
          inet addr:64.211.224.162  Mask:255.255.255.240
          UP LOOPBACK RUNNING  MTU:3924  Metric:1

lo:1      Link encap:Local Loopback  
          inet addr:64.211.224.163  Mask:255.255.255.240
          UP LOOPBACK RUNNING  MTU:3924  Metric:1

lo:2      Link encap:Local Loopback  
          inet addr:64.211.224.166  Mask:255.255.255.240
          UP LOOPBACK RUNNING  MTU:3924  Metric:1

lo:3      Link encap:Local Loopback  
          inet addr:64.211.224.168  Mask:255.255.255.240
          UP LOOPBACK RUNNING  MTU:3924  Metric:1

[root@xxx /root]# /sbin/route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
64.211.224.166  0.0.0.0         255.255.255.255 UH    0      0        0 lo
192.168.2.10    0.0.0.0         255.255.255.255 UH    0      0        0 eth1
192.168.2.13    0.0.0.0         255.255.255.255 UH    0      0        0 eth1
192.168.1.21    0.0.0.0         255.255.255.255 UH    0      0        0 eth0
192.168.3.21    0.0.0.0         255.255.255.255 UH    0      0        0 eth2
64.211.224.162  0.0.0.0         255.255.255.255 UH    0      0        0 lo
64.211.224.163  0.0.0.0         255.255.255.255 UH    0      0        0 lo
192.168.2.14    0.0.0.0         255.255.255.255 UH    0      0        0 eth1
192.168.1.11    0.0.0.0         255.255.255.255 UH    0      0        0 eth0
192.168.1.10    0.0.0.0         255.255.255.255 UH    0      0        0 eth0
192.168.3.10    0.0.0.0         255.255.255.255 UH    0      0        0 eth2
192.168.1.13    0.0.0.0         255.255.255.255 UH    0      0        0 eth0
192.168.3.13    0.0.0.0         255.255.255.255 UH    0      0        0 eth2
192.168.2.21    0.0.0.0         255.255.255.255 UH    0      0        0 eth1
192.168.1.12    0.0.0.0         255.255.255.255 UH    0      0        0 eth0
64.211.224.168  0.0.0.0         255.255.255.255 UH    0      0        0 lo
192.168.1.14    0.0.0.0         255.255.255.255 UH    0      0        0 eth0
192.168.3.14    0.0.0.0         255.255.255.255 UH    0      0        0 eth2
64.211.224.160  0.0.0.0         255.255.255.240 U     0      0        0 eth0
192.168.3.0     0.0.0.0         255.255.255.0   U     0      0        0 eth2
192.168.2.0     0.0.0.0         255.255.255.0   U     0      0        0 eth1
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo

[root@xxx /root]# cat /etc/sysctl.conf 
# Disables packet forwarding
net.ipv4.ip_forward = 1
# Enables source route verification
net.ipv4.conf.all.rp_filter = 1
# Disables automatic defragmentation (needed for masquerading, LVS)
net.ipv4.ip_always_defrag = 0
# Disables the magic-sysrq key
kernel.sysrq = 1

# -tcl.
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.eth0.send_redirects = 0
net.ipv4.conf.all.hidden = 1
net.ipv4.conf.lo.hidden = 1
#
[root@xxx /root]# tail --lines 30 /etc/rc.d/rc.local 

#
# -tcl.
#
# the whole static-routes / network scripts / lo:# / gateway being on a
# different device than ips on the same network / bl ah blah lah sajdhsd.
# totally flaky.  let's just do it all here.
#
/sbin/sysctl -p
/sbin/route add -net 64.211.224.160 netmask 255.255.255.240 dev eth0
#/sbin/route add default gw 64.211.224.161 dev eth0
##/sbin/arp -s 64.211.224.161 00:30:B6:67:00:40
/sbin/arp -s 64.211.224.161 00:30:B6:67:00:AA
#/sbin/ip rule add prio 100 from 192.168.1.0/24 table 100
#/sbin/ip route add table 100 0/0 via 192.168.1.1 dev eth0
/sbin/ifconfig lo:0 64.211.224.162 netmask 255.255.255.240 broadcast 64.211.224.175 up
/sbin/route add -host 64.211.224.162 dev lo:0
/sbin/ifconfig lo:1 64.211.224.163 netmask 255.255.255.240 broadcast 64.211.224.175 up
/sbin/route add -host 64.211.224.163 dev lo:1
/sbin/ifconfig lo:2 64.211.224.166 netmask 255.255.255.240 broadcast 64.211.224.175 up
/sbin/route add -host 64.211.224.166 dev lo:2
/sbin/ifconfig lo:3 64.211.224.168 netmask 255.255.255.240 broadcast 64.211.224.175 up
/sbin/route add -host 64.211.224.168 dev lo:3
#/sbin/ip rule add prio 33000 from 192.168.1.0/24 table 100
/sbin/ip route add table 100 0/0 via 192.168.1.1 dev eth0
#/sbin/ip rule add prio 34000 from 0/0 table 200
/sbin/ip route add table 200 0/0 via 64.211.224.161 dev eth0
/sbin/ip rule add prio 33000 from 64.211.224.160/28 table 200
/sbin/ip rule add prio 34000 from 0/0 table 100
#

[root@xxx /root]# ip rule
0:      from all lookup local 
32766:  from all lookup main 
32767:  from all lookup 253 
33000:  from 64.211.224.160/28 lookup 200 
34000:  from all lookup 100 
[root@xxx /root]# ip route
64.211.224.166 dev lo  scope link  src 64.211.224.166 
192.168.2.10 dev eth1  scope link  src 192.168.2.10 
192.168.2.13 dev eth1  scope link  src 192.168.2.13 
192.168.1.21 dev eth0  scope link 
192.168.3.21 dev eth2  scope link 
64.211.224.162 dev lo  scope link  src 64.211.224.162 
64.211.224.163 dev lo  scope link  src 64.211.224.163 
192.168.2.14 dev eth1  scope link  src 192.168.2.14 
192.168.1.11 dev eth0  scope link  src 192.168.1.11 
192.168.1.10 dev eth0  scope link  src 192.168.1.10 
192.168.3.10 dev eth2  scope link  src 192.168.3.10 
192.168.1.13 dev eth0  scope link  src 192.168.1.13 
192.168.3.13 dev eth2  scope link  src 192.168.3.13 
192.168.2.21 dev eth1  scope link 
192.168.1.12 dev eth0  scope link  src 192.168.1.12 
64.211.224.168 dev lo  scope link  src 64.211.224.168 
192.168.1.14 dev eth0  scope link  src 192.168.1.14 
192.168.3.14 dev eth2  scope link  src 192.168.3.14 
64.211.224.160/28 dev eth0  scope link 
192.168.3.0/24 dev eth2  proto kernel  scope link  src 192.168.3.21 
192.168.2.0/24 dev eth1  proto kernel  scope link  src 192.168.2.21 
192.168.1.0/24 dev eth0  proto kernel  scope link  src 192.168.1.21 
127.0.0.0/8 dev lo  scope link 
[root@xxx /root]# ip route list table 100
default via 192.168.1.1 dev eth0 
[root@xxx /root]# ip route list table 200
default via 64.211.224.161 dev eth0 
[root@xxx /root]# 


the end result of this is that, well, for example, a nameservice query gets
directed through the lvs director to the phl real server, which answers it
via direct routing.  phl can also get to the outside world to deliver mail
/ make dns queries of its own via the masquerading.  the policy routing
says that traffic with a source ip of 64.211.224.160/28 gets sent via
64.211.224.161 (direct routing instead of nat/masq), whereas traffic with
a source ip of anything else should go through 192.168.1.1 and be
masqueraded.  those 192.168.2 and .3 and whatever other networks on there
can be ignored.
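
here is a rough python model of how i understand the rule walk described above: rules are tried in priority order, and the first table that has a route covering the destination wins.  it is a sketch under assumptions, not the kernel's actual code.  `lookup`, `RULES` and `TABLES` are names made up for this illustration, the main table is abridged to its network routes, and the local table is left empty because its contents are not shown anywhere in this post.  198.51.100.7 is just a stand-in for some host out on the internet.

```python
import ipaddress

# (priority, source selector, table), copied from the `ip rule` output above
RULES = [
    (0, "0.0.0.0/0", "local"),
    (32766, "0.0.0.0/0", "main"),
    (32767, "0.0.0.0/0", "253"),
    (33000, "64.211.224.160/28", "200"),
    (34000, "0.0.0.0/0", "100"),
]

# table name -> list of (prefix, next hop / device)
TABLES = {
    "local": [],  # assumption: not shown in the post, so left empty here
    "main": [     # abridged: network routes only, host routes omitted
        ("64.211.224.160/28", "dev eth0"),
        ("192.168.1.0/24", "dev eth0"),
        ("192.168.2.0/24", "dev eth1"),
        ("192.168.3.0/24", "dev eth2"),
        ("127.0.0.0/8", "dev lo"),
    ],
    "253": [],
    "200": [("0.0.0.0/0", "via 64.211.224.161 dev eth0")],
    "100": [("0.0.0.0/0", "via 192.168.1.1 dev eth0")],
}

def lookup(rules, tables, src, dst):
    """Walk rules by priority; return (table, route) for the first table
    whose selector matches src and which has a route covering dst."""
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)
    for _prio, selector, table in sorted(rules):
        if src not in ipaddress.ip_network(selector):
            continue
        # longest-prefix match inside the selected table
        candidates = [
            (ipaddress.ip_network(prefix), target)
            for prefix, target in tables.get(table, [])
            if dst in ipaddress.ip_network(prefix)
        ]
        if candidates:
            net, target = max(candidates, key=lambda c: c[0].prefixlen)
            return table, f"{net} {target}"
    return None, "unreachable"

if __name__ == "__main__":
    print(lookup(RULES, TABLES, "64.211.224.166", "198.51.100.7"))
    print(lookup(RULES, TABLES, "192.168.1.21", "198.51.100.7"))
    print(lookup(RULES, TABLES, "64.211.224.166", "64.211.224.169"))
```

in this model, traffic sourced from the /28 to an outside host lands in table 200, everything else lands in table 100, and anything addressed to the /28 itself is answered by the main table before either of them.  the local table sits at priority 0 and is consulted before all of that; `ip route list table local` would show what is actually in it.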

/me breathes.

ok.  so all that has been working perfectly for months.  the problem is
that i've now added a machine on 64.211.224.169 to handle mail for our
employees, among other things.  for example, mail to
@mybiz-inc.com gets delivered to 64.211.224.169, while mail to @mybiz.com
gets directed to 64.211.224.166 (through the lvs director and to phl).
the problem is that phl can't send traffic to 64.211.224.169 -- phl seems 
to think that 64.211.224.169 is on its loopback interface.  64.211.224.169
tries to make nameservice queries for 169.160-175.224.211.64.in-addr.arpa
and *.mybiz.com to 64.211.224.162 and 64.211.224.163 (the auth nameservers
for that -- phl handles them), but phl never responds.  phl also tries to
deliver mail to 64.211.224.169, but it can't send traffic there.

check out:
[root@xxx /root]# tcpdump -n host 64.211.224.169 and not port 53 &
[1] 20668
User level filter, protocol ALL, datagram packet socket
tcpdump: listening on all devices
[root@xxx /root]# ping -n -c 5 64.211.224.169
PING 64.211.224.169 (64.211.224.169) from 64.211.224.169 : 56(84) bytes of data.
14:04:36.653475   lo > 64.211.224.169 > 64.211.224.169: icmp: echo request
14:04:36.653475   lo < 64.211.224.169 > 64.211.224.169: icmp: echo request
14:04:36.653506   lo > 64.211.224.169 > 64.211.224.169: icmp: echo reply
14:04:36.653506   lo < 64.211.224.169 > 64.211.224.169: icmp: echo reply
64 bytes from 64.211.224.169: icmp_seq=0 ttl=255 time=63 usec
14:04:37.649412   lo > 64.211.224.169 > 64.211.224.169: icmp: echo request
14:04:37.649412   lo < 64.211.224.169 > 64.211.224.169: icmp: echo request
14:04:37.649430   lo > 64.211.224.169 > 64.211.224.169: icmp: echo reply
14:04:37.649430   lo < 64.211.224.169 > 64.211.224.169: icmp: echo reply
64 bytes from 64.211.224.169: icmp_seq=1 ttl=255 time=34 usec
14:04:38.649446   lo > 64.211.224.169 > 64.211.224.169: icmp: echo request
14:04:38.649446   lo < 64.211.224.169 > 64.211.224.169: icmp: echo request
14:04:38.649462   lo > 64.211.224.169 > 64.211.224.169: icmp: echo reply
14:04:38.649462   lo < 64.211.224.169 > 64.211.224.169: icmp: echo reply
64 bytes from 64.211.224.169: icmp_seq=2 ttl=255 time=28 usec
14:04:39.649495   lo > 64.211.224.169 > 64.211.224.169: icmp: echo request
14:04:39.649495   lo < 64.211.224.169 > 64.211.224.169: icmp: echo request
14:04:39.649516   lo > 64.211.224.169 > 64.211.224.169: icmp: echo reply
14:04:39.649516   lo < 64.211.224.169 > 64.211.224.169: icmp: echo reply
64 bytes from 64.211.224.169: icmp_seq=3 ttl=255 time=37 usec
14:04:40.649527   lo > 64.211.224.169 > 64.211.224.169: icmp: echo request
14:04:40.649527   lo < 64.211.224.169 > 64.211.224.169: icmp: echo request
14:04:40.649545   lo > 64.211.224.169 > 64.211.224.169: icmp: echo reply
14:04:40.649545   lo < 64.211.224.169 > 64.211.224.169: icmp: echo reply
64 bytes from 64.211.224.169: icmp_seq=4 ttl=255 time=31 usec

--- 64.211.224.169 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/mdev = 0.028/0.038/0.063/0.014 ms
[root@xxx /root]# fg
tcpdump -n host 64.211.224.169 and not port 53

158 packets received by filter
[root@xxx /root]# 


when 169 tries to telnet to 166 port 25 (which gets directed to phl):
[root@xxx /root]# tcpdump -n host 64.211.224.169 and not port 53
User level filter, protocol ALL, datagram packet socket
tcpdump: listening on all devices
14:05:20.460200 eth0 B arp who-has 64.211.224.169 tell 64.211.224.162
14:05:50.883915 eth0 B arp who-has 64.211.224.166 tell 64.211.224.169
14:05:50.884155 eth0 < 64.211.224.169.1058 > 64.211.224.166.smtp: S 4151665104:4151665104(0) win 32120 <mss 1460,sackOK,timestamp 25658644 0,nop,wscale 0> (DF)
14:05:53.879424 eth0 < 64.211.224.169.1058 > 64.211.224.166.smtp: S 4151665104:4151665104(0) win 32120 <mss 1460,sackOK,timestamp 25658944 0,nop,wscale 0> (DF)

725 packets received by filter

no response is ever sent.


when phl tries to send mail to mybiz-inc.com:
[root@xxx /root]# dnsmx mybiz-inc.com
0 mail.mybiz-inc.com
[root@xxx /root]# dnsip mail.mybiz-inc.com
64.211.224.169 
[root@xxx /root]# telnet 64.211.224.169 25
Trying 64.211.224.169...
Connected to inc.mybiz.com (64.211.224.169).
Escape character is '^]'.
220 phl.usa.mybiz ESMTP
^]q

Connection closed.
[root@xxx /root]# 

14:07:39.001323   lo > 64.211.224.169.1549 > 64.211.224.169.smtp: S 4291120419:4291120419(0) win 31072 <mss 3884,sackOK,timestamp 441773751 0,nop,wscale 0> (DF)
14:07:39.001323   lo < 64.211.224.169.1549 > 64.211.224.169.smtp: S 4291120419:4291120419(0) win 31072 <mss 3884,sackOK,timestamp 441773751 0,nop,wscale 0> (DF)
14:07:39.001367   lo > 64.211.224.169.smtp > 64.211.224.169.1549: S 200723:200723(0) ack 4291120420 win 31072 <mss 3884,sackOK,timestamp 441773751 441773751,nop,wscale 0> (DF)
14:07:39.001367   lo < 64.211.224.169.smtp > 64.211.224.169.1549: S 200723:200723(0) ack 4291120420 win 31072 <mss 3884,sackOK,timestamp 441773751 441773751,nop,wscale 0> (DF)
14:07:39.001390   lo > 64.211.224.169.1549 > 64.211.224.169.smtp: . 1:1(0) ack 1 win 31072 <nop,nop,timestamp 441773751 441773751> (DF)
14:07:39.001390   lo < 64.211.224.169.1549 > 64.211.224.169.smtp: . 1:1(0) ack 1 win 31072 <nop,nop,timestamp 441773751 441773751> (DF)
14:07:39.007531   lo > 64.211.224.169.smtp > 64.211.224.169.1549: P 1:26(25) ack 1 win 31072 <nop,nop,timestamp 441773752 441773751> (DF)
14:07:39.007531   lo < 64.211.224.169.smtp > 64.211.224.169.1549: P 1:26(25) ack 1 win 31072 <nop,nop,timestamp 441773752 441773751> (DF)
14:07:39.007570   lo > 64.211.224.169.1549 > 64.211.224.169.smtp: . 1:1(0) ack 26 win 31047 <nop,nop,timestamp 441773752 441773752> (DF)
14:07:39.007570   lo < 64.211.224.169.1549 > 64.211.224.169.smtp: . 1:1(0) ack 26 win 31047 <nop,nop,timestamp 441773752 441773752> (DF)
Connected to inc.mybiz.com (64.211.224.169).


it tries to send to itself.

does anyone have any idea why phl would think 64.211.224.169 is on its lo?
it seems to think that for all of 64.211.224.160/28.  if i telnet to port
25 on any ip in that range, phl directs the request to itself on lo just
like 169.
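
since the symptom covers the whole /28 and not just 169, here is a small python sketch of the address arithmetic that might be relevant.  the lo aliases in rc.local are brought up with netmask 255.255.255.240, so the prefix attached to each alias spans the whole external network.  whether the kernel really treats that whole prefix as local on a loopback device is an assumption here, not something the output above proves.  `covered` is a helper made up for this illustration, and the 255.255.255.255 case is a hypothetical alternative, not the current config.

```python
import ipaddress

def covered(address, netmask, target):
    """True if `target` falls inside the prefix implied by address/netmask."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    return ipaddress.ip_address(target) in iface.network

if __name__ == "__main__":
    # lo:0 as configured in rc.local: the prefix includes 169
    print(covered("64.211.224.162", "255.255.255.240", "64.211.224.169"))  # True
    # hypothetical: same address with a host netmask, prefix is just 162
    print(covered("64.211.224.162", "255.255.255.255", "64.211.224.169"))  # False
```

if that assumption holds, it would fit the observation that every ip in the range, not only 169, gets answered on lo.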

anyone even understand this?  heh.  i'm seriously confused myself.

i'd love to hear any ideas.

-tcl.



