I am trying to use IPIP tunnels and multipath access to the Internet inside a LAN. Before actually deploying it I am testing in a virtual network environment, Netkit. I am experiencing a strange behaviour whose cause/solution I cannot figure out; please help.

The testbed simulates these 4 segments (if you use a monospaced font the diagrams should render correctly):

I   -------------------
       |      |      |
     eth1   eth0   eth1
       |      |      |
      AP1    NI    AP2

A   -------------------
       |      |      |
     eth0   eth0   eth0
       |      |      |
      AP1    N1     N2

B   ------------
       |      |
     eth1   eth1
       |      |
      N2     N3

C   --------------------------
       |      |      |      |
     eth0   eth0   eth0   eth0
       |      |      |      |
      N3     N4     N5    AP2

Segment I represents the Internet. NI is a destination host in the Internet realm. AP1 and AP2 represent two AP/routers; they do NAT for their clients. The LAN comprises the other segments, A, B and C. In segment A, AP1 is a DHCP server and N1 is its only client. In segment C, AP2 is a DHCP server and N4 and N5 are two clients. N2 and N3 have two NICs; with eth1 they are attached to segment B; they do routing for the LAN.

AP1 and AP2 (and obviously NI) have a public address on eth1. Both also have the address 192.168.1.1 on eth0 and a route to 192.168.1.0/24. N1, N2, N3, N4 and N5 have an address in subnet 10.1.1.0/24. N1 additionally has an address in 192.168.1.0/24 and a default route via AP1; it also brings up an IPIP tunnel interface. N4 is configured the same way. N3 does not have an address in 192.168.1.0/24; it has one IPIP tunnel to N1 and one to N4, and its default route is a multipath route via N1 or via N4. N5, instead, does have an address in 192.168.1.0/24; it also has one IPIP tunnel to N1 and one to N4, and its default route is a multipath route via 192.168.1.1 (AP2), via N1, or via N4.

A ping from N3 to NI works flawlessly. By watching the TTL I can see that the packets pass sometimes via N1, sometimes via N4. A ping from N5 to NI also works flawlessly. By watching the TTL I can see that the packets pass sometimes via N1, sometimes via N4, and sometimes directly via AP2.

The problem is with TCP connections. I wrote a simple client/server application to test it: the server listens, the client connects, and then the two exchange a HELO packet every second. The server runs on NI. The client on N3 works flawlessly; I can launch tens of them simultaneously and let them run for tens of minutes with no problem. The client on N5 works fine now and then, but sometimes it cannot establish the connection at all.

Let me add that the problem is not a TCP connection being reset by the server when it sees a different source IP: the appropriate commands (iptables with CONNMARK, ip rule, etc.) are in place (a simplified sketch of what I mean is further below). Otherwise N3 would experience problems too.

The only difference between N3 (OK) and N5 (not OK) is the default route.

N3:

    ip route add default \
        nexthop via 10.1.1.1 dev ntk-to-inet-0 weight 100 onlink \
        nexthop via 10.1.1.4 dev ntk-to-inet-1 weight 100 onlink

N5:

    ip route add default \
        nexthop via 192.168.1.1 dev eth0 weight 100 \
        nexthop via 10.1.1.1 dev ntk-to-inet-0 weight 70 onlink \
        nexthop via 10.1.1.4 dev ntk-to-inet-1 weight 30 onlink

Perhaps the difference is that N5, when a connection goes via 10.1.1.1 or via 10.1.1.4, has to use its address 10.1.1.5 as source, whereas when a connection goes via 192.168.1.1 it has to use its address 192.168.1.40. N3, on the contrary, always uses its address 10.1.1.3. However, in the "nexthop" part of a multipath route you cannot specify a "src" option. Anyway, the correct "src" option is already specified in the routes used to reach 10.1.1.1 and 192.168.1.1 (see the sketch below), so I think the kernel should know what to do.
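To make that last point concrete: by routes with a "src" option I mean host/prefix routes on N5 roughly like the following (simplified; the exact prefixes in my testbed may differ slightly):

    ip route add 10.1.1.1 dev ntk-to-inet-0 src 10.1.1.5
    ip route add 10.1.1.4 dev ntk-to-inet-1 src 10.1.1.5
    ip route add 192.168.1.0/24 dev eth0 src 192.168.1.40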
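The CONNMARK / ip rule setup I mentioned above is, in simplified form, along these lines (the mark values and table numbers are only examples, not my exact configuration):

    # keep every packet of a connection on the gateway its first packet used
    iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark
    iptables -t mangle -A OUTPUT -m mark --mark 0 -o ntk-to-inet-0 -j MARK --set-mark 1
    iptables -t mangle -A OUTPUT -m mark --mark 0 -o ntk-to-inet-1 -j MARK --set-mark 2
    iptables -t mangle -A OUTPUT -j CONNMARK --save-mark

    # route marked packets through the gateway they started with
    ip rule add fwmark 1 table 101
    ip rule add fwmark 2 table 102
    ip route add default via 10.1.1.1 dev ntk-to-inet-0 onlink table 101
    ip route add default via 10.1.1.4 dev ntk-to-inet-1 onlink table 102

    # (on N5 there is an analogous mark/rule/table for the eth0 path via 192.168.1.1)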
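For reference, the two tunnel interfaces on N5 are created more or less like this (the outer endpoint addresses are replaced by <...> placeholders here; this is only to show where the ntk-to-inet-* names used above come from):

    ip tunnel add ntk-to-inet-0 mode ipip local <N5 endpoint> remote <N1 endpoint>
    ip link set ntk-to-inet-0 up
    ip tunnel add ntk-to-inet-1 mode ipip local <N5 endpoint> remote <N4 endpoint>
    ip link set ntk-to-inet-1 up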
Can anyone guess what the real problem / solution could be? I can send the testbed; it should be easy to reproduce if you already have a working Netkit environment installed.

Regards,
Luca