Hello Grant,

I think I missed an email, as I don't know whom you're quoting here.

So it is a race condition during startup after all. It's awesome that you found the cause! Thanks for all the work you put into this.

Last night I managed to make all connections work by executing conntrack -D on the hypervisor. Awesome! But this morning there were some broken tunnels again, so the fix doesn't seem to last very long.

You wrote that I should change the order during startup. Should I just postpone starting the VMs for a little while, or do I need to change the order of my iptables rules somehow?

To me this looks like a bug in the conntrack module. It shouldn't be necessary to clean the table manually every once in a while.

Bye,
Matthias

On 31.01.2018 at 06:29, Grant Taylor wrote:
> On 01/30/2018 03:37 PM, Grant Taylor wrote:
>> It seems as if something is intercepting the packets. - I doubt
>> that it's the NAT module, but I can't rule it out.
>
> Well, I think I ran into the same problem in my tests.
>
> Spoiler: I did manage to overcome it.
>
> I think the connection tracking was part of my (initial?) problem.
>
>> Wait. tcpdump shows that packets are entering one network interface
>> but they aren't leaving another network interface?
>
> I was seeing this behavior too.
>
>> That sounds like something is filtering the packets.
>
> I think connection tracking (thus NAT) was (at least) part of the
> culprit.
>
>> I feel like the kicker is that the traffic is never making it out of
>> the local system to the far side. As such the far side never gets
>> anything, much less replies.
>
> I don't know if this was the case for my testing or not. I did all of
> my testing from the far side in.
>
>> Ya, the [UNREPLIED] bothers me. As does the fact that you aren't
>> seeing the traffic leaving the host's external interface.
>
> The [UNREPLIED] was the kicker for me.
>
>> I'd look more into the TRACE option (target) that you seem to have
>> enabled in the raw table. That should give you more information
>> about the packets flowing through the kernel.
>
> I ended up not using TRACE.
>
> I'm not sure why I did a "conntrack -D", but as soon as I did, my
> long-running ping started working.
>
> Upon retesting I can confirm that "conntrack -D" was required to make
> things work.
>
> Further testing, and using "conntrack -L", showed that there were
> some connection tracking entries stuck in an [UNREPLIED] state. I
> think that "conntrack -D" cleared the stale entries and allowed
> things to start working.
>
>> My hunch is that the packets aren't making it out onto the wire for
>> some reason. Thus the lack of reply.
>
> After the testing that I did, I suspect that the packets did make it
> onto the wire but were swallowed by connection tracking, and thus
> NAT, as you had originally thought.
>
>> I'll see if I can't throw together a PoC in Network namespaces this
>> evening to evaluate if NATing GRE works. - I'd like to test NATing
>> different sets of endpoints (1:1) and NATing multiple remote
>> endpoints to one local endpoint (many:1).
>
> I threw together a proof of concept using network namespaces, with a
> pair of OVSs (bri1 & bri2) and a pair of vEths (between R3 / R2 and
> R2 / R1).
>
> I was initially able to establish GRE tunnels between H1 / H3 and
> H2 / H4.
>
> After figuring out the connection tracking problem I was also able to
> bring up additional GRE tunnels between H1 / H4 and H2 / H3.
>
> Take a look at the attached GRE-NAT.sh script.
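The GRE-NAT.sh attachment isn't reproduced in this message. As a rough
idea of the kind of plumbing it sets up, here is a stripped-down sketch
(assumptions: only two namespaces joined by a veth pair with a single
GRE tunnel on top, no routers, NAT, or OVS, and the interface names and
addresses are illustrative rather than taken from the actual script):

  # Two "hosts" in their own namespaces, joined by a veth pair.
  ip netns add h1
  ip netns add h3
  ip link add veth-h1 type veth peer name veth-h3
  ip link set veth-h1 netns h1
  ip link set veth-h3 netns h3
  ip netns exec h1 ip addr add 192.0.2.1/24 dev veth-h1
  ip netns exec h3 ip addr add 192.0.2.3/24 dev veth-h3
  ip netns exec h1 ip link set lo up
  ip netns exec h1 ip link set veth-h1 up
  ip netns exec h3 ip link set lo up
  ip netns exec h3 ip link set veth-h3 up

  # A GRE tunnel between them, addressed to match the 10.1.3.x tunnel
  # network used in the ping tests below (requires the ip_gre module).
  ip netns exec h1 ip tunnel add gre13 mode gre local 192.0.2.1 remote 192.0.2.3
  ip netns exec h3 ip tunnel add gre31 mode gre local 192.0.2.3 remote 192.0.2.1
  ip netns exec h1 ip addr add 10.1.3.1/24 dev gre13
  ip netns exec h3 ip addr add 10.1.3.3/24 dev gre31
  ip netns exec h1 ip link set gre13 up
  ip netns exec h3 ip link set gre31 up

  # Quick check across the tunnel.
  ip netns exec h1 ping -c 4 10.1.3.3

With no NAT or connection tracking in the path, this much should just
work; the behavior discussed in this thread only appears once a NATing
router sits between the tunnel endpoints.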
> I do take some liberties and set up aliases to make things easier.
> (Read: I'm lazy and don't want to type any more characters than I
> have to.)
>
> alias vsctl='ovs-vsctl'
> alias h1='ip netns exec h1'
> alias h2='ip netns exec h2'
> alias h3='ip netns exec h3'
> alias h4='ip netns exec h4'
> alias r1='ip netns exec r1'
> alias r2='ip netns exec r2'
> alias r3='ip netns exec r3'
>
> The networks are:
>
> H1 / H2 / R3 is Test-Net-1, 192.0.2.0/24
> R3 / R2 is Test-Net-2, 198.51.100.0/24
> R2 / R1 is Test-Net-3, 203.0.113.0/24
> R1 / H3 / H4 is RFC 1918 private, 192.168.0.0/24
>
> I addressed the GRE tunnels out of RFC 1918 private space as
> 10.<Left #>.<Right #>.<Device #>/24.
>
> R3 & R1 are numbered the way that they are so that their device
> numbers don't conflict with anything local.
>
> I did manage to get the PoC to work without needing to issue the
> "conntrack -D" command by simply moving the NAT rules earlier in the
> script, before I tried to establish the tunnels.
>
> I can only surmise that there was some sort of bad state that
> connection tracking learned and couldn't fix itself. - This was
> somewhat random and unpredictable, much like what you're describing.
> - It also likely has to do with which end talks first.
>
> I found that I could get things to start working if I issued the
> following command:
>
> (ip netns exec) r1 conntrack -D
>
> Ultimately I was able to issue the following commands:
>
> h1 ping -c 4 10.1.3.3
> h1 ping -c 4 10.1.4.4
>
> h2 ping -c 4 10.2.3.3
> h2 ping -c 4 10.2.4.4
>
> h3 ping -c 4 10.1.3.1
> h3 ping -c 4 10.2.3.2
>
> h4 ping -c 4 10.1.4.1
> h4 ping -c 4 10.2.4.2
>
> I /think/ that this is what you were wanting to do. And I think you
> were correct all along: NAT, by way of connection tracking, was in
> fact messing with you.
>
> Anyway, have fun with the PoC. Ask if you have any questions about
> what / why / how I did something.
>
> Oh, ya, I did have the following GRE-related modules loaded:
>
> # lsmod | grep -i gre
> nf_conntrack_proto_gre    16384  0
> nf_nat_proto_gre          16384  0
> ip_gre                    24576  0
> ip_tunnel                 28672  1 ip_gre
> gre                       16384  1 ip_gre
>
> I'm running kernel 4.9.76-gentoo-r1.
>
>> You might be onto something about the first packet. At least as far
>> as what connection tracking sees.
>
> I think the kicker has to do with connection tracking learning state
> on the first packet.
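One follow-up on the manual cleanup question: if flushing the whole
table with "conntrack -D" feels too heavy-handed on a production
hypervisor, conntrack-tools can filter by protocol, so only the GRE
entries need to be touched. Roughly (the exact options may vary with
the installed conntrack version, and 192.0.2.1 below is only a
placeholder for one of the real tunnel endpoints):

  # List only the GRE conntrack entries; stale ones show [UNREPLIED].
  conntrack -L -p gre

  # Delete only the GRE entries instead of flushing everything.
  conntrack -D -p gre

  # Or narrow the delete further to a single tunnel endpoint.
  conntrack -D -p gre --orig-src 192.0.2.1

That still treats the symptom rather than the cause, but it limits the
collateral damage compared to clearing every tracked connection.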