Hello Grant,

I think I missed an email, as I don't know whom you're quoting here.

So it is a race condition during startup after all. It's awesome that you found the cause! Thanks for all the work you put into this.

Last night I managed to make all connections work by executing conntrack -D on the hypervisor. Awesome! But this morning there were some broken tunnels again, so the fix doesn't seem to last very long.

You wrote that I should change the order during startup. Should I just postpone starting the VMs for a little while, or do I need to change the order of my iptables rules somehow?

To me this looks like a bug in the conntrack module. It shouldn't be necessary to clean the table manually every once in a while.

Bye,
Matthias

On 31.01.2018 at 06:29, Grant Taylor wrote:
> On 01/30/2018 03:37 PM, Grant Taylor wrote:
>> It seems as if something is intercepting the packets. - I doubt
>> that it's the NAT module, but I can't rule it out.
>
> Well, I think I ran into the same problem in my tests.
>
> Spoiler: I did manage to overcome it.
>
> I think the connection tracking was part of my (initial?) problem.
>
>> Wait. tcpdump shows that packets are entering one network interface
>> but they aren't leaving another network interface?
>
> I was seeing this behavior too.
>
>> That sounds like something is filtering the packets.
>
> I think connection tracking (thus NAT) was (at least) part of the
> culprit.
>
>> I feel like the kicker is that the traffic is never making it out of
>> the local system to the far side. As such the far side never gets
>> anything, much less replies.
>
> I don't know if this was the case for my testing or not. I did all of
> my testing from the far side in.
>
>> Ya, the [UNREPLIED] bothers me. As does the fact that you aren't
>> seeing the traffic leaving the host's external interface.
>
> The [UNREPLIED] was the kicker for me.
>
>> I'd look more into the TRACE option (target) that you seem to have
>> enabled in the raw table. That should give you more information
>> about the packets flowing through the kernel.
>
> I ended up not using TRACE.
>
> I'm not sure why I did a "conntrack -D", but as soon as I did, my
> long-running ping started working.
>
> Upon retesting I can confirm that "conntrack -D" was required to make
> things work.
>
> Further testing, and using "conntrack -L", showed that there were
> some connection tracking entries stuck in an [UNREPLIED] state. I
> think that "conntrack -D" cleared the stale entries and allowed
> things to start working.
>
>> My hunch is that the packets aren't making it out onto the wire for
>> some reason. Thus the lack of reply.
>
> After the testing that I did, I suspect that the packets did make it
> onto the wire but were swallowed by connection tracking, and thus
> NAT, as you had originally thought.
>
>> I'll see if I can't throw together a PoC in Network namespaces this
>> evening to evaluate if NATing GRE works. - I'd like to test NATing
>> different sets of endpoints (1:1) and NATing multiple remote
>> endpoints to one local endpoint (many:1).
>
> I threw together a proof of concept using network namespaces, with a
> pair of OVSs (bri1 & bri2) and a pair of vEths (between R3 / R2 and
> R2 / R1).
>
> I was initially able to establish GRE tunnels between H1 / H3 and
> H2 / H4.
>
> After figuring out the connection tracking problem I was also able to
> bring up additional GRE tunnels between H1 / H4 and H2 / H3.
>
> Take a look at the attached GRE-NAT.sh script.
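The GRE-NAT.sh attachment isn't reproduced in this message. As a rough
idea of the kind of plumbing it sets up, here is a stripped-down sketch
(assumptions: only two namespaces joined by a veth pair with a single
GRE tunnel on top, no routers, NAT, or OVS, and the interface names and
addresses are illustrative rather than taken from the actual script):

  # Two "hosts" in their own namespaces, joined by a veth pair.
  ip netns add h1
  ip netns add h3
  ip link add veth-h1 type veth peer name veth-h3
  ip link set veth-h1 netns h1
  ip link set veth-h3 netns h3
  ip netns exec h1 ip addr add 192.0.2.1/24 dev veth-h1
  ip netns exec h3 ip addr add 192.0.2.3/24 dev veth-h3
  ip netns exec h1 ip link set lo up
  ip netns exec h1 ip link set veth-h1 up
  ip netns exec h3 ip link set lo up
  ip netns exec h3 ip link set veth-h3 up

  # A GRE tunnel between them, addressed to match the 10.1.3.x tunnel
  # network used in the ping tests below (requires the ip_gre module).
  ip netns exec h1 ip tunnel add gre13 mode gre local 192.0.2.1 remote 192.0.2.3
  ip netns exec h3 ip tunnel add gre31 mode gre local 192.0.2.3 remote 192.0.2.1
  ip netns exec h1 ip addr add 10.1.3.1/24 dev gre13
  ip netns exec h3 ip addr add 10.1.3.3/24 dev gre31
  ip netns exec h1 ip link set gre13 up
  ip netns exec h3 ip link set gre31 up

  # Quick check across the tunnel.
  ip netns exec h1 ping -c 4 10.1.3.3

With no NAT or connection tracking in the path, this much should just
work; the behavior discussed in this thread only appears once a NATing
router sits between the tunnel endpoints.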
> I do take some liberties and set up aliases to make things easier.
> (Read: I'm lazy and don't want to type any more characters than I
> have to.)
>
> alias vsctl='ovs-vsctl'
> alias h1='ip netns exec h1'
> alias h2='ip netns exec h2'
> alias h3='ip netns exec h3'
> alias h4='ip netns exec h4'
> alias r1='ip netns exec r1'
> alias r2='ip netns exec r2'
> alias r3='ip netns exec r3'
>
> The networks are:
>
> H1 / H2 / R3 is Test-Net-1, 192.0.2.0/24
> R3 / R2 is Test-Net-2, 198.51.100.0/24
> R2 / R1 is Test-Net-3, 203.0.113.0/24
> R1 / H3 / H4 is RFC 1918 private, 192.168.0.0/24
>
> I addressed the GRE tunnels out of RFC 1918 private space as
> 10.<Left #>.<Right #>.<Device #>/24.
>
> R3 & R1 are numbered the way that they are so that their device
> numbers don't conflict with anything local.
>
> I did manage to get the PoC to work without needing to issue the
> "conntrack -D" command by simply moving the NAT rules earlier in the
> script, before I tried to establish the tunnels.
>
> I can only surmise that there was some sort of bad state that
> connection tracking learned and couldn't fix itself. - This was
> somewhat random and unpredictable, much like what you're describing.
> - It also likely has to do with which end talks first.
>
> I found that I could get things to start working if I issued the
> following command:
>
> (ip netns exec) r1 conntrack -D
>
> Ultimately I was able to issue the following commands:
>
> h1 ping -c 4 10.1.3.3
> h1 ping -c 4 10.1.4.4
>
> h2 ping -c 4 10.2.3.3
> h2 ping -c 4 10.2.4.4
>
> h3 ping -c 4 10.1.3.1
> h3 ping -c 4 10.2.3.2
>
> h4 ping -c 4 10.1.4.1
> h4 ping -c 4 10.2.4.2
>
> I /think/ that this is what you were wanting to do. And I think you
> were correct all along: NAT, by way of connection tracking, was in
> fact messing with you.
>
> Anyway, have fun with the PoC. Ask if you have any questions about
> what / why / how I did something.
>
> Oh, ya, I did have the following GRE-related modules loaded:
>
> # lsmod | grep -i gre
> nf_conntrack_proto_gre    16384  0
> nf_nat_proto_gre          16384  0
> ip_gre                    24576  0
> ip_tunnel                 28672  1 ip_gre
> gre                       16384  1 ip_gre
>
> I'm running kernel 4.9.76-gentoo-r1.
>
>> You might be onto something about the first packet. At least as far
>> as what connection tracking sees.
>
> I think the kicker has to do with connection tracking learning state
> on the first packet.
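One follow-up on the manual cleanup question: if flushing the whole
table with "conntrack -D" feels too heavy-handed on a production
hypervisor, conntrack-tools can filter by protocol, so only the GRE
entries need to be touched. Roughly (the exact options may vary with
the installed conntrack version, and 192.0.2.1 below is only a
placeholder for one of the real tunnel endpoints):

  # List only the GRE conntrack entries; stale ones show [UNREPLIED].
  conntrack -L -p gre

  # Delete only the GRE entries instead of flushing everything.
  conntrack -D -p gre

  # Or narrow the delete further to a single tunnel endpoint.
  conntrack -D -p gre --orig-src 192.0.2.1

That still treats the symptom rather than the cause, but it limits the
collateral damage compared to clearing every tracked connection.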