Re: GRE-NAT broken - SOLVED

Grant Taylor <gtaylor@xxxxxxxxxxxxxxxxxx> · Thu, 1 Feb 2018 11:31:49 -0700

On 02/01/2018 03:34 AM, Matthias Walther wrote:
> Hello Grant,

Hi Matthias,

> I think I missed an email, as I don't know whom you're quoting here.

I was actually quoting myself in an email I sent about 7 hours prior. 
Here's a link to it:  https://www.spinics.net/lists/lartc/msg23508.html

> So after all it's a race condition during startup. It's awesome that
> you found the cause! Thanks for all the work you put into this.

You're welcome.

> Last night, I managed to make all connections work by executing
> "conntrack -D" on the hypervisor. Awesome!

Yay!

> But this morning, there were some broken tunnels again. This doesn't
> seem to last very long.

Hum.  :-/

> You wrote that I should change the startup order. So should I just
> postpone starting the VMs for a little while? Or do I need to change
> the order of my iptables rules somehow?

I think it's an ordering issue:  when the iptables rules are loaded vs 
when the GRE tunnels are brought up.

You might not have the ability to control when GRE packets come in from 
the remote sites.  Thus connection tracking may learn about a flow 
before the iptables rules are in place.

I think you will need to dig further into connection tracking and how 
to interpret its output, at least enough to learn how to surgically add 
or remove individual entries in the connection tracking table.  That 
way you won't need to blow the entire connection tracking table away 
like "conntrack -D" does.
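
A minimal sketch of what "surgical" could look like, assuming the 
conntrack(8) tool from conntrack-tools with its -L (list), -D (delete), 
-p (protocol), -s and -d (original source / destination) options.  The 
helper names and the addresses are made up for illustration and are not 
from this thread.  It only builds and prints the commands unless told 
otherwise:

```python
import shlex
import subprocess

def gre_conntrack_cmd(action, src=None, dst=None):
    """Build a conntrack(8) argv restricted to GRE entries.

    action: "list" maps to -L, "delete" maps to -D.
    src/dst: optional original-direction source/destination addresses.
    """
    flags = {"list": "-L", "delete": "-D"}
    argv = ["conntrack", flags[action], "-p", "gre"]
    if src:
        argv += ["-s", src]
    if dst:
        argv += ["-d", dst]
    return argv

def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False (needs root)."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=False)

# Example (192.0.2.10 and 198.51.100.20 are placeholder addresses):
#   run(gre_conntrack_cmd("list"))
#   run(gre_conntrack_cmd("delete", src="192.0.2.10", dst="198.51.100.20"))
```

Listing first and deleting only the entry for one tunnel's endpoints 
leaves every other tracked connection on the box untouched.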

> To me this looks like a bug in the conntrack module. It shouldn't be
> necessary to clean the table manually once in a while.

I don't know if it's a bug in connection tracking or not.  It might 
simply be a race condition, i.e. the result depends on which direction 
connection tracking (CT) first sees GRE packets from, and possibly the 
associated replies, which can leave an undesired state in the table.

Note:  CT state expiration can also likely cause the "seen first" issue 
again, even after the systems have been up and the tunnels have passed 
traffic.

Try clearing the connection tracking table, then start a persistent 
ping through each tunnel and see whether the tunnels stay up and 
functional.  I.e. constantly send traffic through the tunnels so that 
the connection tracking table entries don't go stale and get purged, 
which would mean a new "first seen" condition again.
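
A rough sketch of the persistent ping, assuming Linux iputils ping and 
its -i (interval in seconds) option.  The function names, the 10 second 
interval and the peer addresses are placeholders, not values from this 
thread:

```python
import subprocess

def keepalive_cmd(peer, interval_s=10):
    """argv for an endless ping to one tunnel peer (iputils ping on Linux)."""
    return ["ping", "-i", str(interval_s), peer]

def start_keepalives(peers, interval_s=10):
    """Start one background ping per peer and return the Popen handles."""
    return [
        subprocess.Popen(
            keepalive_cmd(peer, interval_s),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        for peer in peers
    ]

# Example (hypothetical inner addresses of the far end of each tunnel):
#   procs = start_keepalives(["10.255.0.2", "10.255.1.2"])
#   for p in procs:
#       p.wait()   # runs until the pings are killed
```

Pick an interval comfortably shorter than whatever conntrack timeout 
applies to your GRE entries, so an entry is always refreshed before it 
can expire.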

If the persistent ping does work, 1) you have a workaround, and 2) you 
know that it's likely CT state expiration, which means that there may be 
a tunable that can help prevent the relevant state information from 
expiring.
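
I have not confirmed which tunable applies here.  As an assumption, the 
likely candidates are the GRE conntrack timeouts that the kernel exposes 
under /proc/sys/net/netfilter when GRE tracking is available 
(nf_conntrack_gre_timeout and nf_conntrack_gre_timeout_stream).  A small 
sketch to read them and to build the matching sysctl command; the helper 
names are made up:

```python
from pathlib import Path

PROC_NETFILTER = Path("/proc/sys/net/netfilter")
# Assumed candidates; whether they govern this particular setup is unverified.
GRE_TUNABLES = ("nf_conntrack_gre_timeout", "nf_conntrack_gre_timeout_stream")

def read_gre_timeouts(base=PROC_NETFILTER):
    """Return {name: seconds or None}.

    None means the file is missing (e.g. GRE tracking not loaded)
    or could not be read or parsed.
    """
    result = {}
    for name in GRE_TUNABLES:
        try:
            result[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            result[name] = None
    return result

def sysctl_set_cmd(name, seconds):
    """argv that would set one timeout via sysctl(8); must be run as root."""
    return ["sysctl", "-w", f"net.netfilter.{name}={int(seconds)}"]

# Example:
#   print(read_gre_timeouts())
#   print(sysctl_set_cmd("nf_conntrack_gre_timeout_stream", 600))
```

Compare the values it reports with how long a tunnel can sit idle 
before it breaks; if they line up, state expiration is the likely 
culprit.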

--
Grant. . . .
unix || die

Linux Advanced Routing and Traffic Control