Re: GRE-NAT broken

Linux Advanced Routing and Traffic Control


 



On 01/27/2018 08:30 PM, walther.xyz wrote:
Hello Grant,

Hi Matthias,

I'll keep this email short for clarity.

Okay.

Yes. Our wifi access points, which are typically OpenWrt-based routers like the TP-LINK TL-WR841N or the TP-LINK TL-WR1043ND, connect through L2TP to the gateways, which then route between our network and the upstream via GRE tunnels. Our gateways are basically a VPN provider system. This is needed because of German law; otherwise the people who share their internet connection could be held responsible for what their guests do in their wifi, if they do evil things like illegal filesharing or worse.

Okay.

It sounds like you are working under some restrictions that I'm completely ignorant to. Please keep that in mind if I accidentally suggest something that you shouldn't do.

With our network, we fight this stupid legal situation and have made it quite far already. The worst part of the law, called "Störerhaftung", which held you responsible for anything that was sent over your internet connection, was abolished during the last legislative period. But people are still scared after having been told for 15 years that they should never, ever share their internet connection with strangers, because they could get cease-and-desist letters, which can easily cost you 500 or 1000 Euros. So for a while, we'll stick with our VPN network.

Ouch. That sounds serious. On my side of the pond, cease-and-desist letters are usually a friendly "stop, or else" type. In fact, most ISPs over here need to send three before they can actually terminate your connection. (RIAA and MPAA are notorious for causing ISPs to send such letters.)

But things are getting better now and some of the federal states started supporting us financially.

:-)

Correct. We use GRE tunnels between the gateways for cross traffic and for upstream. Both tunnels are affected.

ACK

They are rented servers at Hetzner or other cheap hosting companies. We don't have physical access to them. We just rent them and configure them to become VPN endpoints.

ACK

It's not really important, where the servers are located. Most of them are at Hetzner, but not all of them.

The thing that is important, at least for my understanding, is whether the GRE tunnels run between the CPE (OpenWRT routers) and your VMs, across the internet. Versus what I thought when this thread started: between your border routers and your VMs on your own hosts, across your internal LAN.

The pertinent part is whether the GRE is crossing the internet or staying on your own LAN.

I don't really care where they are hosted (as in which provider), just which side of your internet border router they are on, inside or outside.

Typical situation:

VM (public IP) <---> Hypervisor <---the internet---> Hypervisor <---> VM (public OR private IP)

Please confirm the GRE tunnels are between the VM on the left (with a public IP) and the VM on the right (with a public OR private IP).

The tunnels between public IPs work perfectly. It's just the NAT VMs that cause trouble.

Okay.

We cannot afford to host everything on bare-metal servers. We need virtualisation to cut costs.

Sorry if I gave the impression that bare metal was necessary. I think VMs are perfectly fine.

I am somewhat questioning the need for private IPs vs public IPs on the VMs. (I'm still trying to wrap my head around things.)

Sometimes two of seven tunnels work. Sometimes none. Today two work; here is a status report from bird, our BGP daemon:

ffrl_fra0 BGP      ffnet    up     02:16:10    Established
ffrl_fra1 BGP      ffnet    start  02:16:10    Connect
ffrl_ber0 BGP      ffnet    start  02:16:10    Connect
ffrl_ber1 BGP      ffnet    start  02:16:10    Connect
ffrl_dus0 BGP      ffnet    up     02:16:22    Established
ffrl_dus1 BGP      ffnet    start  02:16:10    Connect
ibgp_gw02 BGP      ffnet    start  02:16:10    Connect

(Only established tunnels work correctly; "Connect" indicates that there's something wrong.)

The first six are the upstream connections. The last one is to the partner gateway, which exists as a failover.

Okay.

So I made you a recording of how it is supposed to look:

root@unimatrixzero ~ # tcpdump -ni any host 185.66.195.1 and \( host 176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
02:20:16.542279 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, length 64
02:20:16.542282 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, length 64
02:20:16.542286 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, length 64
02:20:16.561304 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
02:20:16.561313 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
02:20:16.561315 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
02:20:17.543573 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, length 64
02:20:17.543587 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, length 64
02:20:17.543605 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, length 64
02:20:17.562563 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64
02:20:17.562585 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64
02:20:17.562590 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64

Well, while running those tests, more tunnels started working:

ffrl_fra0 BGP      ffnet    up     02:16:10    Established
ffrl_fra1 BGP      ffnet    start  02:16:10    Connect
ffrl_ber0 BGP      ffnet    start  02:16:10    Connect
ffrl_ber1 BGP      ffnet    up     02:16:33    Established
ffrl_dus0 BGP      ffnet    up     02:16:22    Established
ffrl_dus1 BGP      ffnet    up     02:16:53    Established
ibgp_gw02 BGP      ffnet    start  02:16:10    Connect

I'll take the last one for tests:


Within the VM:

The packets are sent through the tunnel:

root@gw03:~# tcpdump -i bck-gw02 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bck-gw02, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
02:24:15.725060 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 12, length 64
02:24:16.749064 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 13, length 64
02:24:17.749033 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 14, length 64

bck-gw02 is the GRE interface.

Now eth0 of the VM:

root@gw03:~# tcpdump -i eth0 proto 47 and ip[33]=0x01 and \( ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
04:03:43.757089 IP gw03 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5859, length 64
04:03:44.781093 IP gw03 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5860, length 64
04:03:45.805110 IP gw03 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5861, length 64

(Don't mind the sequence numbers; I had to do some other stuff and let the ping run.)

Now the hypervisor:

vnet0 (the interface of the vm)
root@unimatrixzero ~ # tcpdump -i vnet0 proto 47 and ip[33]=0x01 and \( ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vnet0, link-type EN10MB (Ethernet), capture size 262144 bytes
04:05:44.496867 IP 192.168.10.62 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5977, length 64
04:05:45.520863 IP 192.168.10.62 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5978, length 64
04:05:46.544832 IP 192.168.10.62 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5979, length 64
^C
root@unimatrixzero ~ # tcpdump -i virbr1 proto 47 and ip[33]=0x01 and \( ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on virbr1, link-type EN10MB (Ethernet), capture size 262144 bytes
04:06:14.096209 IP 192.168.10.62 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6006, length 64
04:06:15.120225 IP 192.168.10.62 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6007, length 64
04:06:16.144186 IP 192.168.10.62 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6008, length 64

And nothing on eth0 (physical interface):

root@unimatrixzero ~ # tcpdump -i eth0 proto 47 and ip[33]=0x01 and \( ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
27 packets received by filter
9 packets dropped by kernel

The NAT kernel module eats the packets and makes them vanish.

That's odd.

NAT shouldn't eat anything.

NAT should alter IP addresses or not alter them. But NAT itself should not cause traffic to disappear. That sounds like something else. I just don't know what.
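
One way to narrow down where they disappear might be the netfilter TRACE target, assuming the raw table is available on the hypervisor. The rule below is only a sketch; the address is the VM's private IP from your capture:

iptables -t raw -A PREROUTING -p gre -s 192.168.10.62 -j TRACE

Depending on the logging backend, each traced packet then logs every rule it traverses to the kernel log (dmesg) or via ulogd, so the last entry should point at the table/chain where it dies. Delete the rule again afterwards (-D instead of -A).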

See above. Looks correct.

Yes, I did a tcpdump on vnet4 and virbr1 last time. Looked correct. Encapsulated ICMP packets.

Look above. Looks correct.

Agreed.

That was the situation when I wrote my first email here. Then things became even worse because of all my testing. Now things partly work again.

Weird.

Does dmesg give any output that might shed some light on things?
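
If nothing shows up there by default, I believe conntrack can be told to log packets it considers invalid, which would at least confirm whether conntrack is the one discarding them (255 means all protocols; set it back to 0 afterwards):

sysctl -w net.netfilter.nf_conntrack_log_invalid=255

Then watch dmesg while a tunnel tries to come up.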

Okay, I'll disable all tunnels except the one I tested above. But I can only disable the "inside part". Encapsulated BGP packets will still be sent from the remote hosts; I can't disable those.

Fair.

The idea is to minimize complicating factors for a few minutes. Do what you can, test, and then restore normal functionality.
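
If bird's control socket is reachable, something like this should let you disable a single session without touching the rest of the config (protocol names taken from your status output; the far end will of course keep sending at you):

birdc disable ffrl_fra1
birdc enable ffrl_fra1

That keeps your side quiet for a test and is easy to undo.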

Yes, me too. The packets just vanish. I tried to catch them with an iptables LOG rule inserted after the GRE NAT rule, so it would catch anything that was not matched. But the GRE NAT rule does match them; there were no log entries of unmatched packets. They just vanish during the NAT process.

Strange.

What if you change your tcpdump filter to just look for GRE (protocol 47) traffic? It will likely match more than necessary, but hopefully it will show whether packets are being NATed in an unexpected way, one that doesn't function on the other end.
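
For what it's worth, the ip[36:4] / ip[40:4] offsets in your filters assume a 20-byte outer IP header plus a plain 4-byte GREv0 header; if a tunnel ever carries a GRE key, checksum, or sequence number, the inner header shifts and the filter silently matches nothing. A coarser filter side-steps that; the address here is just the far end from your gw03 capture, resolved from the your-server.de hostname, so adjust as needed:

tcpdump -ni eth0 proto 47 and host 88.198.51.94

Then eyeball whether the outer source address leaving eth0 is the public IP you expect, or still the private one.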

Exactly. That is what I intend to do. In this case 176.9.38.150 is the public IP. 192.168.10.62 the private IP.

Okay.

That will work for one VM, but I don't see how it will work for a second VM. Or are you wanting to route all GRE tunnels to one VM and rely on the far end IP to differentiate them?

Yes, that is what I intend to do.

Okay.

The firewall configuration should be correct. We create it and track it with Ansible. This is definitely the iptables configuration that worked before.

I don't doubt that it functions to allow what you want through. Aside from the GRE NAT issue.

I'm thinking that it might not block traffic that you want filtered. Though I'm assuming that you do want to filter / block some traffic. (All of the firewalls that I've written were to filter all but the desired traffic.)

What are you unsure about? This is a public project; we have nothing to hide. Just ask.

The uncertainty is just my perception of how you wrote your firewall vs how I would write a firewall.

I'm not afraid to ask. I was more trying to keep the conversation on the topic of GRE NAT instead of going down a side distraction that is not germane to GRE NAT.

Yes, that works. I'm just trying to save an IP address here, and requesting an IP change at our upstream provider takes time. So I NATed the tunnels through to a VM when we virtualized the system for the first time. Originally the gateway ran directly on the host, not within a VM.

Okay.

Requesting public IPs for the VMs would solve the problem. But Linux should be capable of NATing the tunnels through, and it used to work reliably and at high performance for several months.

Fair.
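
One long shot, since it worked before: it might be worth checking whether the GRE connection-tracking / NAT helper modules are still loaded on the hypervisor. I'm not certain they're even required for plain GREv0, so treat this purely as something cheap to rule out:

lsmod | grep -i gre
modprobe nf_conntrack_proto_gre
modprobe nf_nat_proto_gre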

Yes.

Understood.

The global IP address of the host is used to access it in the first place. It's just a rented server somewhere on the internet.

*nod*

Yeah, it's fun.

:-)

Okay. From my understanding as well, GRE is quite a simple technology. It just encapsulates the IP packets and decapsulates them at their destination. GRE is even simpler than UDP, as it doesn't have ports.

Agreed.

But the destination IP address needs to be replaced. Nothing more or less than that.

That sounds like standard DNAT to me.

Along with SNAT or MASQUERADE going the other direction.
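
Roughly the shape I'd expect the rule pair to take on the hypervisor (a sketch only, using the addresses from this thread; your Ansible-generated rules and interface names will differ):

iptables -t nat -A PREROUTING -i eth0 -p gre -s 88.198.51.94 -j DNAT --to-destination 192.168.10.62
iptables -t nat -A POSTROUTING -o eth0 -p gre -s 192.168.10.62 -d 88.198.51.94 -j SNAT --to-source 176.9.38.150

Nothing in rules like these should be able to make a packet disappear outright; at worst it should leave with the wrong outer address.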

All those new tests didn't give me any new insight.

:-(

So I tested another kernel on the hypervisor:

root@unimatrixzero ~ # uname -a
Linux unimatrixzero 4.4.9-040409-generic #201605041832 SMP Wed May 4 22:34:16 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

I think this was one of the first kernels for Ubuntu 16.04. It didn't work either, so from my understanding the kernel version is not the problem. There must be another problem.

:-(

Are there any debugging modes for the kernel modules? I'd like to understand why the packets are dropped. The kernel has nothing to do but replace the source IP, and that should be it.

There's a way to get connection tracking information, which is what NAT relies on, out of the kernel.

Look into conntrackd and the conntrack command. You can use the command to get information about what the connection tracking subsystem is doing.
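
For example, assuming the conntrack-tools package is installed and built with the GRE extension:

conntrack -L -p gre
conntrack -E -p gre

The first lists the current GRE entries, including the translated reply tuple; the second streams events as entries are created and destroyed. If the entry for a broken tunnel never shows a translated reply tuple, or never appears at all, that points at conntrack rather than at the NAT rules themselves.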

Regards,

Likewise.



--
Grant. . . .
unix || die


