Re: GRE-NAT broken

Linux Advanced Routing and Traffic Control


Hello Grant,

I'll keep this email short for clarity.

On 25.01.2018 at 23:57, Grant Taylor wrote:
> So are the VMs functioning as routers for clients behind them?
Yes. Our wifi access points, which are typically OpenWrt-based routers
like the TP-LINK TL-WR841N or the TP-LINK TL-WR1043ND, connect through
L2TP to the gateways, which then route between our network and the
upstream via GRE tunnels. Our gateways are basically a VPN provider
system. This is needed because of German law; otherwise the people who
share their internet connection could be held responsible for what their
guests do in their wifi, if they do evil things like illegal filesharing
or worse.

With our network we fight this stupid legal situation, and we have made
it quite far already. The worst part of the law, called "Störerhaftung",
which held you responsible for anything that was sent over your internet
connection, was abolished during the last legislative period. But
people are still scared after being told for 15 years that they
should never ever share their internet connection with strangers,
because they could get cease-and-desist letters, which can easily cost
you 500 or 1000 Euros. So for a while we'll stick with our VPN network.
:) But things are getting better now, and some of the federal states
have started supporting us financially.
>
> It sounds like the GRE tunnel is functionally used to connect the VM
> with your border routers, correct?
Correct. We use GRE tunnels between the gateways for cross traffic and
for upstream. Both tunnels are affected.
> It now occurs to me to ask, are these VMs hosted within your network
> or outside in the cloud?
They are rented servers at Hetzner or other cheap hosting companies. We
don't have physical access to them. We just rent them and configure them
to become VPN endpoints.
>
> I'm now getting the impression that the GRE tunnel might be from your
> border router, across the Internet, and into VMs in the cloud.
It's not really important where the servers are located. Most of them
are at Hetzner, but not all of them.

Typical situation:

VM (public IP) <---> Hypervisor <---the internet---> Hypervisor <---> VM
(public OR private IP)

The tunnels between public IPs work perfectly. It's just the NAT VMs
that cause trouble.
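For reference, the tunnel endpoint inside a NATed VM is configured
against the VM's private address, something like this (a sketch only;
the interface name and addresses are taken loosely from the captures
further down, and the key-less setup is an assumption):

```shell
# sketch: GRE tunnel endpoint inside a NATed VM (requires root)
ip tunnel add bck-gw02 mode gre local 192.168.10.62 remote 88.198.51.94 ttl 255
ip link set bck-gw02 up
# inner point-to-point addresses of the tunnel
ip addr add 192.168.1.7 peer 192.168.1.6 dev bck-gw02
```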

We cannot afford to host everything on bare-metal servers. We need
virtualisation to cut costs.

> Okay.  That just means that it's not currently possible to revert to
> something that works as a diagnostic aid.  So the only way out is
> forward through the problem.
Sometimes two of the seven tunnels work, sometimes none. Today two work;
here is a status report from bird, our BGP daemon:
ffrl_fra0 BGP      ffnet    up     02:16:10    Established  
ffrl_fra1 BGP      ffnet    start  02:16:10    Connect      
ffrl_ber0 BGP      ffnet    start  02:16:10    Connect      
ffrl_ber1 BGP      ffnet    start  02:16:10    Connect      
ffrl_dus0 BGP      ffnet    up     02:16:22    Established  
ffrl_dus1 BGP      ffnet    start  02:16:10    Connect      
ibgp_gw02 BGP      ffnet    start  02:16:10    Connect

(Only established sessions work correctly; "Connect" indicates that
something is wrong.)

The first six are the upstream connections. The last one is to the
partner gateway which exists as failover.
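For context, each of those names is a bird BGP protocol stanza riding
over the corresponding GRE interface; roughly like this (purely
illustrative: the ASNs and neighbor address are placeholders, not our
actual config):

```
protocol bgp ffrl_fra0 {
    table ffnet;
    local as 65001;                   # placeholder ASN
    source address 185.66.194.49;     # our side of the GRE tunnel
    neighbor 185.66.194.48 as 65002;  # upstream side (placeholder)
}
```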

So I made a capture of how it is supposed to look:

root@unimatrixzero ~ # tcpdump -ni any host 185.66.195.1 and \( host
176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and
\( ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size
262144 bytes
02:20:16.542279 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, length 64
02:20:16.542282 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, length 64
02:20:16.542286 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, length 64
02:20:16.561304 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP
100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
02:20:16.561313 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP
100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
02:20:16.561315 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP
100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
02:20:17.543573 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, length 64
02:20:17.543587 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, length 64
02:20:17.543605 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: IP
185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, length 64
02:20:17.562563 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP
100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64
02:20:17.562585 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP
100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64
02:20:17.562590 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP
100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64
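To decode the filter: proto 47 matches GRE, and since the outer IP
header is 20 bytes and the GREv0 header 4 bytes, the inner IP header
starts at offset 24. So ip[33] is the inner protocol field (0x01 =
ICMP), and ip[36:4] / ip[40:4] are the inner source / destination
addresses. The hex literals are just the four address octets; a small
helper (the function name is mine) to build them:

```shell
# turn a dotted-quad address into the 32-bit hex literal used in the
# tcpdump expressions above (e.g. ip[36:4]==0x644007BA)
ip_to_hex() {
    # split the address on dots, then print each octet as two hex digits
    printf '0x%02X%02X%02X%02X\n' $(echo "$1" | tr '.' ' ')
}

ip_to_hex 100.64.7.186   # -> 0x644007BA
ip_to_hex 192.168.1.6    # -> 0xC0A80106
```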

Well, while running those tests, more tunnels started working:
ffrl_fra0 BGP      ffnet    up     02:16:10    Established  
ffrl_fra1 BGP      ffnet    start  02:16:10    Connect      
ffrl_ber0 BGP      ffnet    start  02:16:10    Connect      
ffrl_ber1 BGP      ffnet    up     02:16:33    Established  
ffrl_dus0 BGP      ffnet    up     02:16:22    Established  
ffrl_dus1 BGP      ffnet    up     02:16:53    Established  
ibgp_gw02 BGP      ffnet    start  02:16:10    Connect   

I'll take the last one for tests:

Within the VM:
The packets are sent through the tunnel:
root@gw03:~# tcpdump -i bck-gw02 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bck-gw02, link-type LINUX_SLL (Linux cooked), capture size
262144 bytes
02:24:15.725060 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id
11563, seq 12, length 64
02:24:16.749064 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id
11563, seq 13, length 64
02:24:17.749033 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id
11563, seq 14, length 64

bck-gw02 is the GRE interface.

Now eth0 of the VM:

root@gw03:~# tcpdump -i eth0 proto 47 and ip[33]=0x01 and \(
ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
04:03:43.757089 IP gw03 > static.88-198-51-94.clients.your-server.de:
GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id
11563, seq 5859, length 64
04:03:44.781093 IP gw03 > static.88-198-51-94.clients.your-server.de:
GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id
11563, seq 5860, length 64
04:03:45.805110 IP gw03 > static.88-198-51-94.clients.your-server.de:
GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id
11563, seq 5861, length 64

(Don't mind the sequence number; I had to do some other stuff and let
the ping run.)

Now the hypervisor:

vnet0 (the interface of the vm)
root@unimatrixzero ~ # tcpdump -i vnet0 proto 47 and ip[33]=0x01 and \(
ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vnet0, link-type EN10MB (Ethernet), capture size 262144 bytes
04:05:44.496867 IP 192.168.10.62 >
static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP
192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5977, length 64
04:05:45.520863 IP 192.168.10.62 >
static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP
192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5978, length 64
04:05:46.544832 IP 192.168.10.62 >
static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP
192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5979, length 64
^C

root@unimatrixzero ~ # tcpdump -i virbr1 proto 47 and ip[33]=0x01 and \(
ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on virbr1, link-type EN10MB (Ethernet), capture size 262144 bytes
04:06:14.096209 IP 192.168.10.62 >
static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP
192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6006, length 64
04:06:15.120225 IP 192.168.10.62 >
static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP
192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6007, length 64
04:06:16.144186 IP 192.168.10.62 >
static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP
192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6008, length 64

And nothing on eth0 (physical interface):
root@unimatrixzero ~ # tcpdump -i eth0 proto 47 and ip[33]=0x01 and \(
ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
27 packets received by filter
9 packets dropped by kernel

The NAT kernel module eats the packets :) and makes them vanish.


> What does tcpdump on the vNIC in the VM show?  I would expect to see
> encapsulated ICMP inside of GRE tunnel w/ the VM's private IP as the
> source and the far end's IP as the GRE destination.
See above. Looks correct.
>
> What does the host see on vnet4 or the virbr1 interfaces?  I would
> expect them to see the same thing as what the guest VM saw on it's
> vNIC (eth0?).
Yes, I did a tcpdump on vnet4 and virbr1 last time, and it looked
correct: encapsulated ICMP packets. See the captures above.
> Oh.  I thought that something was partially working.
>
It was, at the time I wrote my first email here. Then things became
even worse because of all my testing. Now things partly work again.
> Can you disable both of the tunnels for 5 ~ 10 minutes, long enough
> for potentially stale state to clear, and then enable one tunnel?
Okay, I'll disable all tunnels except the one I tested above. But I can
only disable the "inside part"; encapsulated BGP packets will still be
sent from the remote hosts, and I can't disable those.
> I would expect to see the GRE packets leaving the node, even if they
> aren't NATed.
Yes, me too. The packets just vanish. I tried to catch them with an
iptables LOG rule inserted after the GRE NAT rule, so that it would
catch any unmatched packets. But the GRE NAT rule matches them; there
were no log entries for unmatched packets. They just vanish during the
NAT process.
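For the record, the bracket looked roughly like this (the log prefix is
mine). Since DNAT is a terminating target in the nat PREROUTING chain,
the LOG rule only fires for GRE packets the DNAT rule did not already
match:

```shell
# sketch: log anything the GRE DNAT rule did NOT match (requires root)
iptables -t nat -A PREROUTING -i eth0 -p gre -j DNAT --to-destination 192.168.10.62
iptables -t nat -A PREROUTING -i eth0 -p gre -j LOG --log-prefix "gre-unmatched: "
```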
> root@unimatrixzero ~ # cat /etc/iptables/rules.v4
>> # Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
>> *raw
>> :PREROUTING ACCEPT [4134062347:2804377965525]
>> :OUTPUT ACCEPT [45794:9989552]
>> -A PREROUTING -d 192.168.0.0/24 -p tcp -m tcp --dport 80 -j TRACE
>> -A PREROUTING -d 192.168.10.0/24 -p tcp -m tcp --dport 222 -j TRACE
>> -A OUTPUT -d 192.168.0.0/24 -p tcp -m tcp --dport 80 -j TRACE
>> -A OUTPUT -d 192.168.10.0/24 -p tcp -m tcp --dport 222 -j TRACE
>> COMMIT
>> # Completed on Fri Oct 27 23:36:29 2017
>> # Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
>> *mangle
>> :PREROUTING ACCEPT [4134063569:2804378696201]
>> :INPUT ACCEPT [48005:5510967]
>> :FORWARD ACCEPT [4133838276:2804349602217]
>> :OUTPUT ACCEPT [45797:9990176]
>> :POSTROUTING ACCEPT [4133884073:2804359592393]
>> -A POSTROUTING -o virbr1 -p udp -m udp --dport 68 -j CHECKSUM
>> --checksum-fill
>> COMMIT
>> # Completed on Fri Oct 27 23:36:29 2017
>> # Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
>> *nat
>> :PREROUTING ACCEPT [86097:5109916]
>> :INPUT ACCEPT [7557:460113]
>> :OUTPUT ACCEPT [162:11119]
>> :POSTROUTING ACCEPT [78890:4669843]
>> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 222 -j DNAT
>> --to-destination 192.168.10.62:22
>> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 80 -j DNAT
>> --to-destination 192.168.10.248:80
>> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 443 -j DNAT
>> --to-destination 192.168.10.248:443
>> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 223 -j DNAT
>> --to-destination 192.168.10.248:22
>> -A PREROUTING -i eth0 -p gre -j DNAT --to-destination 192.168.10.62
>
> I think this may route all incoming GRE to a single host / VM,
> 192.168.10.62.
Exactly, that is what I intend to do. In this case 176.9.38.150 is the
public IP and 192.168.10.62 the private IP.
>
>> -A PREROUTING -d 176.9.38.150/32 -p udp -m udp --dport 20000:20100 -j
>> DNAT --to-destination 192.168.10.62:20000-20100
>> -A POSTROUTING -s 192.168.10.0/24 -d 224.0.0.0/24 -j RETURN
>> -A POSTROUTING -s 192.168.10.0/24 -d 255.255.255.255/32 -j RETURN
>> -A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -p tcp -j
>> MASQUERADE --to-ports 1024-65535
>> -A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -p udp -j
>> MASQUERADE --to-ports 1024-65535
>> -A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -j MASQUERADE
>
> This will very likely MASQUERADE all of the GRE traffic from
> 192.168.10.62, which means it will be NATed to the source IP of the
> interface with the best route to 185.66.195.1  Is that 176.9.38.150,
> the IP that you were looking for on the eth0 interface?  (You wouldn't
> see the 192.168.10.62 IP there as it's after NATing.)
Yes, that is what I intend to do.
> I also don't see where your firewall is actually blocking any traffic,
> and a couple of other things that I'm not quite sure why you did what
> you did.  But, this discussion is for GRE issues.
The firewall configuration should be correct. We create and track it
with Ansible. This is definitely the iptables configuration that worked
before.

What are you unsure about? This is a public project, we have nothing to
hide ;) Just ask.
>
>> Our standard configuration is to have a separate global IPv4 for each
>> virtual machine. We experimented with NATing those GRE tunnels to
>> save one IP address per hypervisor, which worked perfectly so far.
>
> I think I'm still missing something.
>
> I would assign the globally routed IPs to the VMs directly, and route
> them to the eth0 IP of the machine.
Yes, that works. I'm just trying to save an IP address here, and
requesting an IP change from our upstream provider takes time. So I
NATed the tunnels through to a VM when we virtualized the system for
the first time. Originally the gateway ran directly on the host, not
within a VM.

Requesting public IPs for the VMs would solve the problem. But Linux
should be capable of NATing the tunnels through. And it used to work,
reliably and at high performance, for several months.
>
> Or are you talking about saving the globally routable IP address on
> virbr1?
Yes.
>
> Another trick would be to use private IP addresses on virbr1 and the
> vNICs of the VMs.  You use this for routing and assign the VM's
> globally routed IP address to a dummy interface in the VMs.  -  That
> would be clear channel routing all the way in.  Save for the private
> IPs in the path, which works, but is meh in a traceroute output.
The global IP address of the host is used to access it in the first
place. It's just a rented server somewhere on the internet.
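For reference, Grant's dummy-interface approach would look something
like this on the VM (interface name and the public IP here are purely
illustrative):

```shell
# sketch: put the VM's globally routed IP on a dummy interface (requires root)
ip link add dummy0 type dummy
ip link set dummy0 up
ip addr add 185.66.195.2/32 dev dummy0   # illustrative public IP
# on the hypervisor, a host route then steers it to the VM's private vNIC IP:
# ip route add 185.66.195.2/32 via 192.168.10.62 dev virbr1
```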
>
>> Freifunk is not just a wifi network. It's about getting to know
>> network stuff like mesh networks or software defined networks based
>> on GRE tunnels. My reasons to participate are mostly to understand
>> the technology behind all that.
>
> That sounds interesting and like a worthwhile cause.
Yeah, it's fun :)
>
>> As I wrote in my other email, I looked into the source code. As far
>> as I understand it, the GREv0 Nat has never been properly
>> implemented. I don't understand how this ever worked.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/net/ipv4/netfilter/nf_nat_proto_gre.c?id=HEAD
>>
>
> My read of the header is that GRE may not need a NAT helper per se.
> It sounds like it's just a matter of altering the source / destination
> IP of the GRE encapsulation traffic.
Okay. From my understanding as well, GRE is quite a simple technology.
It just encapsulates the IP packets and decapsulates them at their
destination. GRE is even simpler than UDP, as it doesn't have ports.
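To illustrate just how little there is: a minimal GREv0 header is two
bytes of flags/version (all zero) plus a two-byte protocol type (0x0800
for IPv4), four bytes total, with no port or session field that a NAT
engine could use to distinguish flows:

```shell
# emit a minimal GREv0 header (flags/version 0x0000, protocol 0x0800 = IPv4)
# and count its bytes: the whole header is just 4 bytes
printf '\000\000\010\000' | wc -c | tr -d ' '
```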
>
> I also don't see anything in RFC 2784 § 2.1. GRE Header that would
> need NAT as I understand it.
But the destination IP address needs to be replaced; nothing more,
nothing less.

All those new tests didn't give me any new information.

So I tested another kernel on the hypervisor:
root@unimatrixzero ~ # uname -a
Linux unimatrixzero 4.4.9-040409-generic #201605041832 SMP Wed May 4
22:34:16 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

I think this was one of the first kernels for Ubuntu 16.04, and it
didn't work either. From my understanding, the kernel version is not
the problem; there must be something else.

Are there any debugging modes for the kernel modules? I'd like to
understand why the packets are dropped. The kernel has nothing to do
but replace the source IP, and things should be done.
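For what it's worth, the same raw-table TRACE target you already use
for TCP can follow a GRE packet through every table and chain (assuming
the nf_log_ipv4 backend is available; output lands in the kernel log):

```shell
# sketch: trace GRE packets through netfilter rule by rule (requires root)
modprobe nf_log_ipv4                          # logging backend for the trace
sysctl -w net.netfilter.nf_log.2=nf_log_ipv4  # 2 = AF_INET
iptables -t raw -A PREROUTING -p gre -j TRACE
iptables -t raw -A OUTPUT    -p gre -j TRACE
# then watch the per-rule trace in the kernel log:
dmesg -w | grep TRACE
```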

Regards,
Matthias
