Re: GRE-NAT broken

Linux Advanced Routing and Traffic Control

On 01/25/2018 12:47 AM, walther.xyz wrote:
Hello Grant,

Hi,

thanks for your reply. I'll respond to your questions inline.

You're welcome.

We're running gateways for an open wifi project here in Germany called Freifunk (freifunk.net); it's non-commercial. We connect those gateways with our AS exit routers via GRE tunnels, GRE over IPv4.

Okay.

To save money and resources, we virtualize the hardware with KVM. Usually we have an extra IPv4 address for each virtual machine. In two experimental cases I tried to spare the IPv4 address and NAT the GRE tunnels from the hypervisor's public IP address, giving the virtual machine only a private IP address (192.168....). Standard destination NAT with the iptables rule as mentioned.

Okay.

So are the VMs functioning as routers for clients behind them?

It sounds like the GRE tunnel is functionally used to connect the VM with your border routers, correct?

The bridges are created with brctl and the topology in this particular case looks as follows:

root@unimatrixzero ~ # brctl show
bridge name    bridge id        STP enabled    interfaces
br0        8000.fe540028664d    no             vnet2
                                               vnet3
                                               vnet5
                                               vnet6
virbr1     8000.5254007bec03    yes            virbr1-nic
                                               vnet4

The hosting provider is Hetzner, a German budget hosting company. They do not block GRE tunnels; GRE to public IP addresses works just fine. As this hypervisor contains virtual machines with both public IP addresses and private (192.168...) addresses, we have two bridges. Depending on the configuration, the virtual machines with public IP addresses are in br0 and the ones with private addresses in virbr1.

Thank you for the details.

It now occurs to me to ask, are these VMs hosted within your network or outside in the cloud?

I'm now getting the impression that the GRE tunnel might be from your border router, across the Internet, and into VMs in the cloud.

Unfortunately not. We're running unattended upgrades on the machines. It's a free-time project and we don't have the manpower to update all our hosts manually. I'm not even sure whether the kernel was updated or not. I tried the oldest kernel still available on the machine and a much older kernel, 4.4. Ubuntu automatically removes unneeded, older kernels. Maybe a security patch that got applied to 4.4 as well as 4.10, 4.13, and 4.14 broke this setup. Maybe I should try an older 4.4 kernel, not revision 113.

But I can say for sure that we had two experimental machines running this configuration with NATed GRE tunnels, and both stopped working around the same time after this had worked stably for several months.

Okay. That just means that it's not currently possible to revert to something that works as a diagnostic aid. So the only way out is forward through the problem.

I was pinging from the inside of the VM into the GRE tunnel. So the packet flow is as follows:

The ICMP packet goes into the virtual GRE interface within the virtual machine. Then it is encapsulated with the private IP address as source and sent out through eth0 of the virtual machine.

The packet is now in the network stack of the hypervisor, coming in through vnet4 and going through the virbr1 bridge. Then it should be NATed, so the private source address of the GRE packet should be replaced by the public IP address of the hypervisor, and the NATed packet sent out to the other end of the GRE tunnel somewhere on the Internet. The last step, the NAT and the sending out through the physical interface, is what doesn't happen.

Okay.

What does tcpdump on the vNIC in the VM show? I would expect to see encapsulated ICMP inside of the GRE tunnel, with the VM's private IP as the source and the far end's IP as the GRE destination.

What does the host see on vnet4 or the virbr1 interfaces? I would expect them to see the same thing as what the guest VM saw on its vNIC (eth0?).
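
Off the top of my head, something along these lines (untested; adjust the interface name to whatever the guest calls its vNIC) should show it from inside the VM:

tcpdump -ni eth0 proto gre and host 185.66.195.1

The same filter against vnet4 / virbr1 on the host should then show the identical packets before any NAT happens.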

Funnily, after each reboot a different tunnel seemed to work. All tunnels do the same thing; they just go to different upstream backbone servers for redundancy.

Okay.

That's why we're not sure when the problem first occurred. Because one working tunnel is enough, everything seemed fine and the problem wasn't discovered right away. Now it has stopped working completely.

Oh.  I thought that something was partially working.

Can you disable both of the tunnels for 5 ~ 10 minutes, long enough for potentially stale state to clear, and then enable one tunnel?
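
If conntrack-tools is installed, you could also flush any lingering GRE state instead of waiting for it to time out. Something like the following (untested, and assuming your conntrack build knows about GRE):

conntrack -L -p gre
conntrack -D -p gre

The first lists whatever GRE flows the kernel is currently tracking, the second deletes them, so the next packet has to create a fresh NAT mapping.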

Unfortunately, I can't provide a working example: since I tested all those different kernel versions, nothing works anymore. Not a single tunnel, even though I went back to 4.13.0-31, with which I had captured the packets yesterday.

:-/

(As I rebooted again, vnet4 is now vnet0.)

ACK

See here the three steps separately:

Thank you.

root@unimatrixzero ~ # tcpdump -ni vnet0 host 185.66.195.1 and \( host 176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vnet0, link-type EN10MB (Ethernet), capture size 262144 bytes
08:29:15.127873 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 59, length 64
08:29:16.151856 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 60, length 64
08:29:17.175800 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 61, length 64
08:29:18.199780 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 62, length 64
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel

That seems reasonable enough.

root@unimatrixzero ~ # tcpdump -ni virbr1 host 185.66.195.1 and \( host 176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on virbr1, link-type EN10MB (Ethernet), capture size 262144 bytes
08:29:33.495592 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 77, length 64
08:29:34.519567 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 78, length 64
08:29:35.543572 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 18763, seq 79, length 64
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel

Likewise with this.

root@unimatrixzero ~ # tcpdump -ni eth0 host 185.66.195.1 and \( host 176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
10 packets received by filter
0 packets dropped by kernel

So, for some reason, your GRE packets don't seem to be leaving the system. - I'll have to look at your firewall config (I think you provided it below).

The GRE packets go through the interface and through the bridge, but the GRE packet isn't NATed and is never sent out through the physical interface (eth0) on the hypervisor.

I would expect to see the GRE packets leaving the node, even if they aren't NATed.
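
If they really do vanish between virbr1 and eth0, the TRACE target you already use in the raw table for TCP might help pin down which rule eats them. Roughly (untested):

iptables -t raw -A PREROUTING -i virbr1 -p gre -j TRACE

The trace output lands in the kernel log, and I believe you need a logging backend (ipt_LOG / nf_log_ipv4) loaded for it to actually show up.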

All those tcpdumps are made on the hypervisor.

ACK

In the first example, where the NAT worked, we saw those three steps as well. The packets went out through eth0, got an ICMP reply which took the reverse path back to its destination, the virtual machine, where the GRE got decapsulated and ping got its reply packet.

This is what I would expect to happen, and I suspect what you desire to happen.

I made sure that the nf_nat_proto_gre and nf_conntrack_proto_gre modules are loaded; lsmod shows them.

ACK

Virsh creates the bridge based on this xml file:

virsh # net-dumpxml ipv4-nat
<network>
  <name>ipv4-nat</name>
  <uuid>2c0daba2-1e17-4d0d-9b9e-2acf09435da6</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr1' stp='on' delay='0'/>
  <mac address='52:54:00:7b:ec:03'/>
  <ip address='192.168.10.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.10.2' end='192.168.10.254'/>
    </dhcp>
  </ip>
</network>

That looks reasonable enough.

I also feel like this may be more an IPTables problem than a bridge problem.

This entry doesn't exist here.

root@unimatrixzero ~ # ls /proc/sys/net
core/             ipv6/             nf_conntrack_max
ipv4/             netfilter/        unix/

Good.  I think you want it to not be there.

There is no bridge or virbr1 entry in ipv4 either. Nor did I find anything like that in netfilter.

Okay.
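
For what it's worth, the net.bridge.bridge-nf-call-* sysctls only appear under /proc/sys/net/bridge once the br_netfilter module is loaded, so you can confirm the state with something like:

lsmod | grep br_netfilter
ls /proc/sys/net/bridge

If the module isn't loaded, bridged frames skip iptables entirely; your GRE packets are routed from virbr1 out eth0, so they traverse iptables either way.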

root@unimatrixzero ~ # cat /etc/iptables/rules.v4
# Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
*raw
:PREROUTING ACCEPT [4134062347:2804377965525]
:OUTPUT ACCEPT [45794:9989552]
-A PREROUTING -d 192.168.0.0/24 -p tcp -m tcp --dport 80 -j TRACE
-A PREROUTING -d 192.168.10.0/24 -p tcp -m tcp --dport 222 -j TRACE
-A OUTPUT -d 192.168.0.0/24 -p tcp -m tcp --dport 80 -j TRACE
-A OUTPUT -d 192.168.10.0/24 -p tcp -m tcp --dport 222 -j TRACE
COMMIT
# Completed on Fri Oct 27 23:36:29 2017
# Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
*mangle
:PREROUTING ACCEPT [4134063569:2804378696201]
:INPUT ACCEPT [48005:5510967]
:FORWARD ACCEPT [4133838276:2804349602217]
:OUTPUT ACCEPT [45797:9990176]
:POSTROUTING ACCEPT [4133884073:2804359592393]
-A POSTROUTING -o virbr1 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
COMMIT
# Completed on Fri Oct 27 23:36:29 2017
# Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
*nat
:PREROUTING ACCEPT [86097:5109916]
:INPUT ACCEPT [7557:460113]
:OUTPUT ACCEPT [162:11119]
:POSTROUTING ACCEPT [78890:4669843]
-A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 222 -j DNAT --to-destination 192.168.10.62:22
-A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 192.168.10.248:80
-A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 192.168.10.248:443
-A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 223 -j DNAT --to-destination 192.168.10.248:22
-A PREROUTING -i eth0 -p gre -j DNAT --to-destination 192.168.10.62

I think this may route all incoming GRE to a single host / VM, 192.168.10.62.

-A PREROUTING -d 176.9.38.150/32 -p udp -m udp --dport 20000:20100 -j DNAT --to-destination 192.168.10.62:20000-20100
-A POSTROUTING -s 192.168.10.0/24 -d 224.0.0.0/24 -j RETURN
-A POSTROUTING -s 192.168.10.0/24 -d 255.255.255.255/32 -j RETURN
-A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -j MASQUERADE

This will very likely MASQUERADE all of the GRE traffic from 192.168.10.62, which means it will be NATed to the source IP of the interface with the best route to 185.66.195.1. Is that 176.9.38.150, the IP that you were looking for on the eth0 interface? (You wouldn't see the 192.168.10.62 IP there, as it's after NATing.)

COMMIT
# Completed on Fri Oct 27 23:36:29 2017
# Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
*filter
:INPUT ACCEPT [47667:5451204]
:FORWARD ACCEPT [4133512236:2804145422827]
:OUTPUT ACCEPT [45662:9946618]
-A INPUT -i virbr1 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i virbr1 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i virbr1 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i virbr1 -p tcp -m tcp --dport 67 -j ACCEPT
-A FORWARD -d 192.168.10.0/24 -o virbr1 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.10.0/24 -i virbr1 -j ACCEPT
-A FORWARD -i virbr1 -o virbr1 -j ACCEPT
-A FORWARD -d 192.168.10.0/24 -m state --state NEW,RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -d 192.168.10.0/24 -i eth0 -o virbr1 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.10.0/24 -i virbr1 -o eth0 -j ACCEPT
-A FORWARD -i virbr1 -o virbr1 -j ACCEPT
-A OUTPUT -o virbr1 -p udp -m udp --dport 68 -j ACCEPT
COMMIT
# Completed on Fri Oct 27 23:36:29 2017

I don't see anything else that might interfere with the GRE traffic that you're looking for.

I also don't see where your firewall is actually blocking any traffic, and there are a couple of other things where I'm not quite sure why you did what you did. But this discussion is about the GRE issue.
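
One thing that might narrow it down: watch the nat table counters while a ping is running, and check whether conntrack ever creates an entry for the GRE flow. Something like (untested):

iptables -t nat -L POSTROUTING -v -n
conntrack -L -p gre

NAT is only decided on the first packet of a tracked flow, so if the MASQUERADE counters never move and no GRE entry shows up, the packets are being lost before they ever reach the nat table.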

KVM. KVM creates the interface on the hypervisor and puts it into the bridge.

ACK

The bridge is a NATed /24 subnet created by KVM. All VMs that don't have a public address are connected to the bridge, which NATs the outgoing connections just like a standard home router would.

*nod*

A bridge isn't necessary here; it just makes things easier. You could route each virtual machine separately. It's just KVM's approach to make things smoother.

*nod*

Our standard configuration is to have a separate global IPv4 address for each virtual machine. We experimented with NATing those GRE tunnels to save one IP address per hypervisor, which had worked perfectly so far.

I think I'm still missing something.

I would assign the globally routed IPs to the VMs directly, and route them to the eth0 IP of the machine.

Or are you talking about saving the globally routable IP address on virbr1?

Another trick would be to use private IP addresses on virbr1 and the vNICs of the VMs. You use this for routing and assign the VM's globally routed IP address to a dummy interface in the VMs. - That would be clear channel routing all the way in. Save for the private IPs in the path, which works, but is meh in a traceroute output.
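
In case it helps, the dummy-interface variant looks roughly like this inside the VM (untested; 203.0.113.10 is just a placeholder for the VM's globally routed address):

ip link add dummy0 type dummy
ip link set dummy0 up
ip addr add 203.0.113.10/32 dev dummy0

The hypervisor then routes that /32 towards the VM's private address, e.g. "ip route add 203.0.113.10/32 via 192.168.10.62", so the public address never has to live on the vNIC itself.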

Freifunk is not just a wifi network. It's about getting to know network stuff like mesh networks or software defined networks based on GRE tunnels. My reasons to participate are mostly to understand the technology behind all that.

That sounds interesting and like a worthwhile cause.

As I wrote in my other email, I looked into the source code. As far as I understand it, GREv0 NAT has never been properly implemented. I don't understand how this ever worked.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/net/ipv4/netfilter/nf_nat_proto_gre.c?id=HEAD

My read of the header is that GRE may not need a NAT helper per se. It sounds like it's just a matter of altering the source / destination IP of the GRE encapsulation traffic.

I also don't see anything in RFC 2784 § 2.1. GRE Header that would need NAT as I understand it.
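
For reference, the base header from RFC 2784 § 2.1 is just:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|C|       Reserved0       | Ver |         Protocol Type         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Checksum (optional)      |       Reserved1 (Optional)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

No ports and no embedded addresses, so rewriting the outer IP header should be all the "NAT" that plain key-less GREv0 needs.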

But GRE-natting is possible. Even my internet provider's 50 Euro router can do it.

I've not done much with GRE, but I think it's a very simple encapsulation. Which means that as long as you get the local and remote IPs correct on both ends, including reflecting any NATing, I think things will likely work.
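
As a rough sketch, using the addresses from your dumps (untested): inside the VM the tunnel is built against the private address,

ip tunnel add gre1 mode gre local 192.168.10.62 remote 185.66.195.1 ttl 255
ip link set gre1 up

while the far end at 185.66.195.1 would have to point its remote at the hypervisor's public address, 176.9.38.150, since that is what the packets carry after the MASQUERADE:

ip tunnel add gre1 mode gre local 185.66.195.1 remote 176.9.38.150 ttl 255
ip link set gre1 up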

Thanks for your help!

You're welcome.



--
Grant. . . .
unix || die

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature

