Hallo Grant,

I'll keep this email short for clarity.

On 25.01.2018 at 23:57, Grant Taylor wrote:
> So are the VMs functioning as routers for clients behind them?

Yes. Our wifi access points, typically OpenWrt-based routers like the
TP-LINK TL-WR841N or the TP-LINK TL-WR1043ND, connect through L2TP to
the gateways, which then route between our network and the upstream via
GRE tunnels.

Our gateways are basically a VPN provider system. This is needed
because of German law; otherwise the people who share their internet
connection could be held liable for what their guests do in their wifi,
if they do evil things like illegal filesharing or worse. With our
network, we fight this stupid legal situation and have made it quite
far already. The worst part of the law, called "Störerhaftung", which
held you liable for anything sent over your internet connection, has
been abolished during the last legislative period. But people are still
scared after having been told for 15 years that they should never ever
share their internet connection with strangers, because they could get
cease-and-desist letters, which can easily cost 500 or 1000 Euros. So
for a while, we'll stick with our VPN network. :) But things are
getting better now, and some of the federal states have started
supporting us financially.

> It sounds like the GRE tunnel is functionally used to connect the VM
> with your border routers, correct?

Correct. We use GRE tunnels between the gateways for cross traffic and
for upstream. Both kinds of tunnels are affected.

> It now occurs to me to ask, are these VMs hosted within your network
> or outside in the cloud?

They are rented servers at Hetzner or other cheap hosting companies. We
don't have physical access to them. We just rent them and configure
them to become VPN endpoints.

> I'm now getting the impression that the GRE tunnel might be from your
> border router, across the Internet, and into VMs in the cloud.

It's not really important where the servers are located. Most of them
are at Hetzner, but not all of them. Typical situation:

VM (public IP) <---> Hypervisor <--- the internet ---> Hypervisor <---> VM (public OR private IP)

The tunnels between public IPs work perfectly. It's just the NATed VMs
that cause trouble. We cannot afford to host everything on bare-metal
servers. We need virtualisation to cut costs.

> Okay. That just means that it's not currently possible to revert to
> something that works as a diagnostic aid. So the only way out is
> forward through the problem.

Sometimes two of seven tunnels work. Sometimes none. Today two work;
here is a status report from bird, our BGP daemon:

ffrl_fra0  BGP  ffnet  up     02:16:10  Established
ffrl_fra1  BGP  ffnet  start  02:16:10  Connect
ffrl_ber0  BGP  ffnet  start  02:16:10  Connect
ffrl_ber1  BGP  ffnet  start  02:16:10  Connect
ffrl_dus0  BGP  ffnet  up     02:16:22  Established
ffrl_dus1  BGP  ffnet  start  02:16:10  Connect
ibgp_gw02  BGP  ffnet  start  02:16:10  Connect

(Only "Established" tunnels work correctly; "Connect" indicates that
something is wrong.)

The first six are the upstream connections. The last one is to the
partner gateway, which exists as a failover.
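In case it helps to picture the setup: each of those BGP sessions runs
over a plain GREv0 interface with a small transfer network on top. A
minimal sketch of one endpoint inside the VM (iproute2; the addresses
are the ones from the captures further down, the /30 transfer netmask
and the remote public IP are my assumption, not copied from our Ansible
roles):

# gw03 side of the backup tunnel to gw02 (sketch, not the real config)
ip tunnel add bck-gw02 mode gre local 192.168.10.62 remote 88.198.51.94 ttl 255
ip link set bck-gw02 up
ip addr add 192.168.1.7/30 dev bck-gw02   # inner transfer net, BGP neighbour is 192.168.1.6

The six ffrl_* upstream tunnels look the same, just with the upstream's
public IPs as remote endpoints.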
So I made you a recording of how it is supposed to look:

root@unimatrixzero ~ # tcpdump -ni any host 185.66.195.1 and \( host 176.9.38.150 or host 192.168.10.62 \) and proto 47 and ip[33]=0x01 and \( ip[36:4]==0x644007BA or ip[40:4]==0x644007BA \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
02:20:16.542279 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, length 64
02:20:16.542282 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, length 64
02:20:16.542286 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 23, length 64
02:20:16.561304 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
02:20:16.561313 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
02:20:16.561315 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 23, length 64
02:20:17.543573 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, length 64
02:20:17.543587 IP 192.168.10.62 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, length 64
02:20:17.543605 IP 176.9.38.150 > 185.66.195.1: GREv0, length 88: IP 185.66.194.49 > 100.64.7.186: ICMP echo request, id 10835, seq 24, length 64
02:20:17.562563 IP 185.66.195.1 > 176.9.38.150: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64
02:20:17.562585 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64
02:20:17.562590 IP 185.66.195.1 > 192.168.10.62: GREv0, length 88: IP 100.64.7.186 > 185.66.194.49: ICMP echo reply, id 10835, seq 24, length 64

Well, while running those tests, more tunnels started working:

ffrl_fra0  BGP  ffnet  up     02:16:10  Established
ffrl_fra1  BGP  ffnet  start  02:16:10  Connect
ffrl_ber0  BGP  ffnet  start  02:16:10  Connect
ffrl_ber1  BGP  ffnet  up     02:16:33  Established
ffrl_dus0  BGP  ffnet  up     02:16:22  Established
ffrl_dus1  BGP  ffnet  up     02:16:53  Established
ibgp_gw02  BGP  ffnet  start  02:16:10  Connect

I'll take the last one for the tests.

Within the VM, the packets are sent through the tunnel:

root@gw03:~# tcpdump -i bck-gw02 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bck-gw02, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
02:24:15.725060 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 12, length 64
02:24:16.749064 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 13, length 64
02:24:17.749033 IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 14, length 64

bck-gw02 is the GRE interface.
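(In case the byte-offset filters above and below look cryptic: they
just match ICMP inside GREv0, assuming a 20-byte outer IPv4 header and
a 4-byte GREv0 header without checksum/key/sequence options. Quick
sketch of the arithmetic:)

# ip[33]   -> inner IP protocol field (20 + 4 + 9),  0x01 = ICMP
# ip[36:4] -> inner source address    (20 + 4 + 12)
# ip[40:4] -> inner destination addr. (20 + 4 + 16)
# The hex constants are just the inner IPs spelled out byte by byte:
printf '%d.%d.%d.%d\n' 0x64 0x40 0x07 0xBA   # 0x644007BA -> 100.64.7.186
printf '%d.%d.%d.%d\n' 0xC0 0xA8 0x01 0x06   # 0xC0A80106 -> 192.168.1.6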
Now eth0 of the VM:

root@gw03:~# tcpdump -i eth0 proto 47 and ip[33]=0x01 and \( ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
04:03:43.757089 IP gw03 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5859, length 64
04:03:44.781093 IP gw03 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5860, length 64
04:03:45.805110 IP gw03 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5861, length 64

(Don't mind the sequence numbers; I had to do some other stuff and let
the ping run.)

Now the hypervisor, vnet0 (the interface of the VM):

root@unimatrixzero ~ # tcpdump -i vnet0 proto 47 and ip[33]=0x01 and \( ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vnet0, link-type EN10MB (Ethernet), capture size 262144 bytes
04:05:44.496867 IP 192.168.10.62 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5977, length 64
04:05:45.520863 IP 192.168.10.62 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5978, length 64
04:05:46.544832 IP 192.168.10.62 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 5979, length 64
^C
root@unimatrixzero ~ # tcpdump -i virbr1 proto 47 and ip[33]=0x01 and \( ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on virbr1, link-type EN10MB (Ethernet), capture size 262144 bytes
04:06:14.096209 IP 192.168.10.62 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6006, length 64
04:06:15.120225 IP 192.168.10.62 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6007, length 64
04:06:16.144186 IP 192.168.10.62 > static.88-198-51-94.clients.your-server.de: GREv0, length 88: IP 192.168.1.7 > 192.168.1.6: ICMP echo request, id 11563, seq 6008, length 64

And nothing on eth0 (the physical interface):

root@unimatrixzero ~ # tcpdump -i eth0 proto 47 and ip[33]=0x01 and \( ip[36:4]==0xC0A80106 or ip[40:4]==0xC0A80106 \)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
27 packets received by filter
9 packets dropped by kernel

The NAT kernel module eats the packets :) and makes them vanish.

> What does tcpdump on the vNIC in the VM show? I would expect to see
> encapsulated ICMP inside of GRE tunnel w/ the VM's private IP as the
> source and the far end's IP as the GRE destination.

See above. Looks correct.

> What does the host see on vnet4 or the virbr1 interfaces? I would
> expect them to see the same thing as what the guest VM saw on its
> vNIC (eth0?).

Yes, I did a tcpdump on vnet4 and virbr1 last time, and it looked
correct: encapsulated ICMP packets. See above; that also looks correct.

> Oh. I thought that something was partially working.
It was, at the time when I wrote my first email here. Then things
became even worse because of all my testing. Now things partly work
again.

> Can you disable both of the tunnels for 5 ~ 10 minutes, long enough
> for potentially stale state to clear, and then enable one tunnel?

Okay, I'll disable all tunnels except the one I tested above. But I can
only disable the "inside part": the remote hosts will still send
encapsulated BGP packets, and I can't disable those.

> I would expect to see the GRE packets leaving the node, even if they
> aren't NATed.

Yes, me too. The packets just vanish. I tried to catch them with an
iptables LOG rule inserted after the GRE NAT rule, so that it would
catch any packets the NAT rule didn't match. But the GRE NAT rule does
match them; there were no log entries for unmatched packets. They just
vanish during the NAT process.

> root@unimatrixzero ~ # cat /etc/iptables/rules.v4
>> # Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
>> *raw
>> :PREROUTING ACCEPT [4134062347:2804377965525]
>> :OUTPUT ACCEPT [45794:9989552]
>> -A PREROUTING -d 192.168.0.0/24 -p tcp -m tcp --dport 80 -j TRACE
>> -A PREROUTING -d 192.168.10.0/24 -p tcp -m tcp --dport 222 -j TRACE
>> -A OUTPUT -d 192.168.0.0/24 -p tcp -m tcp --dport 80 -j TRACE
>> -A OUTPUT -d 192.168.10.0/24 -p tcp -m tcp --dport 222 -j TRACE
>> COMMIT
>> # Completed on Fri Oct 27 23:36:29 2017
>> # Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
>> *mangle
>> :PREROUTING ACCEPT [4134063569:2804378696201]
>> :INPUT ACCEPT [48005:5510967]
>> :FORWARD ACCEPT [4133838276:2804349602217]
>> :OUTPUT ACCEPT [45797:9990176]
>> :POSTROUTING ACCEPT [4133884073:2804359592393]
>> -A POSTROUTING -o virbr1 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
>> COMMIT
>> # Completed on Fri Oct 27 23:36:29 2017
>> # Generated by iptables-save v1.6.0 on Fri Oct 27 23:36:29 2017
>> *nat
>> :PREROUTING ACCEPT [86097:5109916]
>> :INPUT ACCEPT [7557:460113]
>> :OUTPUT ACCEPT [162:11119]
>> :POSTROUTING ACCEPT [78890:4669843]
>> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 222 -j DNAT --to-destination 192.168.10.62:22
>> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 192.168.10.248:80
>> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 192.168.10.248:443
>> -A PREROUTING -d 176.9.38.150/32 -p tcp -m tcp --dport 223 -j DNAT --to-destination 192.168.10.248:22
>> -A PREROUTING -i eth0 -p gre -j DNAT --to-destination 192.168.10.62
>
> I think this may route all incoming GRE to a single host / VM,
> 192.168.10.62.

Exactly, that is what I intend to do. In this case, 176.9.38.150 is the
public IP and 192.168.10.62 the private IP.

>> -A PREROUTING -d 176.9.38.150/32 -p udp -m udp --dport 20000:20100 -j DNAT --to-destination 192.168.10.62:20000-20100
>> -A POSTROUTING -s 192.168.10.0/24 -d 224.0.0.0/24 -j RETURN
>> -A POSTROUTING -s 192.168.10.0/24 -d 255.255.255.255/32 -j RETURN
>> -A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
>> -A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
>> -A POSTROUTING -s 192.168.10.0/24 ! -d 192.168.10.0/24 -j MASQUERADE
>
> This will very likely MASQUERADE all of the GRE traffic from
> 192.168.10.62, which means it will be NATed to the source IP of the
> interface with the best route to 185.66.195.1. Is that 176.9.38.150,
> the IP that you were looking for on the eth0 interface?
> (You wouldn't see the 192.168.10.62 IP there as it's after NATing.)

Yes, that is what I intend to do.

> I also don't see where your firewall is actually blocking any traffic,
> and a couple of other things that I'm not quite sure why you did what
> you did. But, this discussion is for GRE issues.

The firewall configuration should be correct. We create and track it
with Ansible, and this is definitely the iptables configuration that
worked before. What are you unsure about? This is a public project, we
have nothing to hide ;) Just ask.

>> Our standard configuration is to have a separate global IPv4 for each
>> virtual machine. We experimented with NATing those GRE tunnels to
>> save one IP address per hypervisor, which worked perfectly so far.
>
> I think I'm still missing something.
>
> I would assign the globally routed IPs to the VMs directly, and route
> them to the eth0 IP of the machine.

Yes, that works. I'm just trying to save an IP address here, and
requesting an IP change at our upstream provider takes time. So I once
NATed the tunnels through to a VM when we virtualised the system for
the first time. Originally the gateway ran directly on the host, not
within a VM. Requesting public IPs for the VM would solve the problem.
But Linux should be capable of NATing the tunnels through, and it used
to work, at high performance and reliably, for several months.

> Or are you talking about saving the globally routable IP address on
> virbr1?

Yes.

> Another trick would be to use private IP addresses on virbr1 and the
> vNICs of the VMs. You use this for routing and assign the VM's
> globally routed IP address to a dummy interface in the VMs. - That
> would be clear channel routing all the way in. Save for the private
> IPs in the path, which works, but is meh in a traceroute output.

The global IP address of the host is used to access it in the first
place. It's just a rented server somewhere on the internet.

>> Freifunk is not just a wifi network. It's about getting to know
>> network stuff like mesh networks or software defined networks based
>> on GRE tunnels. My reasons to participate are mostly to understand
>> the technology behind all that.
>
> That sounds interesting and like a worthwhile cause.

Yeah, it's fun :)

>> As I wrote in my other email, I looked into the source code. As far
>> as I understand it, the GREv0 NAT has never been properly
>> implemented. I don't understand how this ever worked.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/net/ipv4/netfilter/nf_nat_proto_gre.c?id=HEAD
>
> My read of the header is that GRE may not need a NAT helper per se.
> It sounds like it's just a matter of altering the source / destination
> IP of the GRE encapsulation traffic.

Okay. From my understanding as well, GRE is quite a simple protocol: it
just encapsulates the IP packets and decapsulates them at their
destination. GRE is even simpler than UDP, as it doesn't have ports.

> I also don't see anything in RFC 2784 § 2.1. GRE Header that would
> need NAT as I understand it.

Right, only the outer IP address needs to be replaced. Nothing more,
nothing less than that.

All those new tests didn't bring me any new insight. So I tested
another kernel on the hypervisor:

root@unimatrixzero ~ # uname -a
Linux unimatrixzero 4.4.9-040409-generic #201605041832 SMP Wed May 4 22:34:16 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

I think this was one of the first kernels for Ubuntu 16.04. And this
didn't work either.
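What I still want to try on the hypervisor is to check whether the GRE
conntrack/NAT helper modules are loaded at all, and to trace GRE
through the netfilter tables the same way we already trace TCP in the
raw table. A rough sketch of what I have in mind (the module names are
the ones from a 4.x kernel tree, the trace output may need the
nf_log_ipv4 logger to show up, and none of this is in our Ansible roles
yet):

# Are the GRE connection-tracking / NAT helpers present? (4.x module names)
lsmod | grep -E 'gre|conntrack'
modprobe nf_conntrack_proto_gre
modprobe nf_nat_proto_gre

# Does conntrack create entries for the tunnel while a ping runs through it?
conntrack -L | grep -i gre

# Trace GRE through the netfilter tables, analogous to the existing TCP
# TRACE rules in the *raw table; the trace lines end up in the kernel log.
iptables -t raw -A PREROUTING -i eth0 -p gre -j TRACE
iptables -t raw -A PREROUTING -i virbr1 -p gre -j TRACE
dmesg | grep TRACE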
From my understanding, the kernel version is not the problem; there
must be something else. Are there any debugging modes for the kernel
modules? I'd like to understand why the packets are dropped. All the
kernel has to do is replace the source IP, and that should be it.

Regards,
Matthias
--
To unsubscribe from this list: send the line "unsubscribe lartc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html