Re: Linux NATting does not support NAT hole punching?

Adel Belhouane <bugs.a.b@xxxxxxx> · Tue, 24 Jul 2018 19:15:08 +0200

On 24/07/2018 at 02:48, Dima Kogan wrote:
> On 19/07/2018 20:27, Adel Belhouane wrote:
> 
>> What is very important: the router/NAT system should *drop* unknown
>> outside incoming packets (thus not generate TCP RST or ICMP
>> unreachable errors). If it doesn't drop packets before conntrack
>> allows reverse-SNATing them because of the internal outgoing flow,
>> then the internal system will give up early and attempts will fail.
> 
> I've been experimenting with traversing Linux NAT as well, so wanted
> to chime in on this.
> 
> I can confirm that setting the NAT system to DROP unsolicited
> incoming packets is crucial for the NAT traversal to work. I
> suspect that the reason is a bit different from what is suggested
> above. (For simplicity I'll focus on the UDP case, which I think
> is what WebRTC uses.)
> 
> IIUC when an unsolicited incoming datagram arrives at port P of the
> NAT box, in the absence of a DROP rule the NAT assumes that the packet
> is targeted to a local process on the NAT system itself (rather than
> to some host on the internal network). It thus allocates the local
> port P to a session between the sender and the local host (the NAT
> system itself). From what I've seen in some netfilter documentation,
> this is sometimes referred to as a "null binding".  Subsequently, when
> a host on the internal network sends a datagram from the same internal
> port number P, the NAT maps it to a different external port number P',
> since port P is already allocated by this "null binding" to the local
> host. When this datagram reaches the other party, it will most often
> fail to traverse the NAT on the remote side, since it is now coming
> from an unexpected port number.
> 

I was focused on TCP. With TCP the conntrack entry is DESTROYed
immediately when the router replies with a TCP RST, whereas with UDP,
as you say, the entry is kept until its default timeout (30s) even
though an ICMP port unreachable is sent. So after doing:

ip netns exec r1 iptables -P INPUT ACCEPT
ip netns exec r2 iptables -P INPUT ACCEPT

but dropping icmp on the systems behind (so they can keep trying):

ip netns exec s1 iptables -A INPUT -p icmp -j DROP
ip netns exec s2 iptables -A INPUT -p icmp -j DROP

ip netns exec r1 conntrack -E
    [NEW] udp      17 30 src=192.0.2.2 dst=203.0.113.12 sport=1111 dport=2222 [UNREPLIED] src=203.0.113.12 dst=198.51.100.11 sport=2222 dport=1111
    [NEW] udp      17 30 src=203.0.113.12 dst=198.51.100.11 sport=1024 dport=1111 [UNREPLIED] src=198.51.100.11 dst=203.0.113.12 sport=1111 dport=1024

Notice how sport=2222 was altered into sport=1024: there is a
conflict, since the packet is considered a new flow again, which in
turn is due to the ICMP port unreachable (not shown here).
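
To make the conflict easier to picture, here is a toy model in Python.
It is only a sketch and not netfilter's real port selection: the class
ToyNat, its method names, and the rule "on a clash, take the first free
port starting at 1024" are my assumptions for illustration (the 1024
merely matches the trace above).

```python
# Toy model of the port conflict; NOT netfilter's actual algorithm.
# Assumption: a (remote_ip, remote_port, external_port) triple can belong
# to only one tracked flow, and on a clash the first free port >= 1024 is used.

class ToyNat:
    def __init__(self):
        # Triples already claimed on the public side of the NAT.
        self.in_use = set()

    def unsolicited_in(self, remote_ip, remote_port, external_port):
        """An unsolicited datagram was ACCEPTed, so it is tracked as a new flow."""
        self.in_use.add((remote_ip, remote_port, external_port))

    def snat_out(self, internal_port, remote_ip, remote_port):
        """Return the external source port chosen for an outgoing datagram."""
        port = internal_port
        if (remote_ip, remote_port, port) in self.in_use:
            # Same port is taken by another flow: fall back to a free one.
            port = 1024
            while (remote_ip, remote_port, port) in self.in_use:
                port += 1
        self.in_use.add((remote_ip, remote_port, port))
        return port


# The peer's datagram arrived first and was not dropped:
nat = ToyNat()
nat.unsolicited_in("198.51.100.11", 1111, 2222)
altered = nat.snat_out(2222, "198.51.100.11", 1111)

# With a DROP policy the unsolicited datagram is never tracked:
clean = ToyNat().snat_out(2222, "198.51.100.11", 1111)
```

In this model the first case yields an altered source port and the
second keeps 2222, which is the behaviour the peer expects.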

30s later:

[DESTROY] udp      17 src=203.0.113.12 dst=198.51.100.11 sport=1024 dport=1111 [UNREPLIED] src=198.51.100.11 dst=203.0.113.12 sport=1111 dport=1024
[DESTROY] udp      17 src=192.0.2.2 dst=203.0.113.12 sport=1111 dport=2222 [UNREPLIED] src=203.0.113.12 dst=198.51.100.11 sport=2222 dport=1111

While dropping TCP RST on systems behind for TCP:

ip netns exec s1 iptables -A INPUT -p tcp -m tcp --tcp-flags  RST RST -j DROP
ip netns exec s2 iptables -A INPUT -p tcp -m tcp --tcp-flags  RST RST -j DROP

ip netns exec r1 conntrack -E
    [NEW] tcp      6 120 SYN_SENT src=192.0.2.2 dst=203.0.113.12 sport=1111 dport=2222 [UNREPLIED] src=203.0.113.12 dst=198.51.100.11 sport=2222 dport=1111
[DESTROY] tcp      6 src=192.0.2.2 dst=203.0.113.12 sport=1111 dport=2222 [UNREPLIED] src=203.0.113.12 dst=198.51.100.11 sport=2222 dport=1111
    [NEW] tcp      6 120 SYN_SENT src=203.0.113.12 dst=198.51.100.11 sport=2222 dport=1111 [UNREPLIED] src=198.51.100.11 dst=203.0.113.12 sport=1111 dport=2222
[DESTROY] tcp      6 src=203.0.113.12 dst=198.51.100.11 sport=2222 dport=1111 [UNREPLIED] src=198.51.100.11 dst=203.0.113.12 sport=1111 dport=2222
    [NEW] tcp      6 120 SYN_SENT src=192.0.2.2 dst=203.0.113.12 sport=1111 dport=2222 [UNREPLIED] src=203.0.113.12 dst=198.51.100.11 sport=2222 dport=1111
[DESTROY] tcp      6 src=192.0.2.2 dst=203.0.113.12 sport=1111 dport=2222 [UNREPLIED] src=203.0.113.12 dst=198.51.100.11 sport=2222 dport=1111
    [NEW] tcp      6 120 SYN_SENT src=203.0.113.12 dst=198.51.100.11 sport=2222 dport=1111 [UNREPLIED] src=198.51.100.11 dst=203.0.113.12 sport=1111 dport=2222
[DESTROY] tcp      6 src=203.0.113.12 dst=198.51.100.11 sport=2222 dport=1111 [UNREPLIED] src=198.51.100.11 dst=203.0.113.12 sport=1111 dport=2222
[...]

the ports are not altered (but it still won't work because of the timing).

That means, in the adverse case of ACCEPT, TCP hole punching is allowed
more than one attempt to synchronize, while UDP can succeed only once per
pair of ports if:
- the one simultaneous attempt is synchronized (by NTP and a very precise
  simultaneous emission agreement),
- there's enough delay on the internet to allow imprecision on the
  synchronized attempt.
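
The timing condition in the list above can be written down as a small
Python sketch. It assumes a symmetric one-way delay, negligible
processing time in the routers, and the ACCEPT policy discussed here;
the function name punch_succeeds and its parameters are mine, for
illustration only.

```python
def punch_succeeds(t1, t2, one_way_delay):
    """Simplified model of a single UDP hole punching attempt.

    t1, t2: times (seconds) at which each side sends its first datagram.
    one_way_delay: transit time (seconds) between the two NAT routers.

    Each datagram must leave its own NAT before the peer's datagram
    arrives there, otherwise the peer's datagram is tracked as a new
    flow and the conflict described earlier occurs.
    """
    return abs(t1 - t2) < one_way_delay


# With 1s of added delay, texts typed 0.4s apart fit in the window,
# while texts typed 1.5s apart do not.
ok = punch_succeeds(0.0, 0.4, 1.0)
too_late = punch_succeeds(0.0, 1.5, 1.0)
```

With a near-zero delay the window practically disappears, which is why
the synchronization has to be so precise without added latency.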

That again can be reproduced with tc and netem from my previous example
(adding a delay of 1s on "internet" in each direction, because the test
is typed by hand):

ip netns exec in tc qdisc add dev left0 root netem delay 1000ms
ip netns exec in tc qdisc add dev right0 root netem delay 1000ms

The UDP socat command can now work if the first text on each side is
sent within 1s of the other:
    [NEW] udp      17 30 src=192.0.2.2 dst=203.0.113.12 sport=1111 dport=2222 [UNREPLIED] src=203.0.113.12 dst=198.51.100.11 sport=2222 dport=1111
 [UPDATE] udp      17 30 src=192.0.2.2 dst=203.0.113.12 sport=1111 dport=2222 src=203.0.113.12 dst=198.51.100.11 sport=2222 dport=1111
 [UPDATE] udp      17 180 src=192.0.2.2 dst=203.0.113.12 sport=1111 dport=2222 src=203.0.113.12 dst=198.51.100.11 sport=2222 dport=1111 [ASSURED]
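
For anyone who wants to check such traces automatically, here is a
rough Python sketch that splits one event line into its parts. It
assumes the field layout is exactly as in the excerpts in this mail
(original tuple first, reply tuple second); parse_event is a
hypothetical helper, not part of conntrack-tools.

```python
import re

def parse_event(line):
    """Parse one `conntrack -E` line as laid out in the traces above."""
    event = re.search(r"\[(NEW|UPDATE|DESTROY)\]", line).group(1)
    # The protocol name is the first word after the event label.
    proto = line.split("]", 1)[1].split()[0]
    # First four key=value pairs: original direction; next four: reply.
    pairs = re.findall(r"(src|dst|sport|dport)=(\S+)", line)
    flags = set(re.findall(r"\[(UNREPLIED|ASSURED)\]", line))
    return {
        "event": event,
        "proto": proto,
        "original": dict(pairs[:4]),
        "reply": dict(pairs[4:8]),
        "flags": flags,
    }


line = (" [UPDATE] udp      17 180 src=192.0.2.2 dst=203.0.113.12 "
        "sport=1111 dport=2222 src=203.0.113.12 dst=198.51.100.11 "
        "sport=2222 dport=1111 [ASSURED]")
ev = parse_event(line)
# The hole is punched once the flow is ASSURED and the peer's source
# port (reply sport) still equals the port we sent to (original dport).
punched = "ASSURED" in ev["flags"] and ev["reply"]["sport"] == ev["original"]["dport"]
```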

regards,
Adel Belhouane.