vs conntrack changes TCP ports mid-stream

Sven Bartscher <sven.bartscher@xxxxxxxxxxxxxxxxxxxx> · Wed, 21 Jun 2023 18:46:43 +0200

I'm forwarding the report from
https://bugzilla.netfilter.org/show_bug.cgi?id=1669 here, since it was
pointed out there that this list would be more appropriate.

When using an ipvs service in combination with SNAT and a NOTRACK rule,
specific circumstances can lead to the TCP ports of packets being changed
mid-stream, which results in connections that are established
successfully but cannot carry any data.

Consider the following example:

```
root@router:~# sysctl net.ipv4.vs.conntrack
net.ipv4.vs.conntrack = 1

root@router:~# iptables -t raw -L -n -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
   24  1296 CT         tcp  --  enp1s0 *       0.0.0.0/0            10.0.0.1             tcp dpt:1234 NOTRACK

root@router:~# iptables -t nat -L -n -v
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    4   240 SNAT       all  --  *      *       10.0.0.0/24          10.0.1.0/24          to:10.0.1.1

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

root@router:~# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.0.1:1234 rr
  -> 10.0.1.2:1234                Masq    1      0          0
  -> 10.0.1.3:1234                Masq    1      0          0
```
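
For anyone who wants to rebuild this setup, the listings above should
correspond roughly to the commands below. This is a sketch reconstructed
from the listings, not a copy of the router's actual configuration. The
`run` wrapper and the `DRY_RUN` switch are hypothetical helpers added
for illustration; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: rebuild the router configuration shown in the listings.
# DRY_RUN is a hypothetical switch (default 1) that only prints the
# commands; set DRY_RUN=0 and run as root to actually apply them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_router() {
    # Let ipvs maintain conntrack entries for the connections it handles.
    run sysctl -w net.ipv4.vs.conntrack=1
    # raw table: do not track client packets sent to the virtual service.
    run iptables -t raw -A PREROUTING -i enp1s0 -p tcp -d 10.0.0.1 --dport 1234 -j CT --notrack
    # nat table: SNAT traffic from the client network to the real-server network.
    run iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -d 10.0.1.0/24 -j SNAT --to-source 10.0.1.1
    # ipvs: round-robin TCP service with two masqueraded real servers.
    run ipvsadm -A -t 10.0.0.1:1234 -s rr
    run ipvsadm -a -t 10.0.0.1:1234 -r 10.0.1.2:1234 -m -w 1
    run ipvsadm -a -t 10.0.0.1:1234 -r 10.0.1.3:1234 -m -w 1
}

setup_router
```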

The real servers are running:

```
socat TCP4-LISTEN:1234,fork 'EXEC:sh -c echo${IFS}hello;read${IFS}r${IFS}L;sleep${IFS}1'
```

We dump the network traffic between the router and client on the ipvs 
router as follows:

```
root@router:~# tcpdump -pXXni enp1s0 icmp or tcp -w /tmp/ipvs_port_reuse.pcap
tcpdump: listening on enp1s0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C16 packets captured
16 packets received by filter
0 packets dropped by kernel
```

While the capture is running, we run the following commands on a client 
to trigger the buggy behavior:

```
root@debian:~# netcat -p 4321 -v 10.0.01 1234
Connection to 10.0.01 1234 port [tcp/*] succeeded!
hello
^C
root@debian:~# sleep 60
root@debian:~# netcat -p 4321 -v 10.0.01 1234
Connection to 10.0.01 1234 port [tcp/*] succeeded!
^C
root@debian:~#
```

We can see that the first connection attempt receives a reply with
payload from the server, after which we terminate the connection with
Ctrl+C. We then wait 60 seconds, which is necessary for the previous
connection to leave the TIME_WAIT state. Afterwards we open another
connection, reusing the same source port as the first connection, and
do not receive a reply from the server. The captured traffic shows that,
after the three-way handshake of the second TCP connection, packets from
the router to the client use a different server port than the one used
to initiate the connection.
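
One possible way to pick the affected packets out of the capture is to
filter for TCP packets sent from the virtual address whose source port
is not the service port. The sketch below assumes the router answers the
client from 10.0.0.1; `mismatch_filter` is a hypothetical helper that
only builds the filter expression.

```shell
#!/bin/sh
# Build a pcap filter matching TCP packets sent from the virtual address
# with a source port other than the service port.
mismatch_filter() {
    # $1 = virtual IP, $2 = service port
    echo "tcp and src host $1 and not src port $2"
}

pcap=/tmp/ipvs_port_reuse.pcap
# Only read the capture if it is present on this machine.
if [ -r "$pcap" ]; then
    tcpdump -nr "$pcap" "$(mismatch_filter 10.0.0.1 1234)"
fi
```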

Regards
Sven

Attachment: ipvs_port_reuse.pcap
Description: application/vnd.tcpdump.pcap