RE: Distinguishing NAT(PAT) inbound frames when using IPsec Transport mode from multiple NAT(PAT) systems

"Rajcan, Steven L" <Steven.Rajcan@xxxxxxxxxx> · Mon, 17 Jul 2017 17:42:07 +0000

I think part of the problem is that, in this scenario only, the existing
conntrack entry is overwritten when the second TCP dialogue is initiated.

I retried a test with two PAT systems (192.168.0.1 and 192.168.0.2) both
initiating a TCP dialogue with source port 5000 to port 9000 of a public
system (20.20.20.20). Both established an IPsec SA with a unique mark. I
added the following SNAT rules to try to force each PAT connection to rewrite
its source port to a specified port:
	iptables -t nat -A INPUT -s 20.20.20.10 -m mark --mark 0x01 -p tcp --sport 5000 -j SNAT --to-source 20.20.20.10:5011
	iptables -t nat -A INPUT -s 20.20.20.10 -m mark --mark 0x02 -p tcp --sport 5000 -j SNAT --to-source 20.20.20.10:5012

I attempted the first TCP dialogue and saw this conntrack entry:
	tcp      6 431991 ESTABLISHED src=20.20.20.10 dst=20.20.20.20 sport=5000 dport=9000 src=20.20.20.10 dst=20.20.20.20 sport=9000 dport=5011 [ASSURED] mark=1 use=1
I then attempted the second dialogue and saw this conntrack entry; the
previous one had been removed:
	tcp      6 431999 ESTABLISHED src=20.20.20.10 dst=20.20.20.20 sport=5000 dport=9000 src=20.20.20.10 dst=20.20.20.20 sport=9000 dport=5012 [ASSURED] mark=2 use=1
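
To make the overlap between the two entries explicit, here is a minimal
Python sketch that parses both lines exactly as reported above and compares
their tuples. The function name parse_entry and the dictionary keyed on the
original-direction tuple are illustrative assumptions, not a description of
the kernel's conntrack internals; the sketch only shows that a table keyed
this way could hold one of these two entries at a time.

```python
import re

# The two conntrack entries reported above, copied verbatim (one line each).
FIRST = ("tcp      6 431991 ESTABLISHED src=20.20.20.10 dst=20.20.20.20 "
         "sport=5000 dport=9000 src=20.20.20.10 dst=20.20.20.20 sport=9000 "
         "dport=5011 [ASSURED] mark=1 use=1")
SECOND = ("tcp      6 431999 ESTABLISHED src=20.20.20.10 dst=20.20.20.20 "
          "sport=5000 dport=9000 src=20.20.20.10 dst=20.20.20.20 sport=9000 "
          "dport=5012 [ASSURED] mark=2 use=1")


def parse_entry(line):
    """Split one conntrack line into (original tuple, reply tuple, mark)."""
    fields = re.findall(r"\b(src|dst|sport|dport)=(\S+)", line)
    # The first four key=value pairs are the original direction,
    # the next four are the reply direction.
    original = tuple(value for _, value in fields[:4])
    reply = tuple(value for _, value in fields[4:8])
    mark = int(re.search(r"\bmark=(\d+)", line).group(1))
    return original, reply, mark


orig1, reply1, mark1 = parse_entry(FIRST)
orig2, reply2, mark2 = parse_entry(SECOND)

# Toy table keyed on the original-direction tuple only (an assumption made
# for illustration): inserting the second entry replaces the first.
table = {}
table[orig1] = (reply1, mark1)
table[orig2] = (reply2, mark2)

print("same original tuple:", orig1 == orig2)
print("entries in toy table:", len(table))
```

Both entries share the same original-direction tuple and differ only in the
reply-side port and the mark, which is consistent with the second entry
replacing the first.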

I had inserted a LOG rule in the mangle OUTPUT chain and saw that, after the
second dialogue was initiated, a single TCP response frame was sent to the
first PAT system carrying the first dialogue's rewritten port (5011) but the
second dialogue's mark (0x02):
	kernel: [ 6484.314200] IN= OUT=eth0 SRC=20.20.20.20 DST=20.20.20.10 LEN=64 TOS=0x00 PREC=0x00 TTL=64 ID=15222 DF PROTO=TCP SPT=9000 DPT=5011 WINDOW=227 RES=0x00 ACK URGP=0 MARK=0x2
Subsequent frames back to the first PAT system showed no mark and no
rewritten port:
	kernel: [ 6503.198838] IN= OUT=eth0 SRC=20.20.20.20 DST=20.20.20.10 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=59925 DF PROTO=TCP SPT=9000 DPT=5000 WINDOW=0 RES=0x00 RST URGP=0

TCP responses to the second dialogue were as expected:
	kernel: [ 5680.324821] IN= OUT=eth0 SRC=20.20.20.20 DST=20.20.20.10 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=32252 DF PROTO=TCP SPT=9000 DPT=5012 WINDOW=227 RES=0x00 ACK URGP=0 MARK=0x2
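
The three LOG lines can be checked mechanically against the mark-to-port
mapping that the two SNAT rules above are meant to enforce (mark 0x01 to
port 5011, mark 0x02 to port 5012). The following is a small Python sketch
of that check; the names EXPECTED_PORT and classify are made up for this
example, and the log lines are the ones quoted above.

```python
import re

# Port each mark is expected to map to, taken from the two SNAT rules.
EXPECTED_PORT = {1: 5011, 2: 5012}

# The three LOG lines quoted above, in the order they appear in this mail.
LOG_LINES = [
    "kernel: [ 6484.314200] IN= OUT=eth0 SRC=20.20.20.20 DST=20.20.20.10 "
    "LEN=64 TOS=0x00 PREC=0x00 TTL=64 ID=15222 DF PROTO=TCP SPT=9000 "
    "DPT=5011 WINDOW=227 RES=0x00 ACK URGP=0 MARK=0x2",
    "kernel: [ 6503.198838] IN= OUT=eth0 SRC=20.20.20.20 DST=20.20.20.10 "
    "LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=59925 DF PROTO=TCP SPT=9000 "
    "DPT=5000 WINDOW=0 RES=0x00 RST URGP=0",
    "kernel: [ 5680.324821] IN= OUT=eth0 SRC=20.20.20.20 DST=20.20.20.10 "
    "LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=32252 DF PROTO=TCP SPT=9000 "
    "DPT=5012 WINDOW=227 RES=0x00 ACK URGP=0 MARK=0x2",
]


def classify(line):
    """Compare the destination port of a reply frame with its mark."""
    dpt = int(re.search(r"\bDPT=(\d+)", line).group(1))
    mark_match = re.search(r"\bMARK=(0x[0-9a-fA-F]+)", line)
    if mark_match is None:
        return "unmarked"
    mark = int(mark_match.group(1), 16)
    return "consistent" if EXPECTED_PORT.get(mark) == dpt else "mismatched"


results = [classify(line) for line in LOG_LINES]
for line, result in zip(LOG_LINES, results):
    print(result, "<-", line.split("] ", 1)[1][:60])
```

Only the last line pairs a mark with the port assigned to it; the first
carries port 5011 with mark 0x2, and the second carries neither a mark nor a
rewritten port.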

-----Original Message-----
From: Rajcan, Steven L 
Sent: Friday, July 14, 2017 2:06 PM
To: 'Noel Kuntze' <noel@xxxxxxxxxxxxxxxxx>; netfilter@xxxxxxxxxxxxxxx
Subject: RE: Distinguishing NAT(PAT) inbound frames when using IPsec
Transport mode from multiple NAT(PAT) systems

Noel,

Thanks for the response. We are already using the connmark plugin in
strongSwan. Using that plugin we are able to correctly configure and
establish IPsec SAs that map to individual PAT systems. Data transfer
through those SAs works fine except in the one case described where multiple
PAT systems attempt a TCP or UDP dialogue to a common port of a public
system with a common source port.

I tried the SNAT rule you suggested, modified for this environment:
	iptables -t nat -A INPUT -m policy --pol ipsec --dir in -j SNAT --to-source 20.20.20.10 --random
The source port was modified, but the same new source port was used for the
inbound frames from both PAT endpoints, so the problem remained. The second
TCP attempt succeeded but replaced the first TCP connection.

The inbound, transformed frames keep the mark set by the connmark plugin.
This means we can tell which PAT system each inbound frame came from. So I
tried to SNAT just the port on marked inbound frames with the following:
	iptables -t nat -A INPUT -s 20.20.20.10 -m mark ! --mark 0x00 -j SNAT --to-source 20.20.20.10 --random
The source ports were changed, but the new source port was again the same
for frames from both PAT systems.

I then tried to create two SNAT entries, one for each mark that was created
by the connmark plugin:
	iptables -t nat -A INPUT -s 20.20.20.10 -m mark --mark 0x01 -j SNAT --to-source 20.20.20.10 --random
	iptables -t nat -A INPUT -s 20.20.20.10 -m mark --mark 0x02 -j SNAT --to-source 20.20.20.10 --random
Again, the source ports were changed, but the new source port was the same
for frames from both PAT systems.

Finally, I decided to create a dialogue-specific SNAT entry for TCP with an
assigned rewritten port for each PAT system. That way I knew the original
source ports from each PAT system would be mapped to unique source ports:
	iptables -t nat -A INPUT -s 20.20.20.10/32 -p tcp -m mark --mark 0x1 -m tcp --sport 5000 -j SNAT --to-source 20.20.20.10:5001
	iptables -t nat -A INPUT -s 20.20.20.10/32 -p tcp -m mark --mark 0x2 -m tcp --sport 5000 -j SNAT --to-source 20.20.20.10:5002
When I attempted the test, the first TCP connection showed the mapped source
port of 5001. However, the second TCP connection again replaced the first. I
double-checked with netstat and verified that there was only one TCP
dialogue, with the rewritten source port (5002) mapped from the second PAT
system. This does not make sense but that is what happened.
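
For reference, the mapping those two static rules are meant to produce can
be modelled as a small lookup table: each (mark, original source port) pair
gets its own rewritten port, and a reply addressed to the rewritten port is
resolved back to the mark and original port. The Python sketch below is only
a toy model of that intent; the class PortRemapper and its methods are
hypothetical names, and nothing here claims to reflect how netfilter or xfrm
implement it.

```python
class PortRemapper:
    """Toy model: give each (mark, original source port) pair its own port."""

    def __init__(self, first_port=5001):
        self._next_port = first_port
        self._forward = {}  # (mark, original_port) -> rewritten_port
        self._reverse = {}  # rewritten_port -> (mark, original_port)

    def inbound(self, mark, original_port):
        """Return the rewritten source port for an inbound frame."""
        key = (mark, original_port)
        if key not in self._forward:
            port = self._next_port
            self._next_port += 1
            self._forward[key] = port
            self._reverse[port] = key
        return self._forward[key]

    def reply(self, rewritten_port):
        """Return (mark, original_port) for a reply to a rewritten port."""
        return self._reverse[rewritten_port]


remapper = PortRemapper()
port_pat1 = remapper.inbound(mark=1, original_port=5000)
port_pat2 = remapper.inbound(mark=2, original_port=5000)

print("PAT1 ->", port_pat1, "PAT2 ->", port_pat2)
print("reply to", port_pat2, "belongs to", remapper.reply(port_pat2))
```

With both PAT systems using source port 5000, the model hands out 5001 and
5002, matching the ports chosen in the rules above, and a reply to either
port resolves to a single mark. That is the behaviour the test expected and
did not see.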

I think the issue is that the stack does not map the restored reply frame
back to the correct IPsec SA for a particular PAT system, but only in this
scenario, where the post-transform inbound frames from multiple PAT peers
are identical.

Steve

>I forgot to mention, that you can perform nat in INPUT and randomize the
>source ports with SNAT.
>-t nat -A INPUT -m policy --pol ipsec --dir in -j SNAT --random
>
>Check the man page for iptables-extensions for details.
>*nat INPUT isn't in the colourful graph that Jan Engelhardt made, but it
>exists.
>
>On 12.07.2017 21:41, Noel Kuntze wrote:
>> Hello Steven,
>>
>> Take a look at what the connmark plugin[1] of strongSwan does.
>> I think doing the same fixes your problem. Or switch to strongSwan
>> right away and use the plugin.
>>
>> [1] https://wiki.strongswan.org/projects/strongswan/wiki/Connmark
>>
>> Kind regards
>>
>> Noel
>>
>> On 12.07.2017 21:35, Rajcan, Steven  L wrote:
>>> Hello,
>>>
>>> I have created IPsec policies using transport mode that allow systems
>>> behind a NAT(PAT) router to connect to a public system. The issue I am
>>> having is on a public system with established IPsec tunnels to systems
>>> behind a PAT (Port-Address-Translation) router. These routers multiplex
>>> systems behind a single IPv4 address. The IPsec SAs are created properly
>>> and I am able to send data from these PAT systems to the public system
>>> through the IPsec tunnels in most scenarios. However, it is possible that
>>> frames sent from two or more PAT systems arrive at the public server
>>> stack with the same source port, same source IP, same destination port,
>>> and same destination IP. This occurs because the PAT router cannot modify
>>> the original TCP or UDP payload encapsulated in the ESP frame. In these
>>> scenarios, the stack on the public system gets confused and cannot map
>>> replies to those frames back through the correct IPsec tunnel of the PAT
>>> system.
>>>
>>> Consider two PAT systems attempting a TCP connection to the same public
>>> server but each happens to use the same local port of 45000.
>>>     PAT1 system IP addr:        192.168.0.1
>>>     PAT2 system IP addr:        192.168.0.2
>>>     PAT router public IP addr:  20.20.20.10
>>>     Public system IP addr:      20.20.20.20
>>> Note that the original TCP frame sent by the PATx system is encapsulated
>>> in a UDP/ESP frame and is therefore not modified by the PAT router.
>>>     PAT1 [192.168.0.1:45000,20.20.20.20:80] --> PAT Router [20.20.20.10] --> public system [20.20.20.20:80]
>>>     PAT2 [192.168.0.2:45000,20.20.20.20:80] --> PAT Router [20.20.20.10] --> public system [20.20.20.20:80]
>>> The original IP of the PAT systems is NAT'ed to that of the PAT router,
>>> and the post-transform, inbound frames arriving at the public system
>>> stack are identical for both endpoints: [20.20.20.10:45000,20.20.20.20:80]
>>>
>>> When testing this scenario, the first PAT system establishes the TCP
>>> connection properly. The second PAT system also connects but only a
>>> single TCP connection is established on the public system. An iptables
>>> log seems to indicate that the second TCP connection replaces the first.
>>>
>>> We have discovered that other platforms handle this scenario
>>> automatically on the public system by modifying the source port on the
>>> inbound, post-transform frame before it is sent up the stack. Thus the
>>> stack sees a unique frame for every TCP and UDP dialogue with the PAT
>>> endpoints. The reply frame then contains the modified source port which
>>> is restored by the OS to the original source port and is directed back
>>> through the original IPsec tunnel.
>>>
>>> So the question is, can the Linux kernel do the same or something
>>> similar? I looked at the xfrm routines and could not find anything that
>>> indicates that it could.
>>>
>>> Please note that using tunnel mode instead of transport mode is not an
>>> option for our situation.
>>>
>>> Any help would be appreciated.
>>> Thanks
>>>
>>> Steve Rajcan
>>> mailto:Steven.Rajcan@xxxxxxxxxx
