I think part of the problem is that the conntrack entries are being overwritten when the second TCP dialogue is initiated, and only in this scenario.

I retried a test with two PAT systems (192.168.0.1 and 192.168.0.2), both initiating a TCP dialogue from source port 5000 to port 9000 of a public system (20.20.20.20). Both established an IPsec SA with a unique mark. I added the following SNAT rules to try to force each PAT connection to modify the source port to a specified port:

    iptables -t nat -A INPUT -s 20.20.20.10 -m mark --mark 0x01 -p tcp --sport 5000 -j SNAT --to-source 20.20.20.10:5011
    iptables -t nat -A INPUT -s 20.20.20.10 -m mark --mark 0x02 -p tcp --sport 5000 -j SNAT --to-source 20.20.20.10:5012

I attempted the first TCP dialogue and saw this conntrack entry:

    tcp 6 431991 ESTABLISHED src=20.20.20.10 dst=20.20.20.20 sport=5000 dport=9000 src=20.20.20.10 dst=20.20.20.20 sport=9000 dport=5011 [ASSURED] mark=1 use=1

I attempted the second dialogue and saw this conntrack entry, but the previous one was removed:

    tcp 6 431999 ESTABLISHED src=20.20.20.10 dst=20.20.20.20 sport=5000 dport=9000 src=20.20.20.10 dst=20.20.20.20 sport=9000 dport=5012 [ASSURED] mark=2 use=1

I had inserted a LOG rule in the mangle OUTPUT chain and saw that, after the second dialogue was initiated, a single TCP response frame was sent to the first PAT system with a mismatched modified source port (5011) and mark (0x02):

    kernel: [ 6484.314200] IN= OUT=eth0 SRC=20.20.20.20 DST=20.20.20.10 LEN=64 TOS=0x00 PREC=0x00 TTL=64 ID=15222 DF PROTO=TCP SPT=9000 DPT=5011 WINDOW=227 RES=0x00 ACK URGP=0 MARK=0x2

Subsequent frames back to the first PAT system showed no mark and no modified source port:

    kernel: [ 6503.198838] IN= OUT=eth0 SRC=20.20.20.20 DST=20.20.20.10 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=59925 DF PROTO=TCP SPT=9000 DPT=5000 WINDOW=0 RES=0x00 RST URGP=0

TCP responses to the second dialogue were appropriate:
    kernel: [ 5680.324821] IN= OUT=eth0 SRC=20.20.20.20 DST=20.20.20.10 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=32252 DF PROTO=TCP SPT=9000 DPT=5012 WINDOW=227 RES=0x00 ACK URGP=0 MARK=0x2

-----Original Message-----
From: Rajcan, Steven L
Sent: Friday, July 14, 2017 2:06 PM
To: 'Noel Kuntze' <noel@xxxxxxxxxxxxxxxxx>; netfilter@xxxxxxxxxxxxxxx
Subject: RE: Distinguishing NAT(PAT) inbound frames when using IPsec Transport mode from multiple NAT(PAT) systems

Noel,

Thanks for the response. We are already using the connmark plugin in Strongswan. Using that plugin we are able to correctly configure and establish IPsec SAs that map to individual PAT systems. Data transfer through those SAs works fine except in the one case described, where multiple PAT systems attempt a TCP or UDP dialogue to a common port of a public system with a common source port.

I tried the SNAT you suggested but modified for the environment:

    iptables -t nat -A INPUT -m policy --pol ipsec --dir in -j SNAT --to-source 20.20.20.10 --random

The source port was modified but it used the same source port for both inbound frames from each PAT endpoint. So the problem remained. The second TCP attempt succeeded but replaced the first TCP connection.

The inbound, transformed frames maintain the mark set by the connmark plugin. This means we can tell which PAT system each inbound frame came from. So I tried to SNAT just the port on marked inbound frames with the following:

    iptables -t nat -A INPUT -s 20.20.20.10 -m mark ! --mark 0x00 -j SNAT --to-source 20.20.20.10 --random

The source ports were changed but the new source port was again the same for each frame from both PAT systems.

I then tried to create two SNAT entries, one for each mark that was created by the connmark plugin:

    iptables -t nat -A INPUT -s 20.20.20.10 -m mark --mark 0x01 -j SNAT --to-source 20.20.20.10 --random
    iptables -t nat -A INPUT -s 20.20.20.10 -m mark --mark 0x02 -j SNAT --to-source 20.20.20.10 --random

Again, the source ports were changed but the new source port was again the same for each frame from both PAT systems.

Finally, I decided to create a TCP dialogue specific SNAT entry with an assigned modified port for each PAT system. That way I knew the original source ports from each PAT system would be mapped to unique source ports:

    iptables -t nat -A INPUT -s 20.20.20.10/32 -p tcp -m mark --mark 0x1 -m tcp --sport 5000 -j SNAT --to-source 20.20.20.10:5001
    iptables -t nat -A INPUT -s 20.20.20.10/32 -p tcp -m mark --mark 0x2 -m tcp --sport 5000 -j SNAT --to-source 20.20.20.10:5002

When I attempted the test, the first TCP connection showed the mapped source port of 5001. However, the second TCP connection again replaced the first. I double checked with netstat and verified that there was only one TCP dialogue with the modified source port (5002) mapped from the second PAT system. This does not make sense but that is what happened.

I think the issue is that the stack does not map the restored reply frame back to the correct IPsec SA for a particular PAT system, but only in this scenario, when the original post-transform inbound frames are identical from multiple PAT peers.

Steve

>I forgot to mention, that you can perform nat in INPUT and randomize the source ports with SNAT.
>-t nat -A INPUT -m policy --pol ipsec --dir in -j SNAT --random
>
>Check the man page for iptables-extensions for details.
>*nat INPUT isn't in the colourful graph that Jan Engelhardt made, but it exists.
>
>On 12.07.2017 21:41, Noel Kuntze wrote:
>> Hello Steven,
>>
>> Take a look at what the connmark plugin[1] of strongSwan does.
>> I think doing the same fixes your problem. Or switch to strongSwan
>> right away and use the plugin.
>>
>> [1] https://wiki.strongswan.org/projects/strongswan/wiki/Connmark
>>
>> Kind regards
>>
>> Noel
>>
>> On 12.07.2017 21:35, Rajcan, Steven L wrote:
>>> Hello,
>>>
>>> I have created IPsec policies using transport mode that allow systems behind
>>> a NAT(PAT) router to connect to a public system. The issue I am having is on
>>> a public system with established IPsec tunnels to systems behind a PAT
>>> (Port-Address-Translation) router. These routers multiplex systems behind a
>>> single IPV4 address. The IPsec SAs are created properly and I am able to
>>> send data from these PAT systems to the public system through the IPsec
>>> tunnels in most scenarios. However, it is possible that frames sent from two
>>> or more PAT systems arrive at the public server stack with the same source
>>> port, same source IP, same destination port, and same destination IP. This
>>> occurs because the PAT router cannot modify the original TCP or UDP payload
>>> encapsulated in the ESP frame. In these scenarios, the stack on the public
>>> system gets confused and cannot map replies to those frames back through the
>>> correct IPsec tunnel of the PAT system.
>>>
>>> Consider two PAT systems attempting a TCP connection to the same public
>>> server but each happens to use the same local port of 45000.
>>>     PAT1 system IP addr: 192.168.0.1
>>>     PAT2 system IP addr: 192.168.0.2
>>>     PAT router public IP addr: 20.20.20.10
>>>     Public system IP addr: 20.20.20.20
>>> Note that the original TCP frame, sent by the PATx system is encapsulated in
>>> a UDP/ESP frame and is therefore, not modified by the PAT router.
>>>     PAT1 [192.168.0.1:45000,20.20.20.20:80] --> PAT Router [20.20.20.10] --> public system [20.20.20.20:80]
>>>     PAT2 [192.168.0.2:45000,20.20.20.20:80] --> PAT Router [20.20.20.10] --> public system [20.20.20.20:80]
>>> The original IP of the PAT systems is NAT'ed to that of the PAT router and
>>> the post-transform, inbound frames arriving at the public system stack are
>>> identical for both endpoints. [20.20.20.10:45000,20.20.20.20:80]
>>>
>>> When testing this scenario, the first PAT system establishes the TCP
>>> connection properly. The second PAT system also connects but only a single
>>> TCP connection is established on the public system. An iptables log seems to
>>> indicate that the second TCP connection replaces the first.
>>>
>>> We have discovered that other platforms handle this scenario automatically
>>> on the public system by modifying the source port on the inbound,
>>> post-transform frame before it is sent up the stack. Thus the stack sees a
>>> unique frame for every TCP and UDP dialogue with the PAT endpoints. The
>>> reply frame then contains the modified source port which is restored by the
>>> OS to the original source port and is directed back through the original
>>> IPsec tunnel.
>>>
>>> So the question is, can the Linux kernel do the same or something similar?
>>> I looked at the xfrm routines and could not find anything that indicates
>>> that it could.
>>>
>>> Please note that using tunnel mode instead of transport mode is not an
>>> option for our situation.
>>>
>>> Any help would be appreciated.
>>> Thanks
>>>
>>> Steve Rajcan
>>> mailto:Steven.Rajcan@xxxxxxxxxx
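
To make the suspected overwrite concrete, here is a toy model in Python of a connection table keyed only by the original-direction tuple. It is a sketch of the hypothesis stated at the top of this message, not a description of how the kernel's conntrack is actually implemented. The names `Tuple4` and `Entry`, the dictionary keying, and the idea of putting the mark into the key are all assumptions made only for illustration.

```python
from typing import Dict, NamedTuple, Tuple


class Tuple4(NamedTuple):
    """Original-direction tuple, as it appears in the conntrack entries."""
    src: str
    sport: int
    dst: str
    dport: int


class Entry(NamedTuple):
    """What this toy table remembers for one dialogue."""
    mapped_sport: int  # port assigned by the SNAT rule (5011 or 5012)
    mark: int          # mark associated with the IPsec SA


# Both PAT systems produce the same post-transform original tuple.
orig = Tuple4("20.20.20.10", 5000, "20.20.20.20", 9000)

# Assumption 1: the table is keyed by the original tuple only.
by_tuple: Dict[Tuple4, Entry] = {}
by_tuple[orig] = Entry(mapped_sport=5011, mark=1)  # first dialogue
by_tuple[orig] = Entry(mapped_sport=5012, mark=2)  # second dialogue replaces it

# Assumption 2 (hypothetical): the mark is part of the key.
by_tuple_and_mark: Dict[Tuple[Tuple4, int], Entry] = {}
by_tuple_and_mark[(orig, 1)] = Entry(mapped_sport=5011, mark=1)
by_tuple_and_mark[(orig, 2)] = Entry(mapped_sport=5012, mark=2)

print(len(by_tuple), by_tuple[orig].mark)  # one entry left, carrying mark 2
print(len(by_tuple_and_mark))              # both dialogues kept
```

Under the first keying the surviving entry carries mark 2 and port 5012, which matches the single conntrack entry I saw after the second dialogue. Whether the kernel offers any way to get the behaviour of the second keying is the open question.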