[Resending as plain text]

> On Nov 9, 2015, at 5:31 AM, Patrick McHardy <kaber@xxxxxxxxx> wrote:
>
> On 06.11, Jarno Rajahalme wrote:
>> This series adds NAT support to openvswitch kernel module. A few
>> changes are needed to the netfilter code to facilitate this (patches
>> 1-3/8). Patches 4-7 make the openvswitch kernel module ready for the
>> patch 8 that adds the NAT support for calling into netfilter NAT code
>> from the openvswitch conntrack action.
>
> I'm missing some high level description, especially how it is invoked, how
> it makes sure expectations of the NAT code about its invocation are met
> (it is my understanding that OVS simply invokes this based on actions
> specified by the user) and how it interacts with the remaining netfilter
> features.
>

The corresponding OVS userspace patches contain the new test cases for the
NAT features (http://openvswitch.org/pipermail/dev/2015-November/061920.html)
in tests/system-traffic.at. I'll walk through two of them below.

Test case: conntrack - simple SNAT

In these tests ports 1 and 2 are in different namespaces. The flow table
below allows all IPv4 traffic between port 1 and port 2, but IP connections
initiated from port 1 towards port 2 are source NATted:

in_port=1,ip,action=ct(commit,zone=1,nat(src=10.1.1.240-10.1.1.255)),2
in_port=2,ct_state=-trk,ip,action=ct(table=0,zone=1,nat)
in_port=2,ct_state=+trk,ct_zone=1,ip,action=1

The first rule matches all IPv4 traffic from port 1, runs it through
conntrack in zone 1, and NATs it. For the first packet of each connection
NAT sets up a source IP mapping to the given range, after which the new
connection is committed. For further packets of already tracked connections
NAT is done according to the connection state, and the commit is a no-op.
Each packet that is not flagged as a drop by the CT action is forwarded to
port 2. The CT action does an implicit fragmentation reassembly, so that
only complete packets are run through conntrack; reassembled packets are
re-fragmented on output.

IPv4 traffic coming from port 2 is first matched for the non-tracked state
(-trk), which means that the packet has not been through a CT action yet.
Such traffic is run through conntrack in zone 1, and all packets associated
with a NATted connection are NATted also in the return direction. After the
packet has been through conntrack it is recirculated back to table 0 (which
is the default table, so all the rules above are in table 0). The CT action
sets the 'trk' flag, so after recirculation the packets match the third rule
(+trk) and are output on port 1. Since the ct_zone is matched as well, only
packets that were actually tracked by conntrack in zone 1 can match the
third rule and be output to port 1.

I skipped the rules for ARP handling in this walkthrough, but the test case
has rules to match an ARP request for the NATted address and reply to it.
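For reference, outside the test harness the same rules could be added with
ovs-ofctl along these lines (an illustrative sketch only; the bridge name
br0 matches the test setup):

ovs-ofctl add-flow br0 "in_port=1,ip,action=ct(commit,zone=1,nat(src=10.1.1.240-10.1.1.255)),2"
ovs-ofctl add-flow br0 "in_port=2,ct_state=-trk,ip,action=ct(table=0,zone=1,nat)"
ovs-ofctl add-flow br0 "in_port=2,ct_state=+trk,ct_zone=1,ip,action=1"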
The test case verifies the above with an HTTP request and a subsequent
conntrack entry listing:

dnl HTTP requests from p0->p1 should work fine.
NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py]], [http0.pid])
NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 5 -T 1 --retry-connrefused -v -o wget0.log])

AT_CHECK([conntrack -L 2>&1 | FORMAT_CT(10.1.1.2) | sed -e 's/dst=10.1.1.2[[45]][[0-9]]/dst=10.1.1.2XX/'], [0], [dnl
TIME_WAIT src=10.1.1.1 dst=10.1.1.2 sport=<cleared> dport=<cleared> src=10.1.1.2 dst=10.1.1.2XX sport=<cleared> dport=<cleared> [[ASSURED]] mark=0 zone=1 use=1
])

As a second example, I'll walk through a test case of NAT with FTP on IPv6.

As before, ports 1 (p0) and 2 (p1) reside in different namespaces ('at_ns0'
and 'at_ns1', respectively). A static neighbor cache entry for the NATted
address is created in at_ns1. In a more realistic scenario a controller
would need to implement an ND proxy instead:

ADD_VETH(p0, at_ns0, br0, "fc00::1/96")
NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address 80:88:88:88:88:88])
ADD_VETH(p1, at_ns1, br0, "fc00::2/96")
dnl Would be nice if NAT could translate neighbor discovery messages, too.
NS_CHECK_EXEC([at_ns1], [ip -6 neigh add fc00::240 lladdr 80:88:88:88:88:88 dev p1])

The OpenFlow rules are split into two tables (table 0 and table 1). Table 0
tracks all IPv6 traffic and drops everything else. Packets are recirculated
to table 1 after conntrack; existing connections are NATted before
recirculation:

table=0 priority=10 ip6, action=ct(nat,table=1)
table=0 priority=0 action=drop

In table 1, FTP control connections from the "private" address to port 21
are output to port 2 after being tracked, NATted, and committed
(== confirmed):

table=1 in_port=1 ct_state=+new tcp6 ipv6_src=fc00::1 tp_dst=21 action=ct(alg=ftp,commit,nat(src=fc00::240)),2

The matches on the source/destination IP addresses below are not strictly
needed, but the test case uses them to make sure the NATting actually
happens. Related TCP connections in the reverse direction are NATted,
committed, and output to port 1:

table=1 in_port=2 ct_state=+new+rel tcp6 ipv6_dst=fc00::240 action=ct(commit,nat),1

Packets on established TCP connections are allowed both ways (the CT action
in table 0 NATted these before recirculation, so we see the mapped address
going from port 1 to port 2, and the reverse-mapped (== original) address on
return traffic):

table=1 in_port=1 ct_state=+est tcp6 ipv6_src=fc00::240 action=2
table=1 in_port=2 ct_state=+est tcp6 ipv6_dst=fc00::1 action=1

We pass icmp6 packets both ways:

table=1 priority=100 in_port=1 icmp6, action=2
table=1 priority=100 in_port=2 icmp6, action=1

Everything else is dropped:

table=1 priority=0, action=drop

Functionality is verified with a Python FTP server in at_ns1 and wget to it
from at_ns0:

NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py ftp]], [ftp0.pid])

dnl FTP requests from p0->p1 should work fine.
NS_CHECK_EXEC([at_ns0], [wget ftp://[[fc00::2]] -6 --no-passive-ftp -t 3 -T 1 --retry-connrefused -v --server-response --no-proxy --no-remove-listing -o wget0.log -d])

Both the control connection and the data connection were NATted:

AT_CHECK([conntrack -L -f ipv6 2>&1 | FORMAT_CT(fc00::2) | grep -v "FIN" | grep -v "CLOSE"], [0], [dnl
TIME_WAIT src=fc00::1 dst=fc00::2 sport=<cleared> dport=<cleared> src=fc00::2 dst=fc00::240 sport=<cleared> dport=<cleared> [[ASSURED]] mark=0 helper=ftp use=2
TIME_WAIT src=fc00::2 dst=fc00::240 sport=<cleared> dport=<cleared> src=fc00::1 dst=fc00::2 sport=<cleared> dport=<cleared> [[ASSURED]] mark=0 use=1
])

The flow tables in the test cases are OpenFlow tables, which are translated
to kernel flows by ovs-vswitchd.
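If you want to compare the two views on a running system, the OpenFlow
tables and the kernel datapath flows can be dumped with the standard tools,
e.g. (bridge name br0 as in the test setup):

ovs-ofctl dump-flows br0    # OpenFlow rules, per bridge
ovs-dpctl dump-flows        # kernel datapath flows, shared by all bridges

The kernel flow listings below are in the same format as the ovs-dpctl
output.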
For your reference, I have included below the relevant kernel datapath flows
generated in the IPv6 NAT test case above. Please note that the kernel port
numbers differ from the OpenFlow port numbers, as each bridge has its own
range of OpenFlow port numbers but they all share the same kernel datapath.
In this case OpenFlow port 1 is kernel port 2, and OpenFlow port 2 is kernel
port 3. Kernel flows are all in the same table and have no priorities (so
they should have mutually exclusive matches). Recirculation IDs are
allocated on demand; here recirculation ID 1 deals with traffic in the
original direction and ID 2 with the return direction (recirc_id(0)
designates no recirculation). ovs-vswitchd always exact-matches the IP frag
field, so since the test traffic had no fragments, you'll see (frag=no) in
each flow below. Also, the packet that causes a flow to be created is
executed independently of (but with the same actions as) the newly created
flow, so the kernel datapath flow counts do not reflect the first packet
(the OpenFlow counters do); this explains the 0 counts on some of the flows
below:

recirc_id(0),in_port(2),eth_type(0x86dd),ipv6(frag=no), packets:28, bytes:2541, used:0.048s, flags:SFPR., actions:ct(nat),recirc(0x1)
recirc_id(0x1),in_port(2),ct_state(+new),eth_type(0x86dd),ipv6(src=fc00::1,proto=6,frag=no),tcp(dst=21), packets:1, bytes:94, used:0.072s, flags:S, actions:ct(commit,helper=ftp,nat(src=fc00::240)),3
recirc_id(0x1),in_port(2),ct_state(-new+est+rel),eth_type(0x86dd),ipv6(src=fc00::240,proto=6,frag=no), packets:4, bytes:344, used:0.048s, flags:F., actions:3
recirc_id(0x1),in_port(2),ct_state(-new+est-rel),eth_type(0x86dd),ipv6(src=fc00::240,proto=6,frag=no), packets:12, bytes:1145, used:0.048s, flags:FP., actions:3
recirc_id(0),in_port(3),eth_type(0x86dd),ipv6(frag=no), packets:26, bytes:4939, used:0.048s, flags:SFP., actions:ct(nat),recirc(0x2)
recirc_id(0x2),in_port(3),ct_state(+new+rel),eth_type(0x86dd),ipv6(dst=fc00::240,proto=6,frag=no), packets:0, bytes:0, used:never, actions:ct(commit,nat),2
recirc_id(0x2),in_port(3),ct_state(-new+est-rel),eth_type(0x86dd),ipv6(dst=fc00::1,proto=6,frag=no), packets:12, bytes:1323, used:0.048s, flags:SFP., actions:2
recirc_id(0x2),in_port(3),ct_state(-new+est+rel),eth_type(0x86dd),ipv6(dst=fc00::1,proto=6,frag=no), packets:3, bytes:1113, used:0.048s, flags:FP., actions:2

The generated matches depend on the order in which the OpenFlow rules are
processed, which can be influenced by giving the OpenFlow rules explicit
priorities.
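Giving a rule an explicit priority is just a matter of adding the priority
keyword to it; e.g., the established-state rule for port 1 above could be
raised above its neighbors like this (an illustration only, not the exact
priorities used in the run below):

ovs-ofctl add-flow br0 "table=1,priority=100,in_port=1,ct_state=+est,tcp6,ipv6_src=fc00::240,action=2"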
For example, with a different set of priorities the datapath rules become as
follows (in this test run the first IP packet, an ICMPv6 packet, was
received from at_ns1, so the recirc_ids are reversed compared to above; the
ICMPv6 flows are not included here):

recirc_id(0),in_port(2),eth_type(0x86dd),ipv6(frag=no), packets:27, bytes:2455, used:0.040s, flags:SFPR., actions:ct(nat),recirc(0x2)
recirc_id(0x2),in_port(2),ct_state(+est),eth_type(0x86dd),ipv6(src=fc00::240,proto=6,frag=no), packets:16, bytes:1497, used:0.040s, flags:SFP., actions:3
recirc_id(0x2),in_port(2),ct_state(+new-est),eth_type(0x86dd),ipv6(src=fc00::1,proto=6,frag=no),tcp(dst=21), packets:1, bytes:94, used:0.108s, flags:S, actions:ct(commit,helper=ftp,nat(src=fc00::240)),3
recirc_id(0),in_port(3),eth_type(0x86dd),ipv6(frag=no), packets:28, bytes:5111, used:0.040s, flags:SFP., actions:ct(nat),recirc(0x1)
recirc_id(0x1),in_port(3),ct_state(+est),eth_type(0x86dd),ipv6(dst=fc00::1,proto=6,frag=no), packets:19, bytes:4281, used:0.040s, flags:SFP., actions:2
recirc_id(0x1),in_port(3),ct_state(+new-est+rel),eth_type(0x86dd),ipv6(dst=fc00::240,proto=6,frag=no), packets:0, bytes:0, used:never, actions:ct(commit,nat),2

Hope this helps,

  Jarno

>> Jarno Rajahalme (8):
>>   netfilter: Remove IP_CT_NEW_REPLY definition.
>>   netfilter: Factor out nf_ct_get_info().
>>   netfilter: Allow calling into nat helper without skb_dst.
>>   openvswitch: Update the CT state key only after nf_conntrack_in().
>>   openvswitch: Find existing conntrack entry after upcall.
>>   openvswitch: Handle NF_REPEAT in conntrack action.
>>   openvswitch: Delay conntrack helper call for new connections.
>>   openvswitch: Interface with NAT.
>>
>>  include/net/netfilter/nf_conntrack.h                |  15 +
>>  include/uapi/linux/netfilter/nf_conntrack_common.h  |  12 +-
>>  include/uapi/linux/openvswitch.h                    |  47 ++
>>  net/ipv4/netfilter/nf_nat_l3proto_ipv4.c            |  29 +-
>>  net/ipv6/netfilter/nf_nat_l3proto_ipv6.c            |  29 +-
>>  net/netfilter/nf_conntrack_core.c                   |  22 +-
>>  net/openvswitch/conntrack.c                         | 632 +++++++++++++++++++--
>>  net/openvswitch/conntrack.h                         |   3 +-
>>  8 files changed, 686 insertions(+), 103 deletions(-)
>>
>> --
>> 2.1.4