From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
Date: Fri, 20 Nov 2020 13:49:12 +0100

> Hi,
>
> The following patchset augments the Netfilter flowtable fastpath to
> support for network topologies that combine IP forwarding, bridge and
> VLAN devices.
>
> This v5 includes updates for:
>
> - Patch #2: fix incorrect xmit type in IPv6 path, per Florian Westphal.
> - Patch #3: fix possible off by one in dev_fill_forward_path() stack logic,
>   per Florian Westphal.
> - Patch #7: add a note to patch description to specify that FDB topology
>   updates are not supported at this stage, per Jakub Kicinski.
>
> A typical scenario that can benefit from this infrastructure is composed
> of several VMs connected to bridge ports where the bridge master device
> 'br0' has an IP address. A DHCP server is also assumed to be running to
> provide connectivity to the VMs. The VMs reach the Internet through
> 'br0' as default gateway, which makes the packet enter the IP forwarding
> path. Then, netfilter is used to NAT the packets before they leave
> through the wan device.
>
> Something like this:
>
>                        fast path
>                 .------------------------.
>                /                          \
>                |           IP forwarding   |
>                |          /             \  .
>                |       br0               eth0
>                .       / \
>                -- veth1  veth2
>                    .
>                    .
>                    .
>                  eth0
>            ab:cd:ef:ab:cd:ef
>                   VM

I'm concerned about bypassing vlan and bridge's .ndo_start_xmit() in case
of this shortcut. We'll have incomplete netdevice Tx stats for these two,
as they are updated inside these callbacks.

> The idea is to accelerate forwarding by building a fast path that takes
> packets from the ingress path of the bridge port and place them in the
> egress path of the wan device (and vice versa). Hence, skipping the
> classic bridge and IP stack paths.
>
> This patchset is composed of:
>
> Patch #1 adds a placeholder for the hash calculation, instead of using
>          the dir field.
>
> Patch #2 adds the transmit path type field to the flow tuple. Two transmit
>          paths are supported so far: the neighbour and the xfrm transmit
>          paths.
>          This patch comes in preparation to add a new direct ethernet
>          transmit path (see patch #7).
>
> Patch #3 adds dev_fill_forward_path() and .ndo_fill_forward_path() to
>          netdev_ops. This new function describes the list of netdevice hops
>          to reach a given destination MAC address in the local network topology,
>          e.g.
>
>                       IP forwarding
>                      /             \
>                   br0              eth0
>                   / \
>               veth1  veth2
>                .
>                .
>                .
>               eth0
>        ab:cd:ef:ab:cd:ef
>
>          where veth1 and veth2 are bridge ports and eth0 provides Internet
>          connectivity. eth0 is the interface in the VM which is connected to
>          the veth1 bridge port. Then, for packets going to br0 whose
>          destination MAC address is ab:cd:ef:ab:cd:ef, dev_fill_forward_path()
>          provides the following path: br0 -> veth1.
>
> Patch #4 adds .ndo_fill_forward_path for VLAN devices, which provides the next
>          device hop via vlan->real_dev. This annotates the VLAN id and protocol.
>          This is useful to know what VLAN headers are expected from the ingress
>          device. This also provides information regarding the VLAN headers
>          to be pushed in the egress path.
>
> Patch #5 adds .ndo_fill_forward_path for bridge devices, which allows to make
>          lookups to the FDB to locate the next device hop (bridge port) in the
>          forwarding path.
>
> Patch #6 updates the flowtable to use the dev_fill_forward_path()
>          infrastructure to obtain the ingress device in the fastpath.
>
> Patch #7 updates the flowtable to use dev_fill_forward_path() to obtain the
>          egress device in the forwarding path. This also adds the direct
>          ethernet transmit path, which pushes the ethernet header to the
>          packet and send it through dev_queue_xmit(). This patch adds
>          support for the bridge, so bridge ports use this direct xmit path.
>
> Patch #8 adds ingress VLAN support (up to 2 VLAN tags, QinQ). The VLAN
>          information is also provided by dev_fill_forward_path(). Store the
>          VLAN id and protocol in the flow tuple for hash lookups.
>          The VLAN support in the xmit path is achieved by annotating the
>          first vlan device found in the xmit path and by calling
>          dev_hard_header() (previous patch #7) before dev_queue_xmit().
>
> Patch #9 extends nft_flowtable.sh selftest: This is adding a test to
>          cover bridge and vlan support coming in this patchset.
>
> = Performance numbers
>
> My testbed environment consists of three containers:
>
>   192.168.20.2     .20.1     .10.1   10.141.10.2
>          veth0       veth0 veth1      veth0
>         ns1 <---------> nsr1 <--------> ns2
>                             SNAT
>        iperf -c                        iperf -s
>
> where nsr1 is used for forwarding. There is a bridge device br0 in nsr1,
> veth0 is a port of br0. SNAT is performed on the veth1 device of nsr1.
>
> - ns2 runs iperf -s
> - ns1 runs iperf -c 10.141.10.2 -n 100G
>
> My results are:
>
> - Baseline (no flowtable, classic forwarding path + netfilter): ~16 Gbit/s
> - Fastpath (with flowtable, this patchset): ~25 Gbit/s
>
> This is an improvement of ~50% compared to baseline.
>
> Please, apply. Thank you.
>
> Pablo Neira Ayuso (9):
>   netfilter: flowtable: add hash offset field to tuple
>   netfilter: flowtable: add xmit path types
>   net: resolve forwarding path from virtual netdevice and HW destination address
>   net: 8021q: resolve forwarding path for vlan devices
>   bridge: resolve forwarding path for bridge devices
>   netfilter: flowtable: use dev_fill_forward_path() to obtain ingress device
>   netfilter: flowtable: use dev_fill_forward_path() to obtain egress device
>   netfilter: flowtable: add vlan support
>   selftests: netfilter: flowtable bridge and VLAN support
>
>  include/linux/netdevice.h                     |  35 +++
>  include/net/netfilter/nf_flow_table.h         |  43 +++-
>  net/8021q/vlan_dev.c                          |  15 ++
>  net/bridge/br_device.c                        |  27 +++
>  net/core/dev.c                                |  46 ++++
>  net/netfilter/nf_flow_table_core.c            |  51 +++--
>  net/netfilter/nf_flow_table_ip.c              | 200 ++++++++++++++----
>  net/netfilter/nft_flow_offload.c              | 159 +++++++++++++-
>  .../selftests/netfilter/nft_flowtable.sh      |  82 +++++++
>  9 files changed, 598 insertions(+), 60 deletions(-)
>
> --
> 2.20.1

Al
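
For reference, the hop-by-hop resolution described for patches #3 to #5 can
be modelled in userspace roughly as below. This is only a sketch under
stated assumptions, not the kernel code: the Device, Vlan and Bridge classes,
resolve_path() and the MAX_HOPS bound are hypothetical names made up for
illustration, and an FDB miss simply ends the walk here, which may differ
from what the real implementation does.

```python
# Userspace model only: every class and function name below is hypothetical
# and does not mirror the kernel structures introduced by the patchset.
MAX_HOPS = 5  # assumed bound on the path length, chosen for this sketch


class Device:
    """A netdevice without a forwarding hook: resolution stops here."""

    def __init__(self, name):
        self.name = name

    def fill_forward_path(self, dest_mac):
        return None


class Vlan(Device):
    """VLAN device: the next hop is always the underlying real device."""

    def __init__(self, name, real_dev, vlan_id):
        super().__init__(name)
        self.real_dev = real_dev
        self.vlan_id = vlan_id

    def fill_forward_path(self, dest_mac):
        return self.real_dev


class Bridge(Device):
    """Bridge device: the next hop is the port found in the FDB."""

    def __init__(self, name, fdb):
        super().__init__(name)
        self.fdb = fdb  # maps destination MAC address -> bridge port

    def fill_forward_path(self, dest_mac):
        # Assumption of this sketch: an FDB miss just ends the walk.
        return self.fdb.get(dest_mac)


def resolve_path(dev, dest_mac):
    """Return the device names visited from dev towards dest_mac."""
    path = [dev.name]
    while True:
        nxt = dev.fill_forward_path(dest_mac)
        if nxt is None:
            return path
        if len(path) >= MAX_HOPS:
            raise ValueError("forwarding path too long")
        path.append(nxt.name)
        dev = nxt


veth1 = Device("veth1")
veth2 = Device("veth2")
br0 = Bridge("br0", {"ab:cd:ef:ab:cd:ef": veth1})
print(" -> ".join(resolve_path(br0, "ab:cd:ef:ab:cd:ef")))  # br0 -> veth1
```

With the topology from the cover letter this yields br0 -> veth1, the same
path given there for destination MAC ab:cd:ef:ab:cd:ef.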