This patch adds initial documentation for the Netfilter flowtable infrastructure.

Reviewed-by: Florian Westphal <fw@xxxxxxxxx>
Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
---
 Documentation/networking/nf_flowtable.txt | 111 ++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)
 create mode 100644 Documentation/networking/nf_flowtable.txt

diff --git a/Documentation/networking/nf_flowtable.txt b/Documentation/networking/nf_flowtable.txt
new file mode 100644
index 000000000000..51f09d8be1ef
--- /dev/null
+++ b/Documentation/networking/nf_flowtable.txt
@@ -0,0 +1,111 @@
+Netfilter's flowtable infrastructure
+====================================
+
+This documentation describes the software flowtable infrastructure available in
+Netfilter since Linux kernel 4.16.
+
+Overview
+--------
+
+Initial packets follow the classic forwarding path. Once the flow enters the
+established state according to the conntrack semantics (i.e. traffic has been
+seen in both directions), you can decide to offload the flow to the flowtable
+from the forward chain via the 'flow offload' action available in nftables.
+
+Packets that find an entry in the flowtable (i.e. a flowtable hit) are sent to
+the destination netdevice via neigh_xmit(); hence, they bypass the classic
+forwarding path (the visible effect is that these packets are not seen by any
+of the Netfilter hooks coming after ingress). On a flowtable miss, the packet
+follows the classic forwarding path.
+
+The flowtable uses a resizable hashtable. Lookups are based on the following
+7-tuple selectors: source and destination addresses, layer 3 and layer 4
+protocols, source and destination ports, and the input interface (useful in
+case there are several conntrack zones in place).
+
+Flowtables are populated via the 'flow offload' nftables action, so the user can
+selectively specify which flows are placed into the flowtable.
+Hence, packets follow the classic forwarding path unless the user explicitly
+instructs packets to use this new alternative forwarding path via nftables
+policy.
+
+This is represented in Fig.1, which describes the classic forwarding path,
+including the Netfilter hooks, and the flowtable fastpath bypass.
+
+                                         userspace process
+                                          ^              |
+                                          |              |
+                                     _____|____     ____\/____
+                                    /          \   /          \
+                                    |  input   |   |  output  |
+                                    \__________/   \__________/
+                                          ^              |
+                                          |              |
+      _________    ___________     ----------    _______\/____
+     /         \  /           \    |Routing |   /             \
+  --> ingress ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit
+     \_________/  \___________/    ----------   \_____________/         ^
+          |             ^               |              ^                |
+      flowtable         |           ___\/____          |                |
+          |             |          /         \         |                |
+       __\/___          |          | forward |----------                |
+       |-----|          |          \_________/                          |
+       |-----|          |      'flow offload' rule                      |
+       |-----|          |                                               |
+       |_____|          |                                               |
+          |             |                                               |
+         / \            |                                               |
+        /hit\_no________|                                               |
+        \ ? /                                                           |
+         \ /                                                            |
+          |__yes_________________fastpath bypass _______________________|
+
+               Fig.1 Netfilter hooks and flowtable interactions
+
+The flowtable entry stores the NAT configuration, so all packets are mangled
+according to the NAT policy that matched the initial packets, which went
+through the classic forwarding path. The TTL is decremented before calling
+neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding
+path, given that the transport selectors are missing.
+
+Example configuration
+---------------------
+
+Enabling the flowtable bypass is relatively easy: you only need to create a
+flowtable and add one rule to your forward chain.
+
+        table inet x {
+                flowtable f {
+                        hook ingress priority 0; devices = { eth0, eth1 };
+                }
+                chain y {
+                        type filter hook forward priority 0; policy accept;
+                        ip protocol tcp flow offload @f
+                        counter
+                }
+        }
+
+This example adds the flowtable 'f', which is registered in the ingress hook of
+the eth0 and eth1 netdevices. You can create as many flowtables as you want for
+resource partitioning.
+The priority defines the order in which hooks are run in the pipeline. This is
+convenient in case you already have an nftables ingress chain: make sure the
+flowtable priority is smaller than the priority of the nftables ingress chain,
+so that the flowtable runs before it in the pipeline.
+
+The 'flow offload' action from the forward chain 'y' adds an entry to the
+flowtable for the TCP syn-ack packet coming in the reply direction. Once the
+flow is offloaded, you will observe that the counter rule does not get updated
+for the packets that are being forwarded, since those bypass the classic
+forwarding path.
+
+More reading
+------------
+
+This documentation is based on the LWN.net articles [1][2]. Rafal Milecki also
+wrote a comprehensive summary called "A state of network acceleration" that
+describes how things were before this infrastructure was mainlined [3], and he
+also gives a rough summary of this work [4].
+
+[1] https://lwn.net/Articles/738214/
+[2] https://lwn.net/Articles/742164/
+[3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
+[4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html
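The ingress lookup described in the Overview (7-tuple key, hit means fastpath bypass with a TTL decrement, miss or fragment means classic path) can be illustrated with a small userspace C sketch. It is a hypothetical model, not the kernel's flowtable code: every name in it (struct ft_key, ft_add(), ft_ingress() and so on) is made up for illustration, it swaps the kernel's resizable hashtable for a fixed-size chained table, it handles IPv4 only, it leaves out NAT mangling, and handing packets whose TTL would expire back to the classic path is an assumption of the sketch rather than documented behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define FT_BUCKETS 256 /* fixed size; the kernel table is resizable */

/* Hypothetical 7-tuple key (IPv4 only in this sketch). */
struct ft_key {
	uint32_t saddr, daddr; /* source and destination addresses */
	uint16_t l3proto;      /* layer 3 protocol, e.g. 0x0800 (IPv4) */
	uint8_t  l4proto;      /* layer 4 protocol, e.g. 6 (TCP) */
	uint16_t sport, dport; /* source and destination ports */
	int      iif;          /* input interface index */
};

struct ft_entry {
	struct ft_key key;
	int oif;               /* output interface learned on the classic path */
	struct ft_entry *next; /* bucket chain */
};

struct flowtable { struct ft_entry *buckets[FT_BUCKETS]; };

struct ft_pkt {
	struct ft_key key;
	uint8_t ttl;
	bool is_fragment;
	int oif;               /* set on a fastpath hit */
};

enum ft_verdict { FT_CLASSIC_PATH, FT_FASTPATH };

static unsigned ft_hash(const struct ft_key *k)
{
	/* FNV-1a-style mixing over the seven selector values. */
	const uint32_t parts[7] = { k->saddr, k->daddr, k->l3proto, k->l4proto,
				    k->sport, k->dport, (uint32_t)k->iif };
	uint32_t h = 2166136261u;
	for (int i = 0; i < 7; i++)
		h = (h ^ parts[i]) * 16777619u;
	return h % FT_BUCKETS;
}

static bool ft_key_equal(const struct ft_key *a, const struct ft_key *b)
{
	/* Compare field by field; memcmp() would also compare padding bytes. */
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->l3proto == b->l3proto && a->l4proto == b->l4proto &&
	       a->sport == b->sport && a->dport == b->dport && a->iif == b->iif;
}

/* Stand-in for the 'flow offload' action; only one direction is stored. */
static int ft_add(struct flowtable *ft, const struct ft_key *k, int oif)
{
	struct ft_entry *e = malloc(sizeof(*e));
	unsigned b = ft_hash(k);
	if (!e)
		return -1;
	*e = (struct ft_entry){ .key = *k, .oif = oif, .next = ft->buckets[b] };
	ft->buckets[b] = e;
	return 0;
}

static const struct ft_entry *ft_lookup(const struct flowtable *ft,
					const struct ft_key *k)
{
	for (const struct ft_entry *e = ft->buckets[ft_hash(k)]; e; e = e->next)
		if (ft_key_equal(&e->key, k))
			return e;
	return NULL;
}

/* Ingress decision: fastpath bypass on a hit, classic path otherwise. */
static enum ft_verdict ft_ingress(const struct flowtable *ft, struct ft_pkt *p)
{
	const struct ft_entry *e;
	assert(ft && p);
	if (p->is_fragment)     /* transport selectors are missing */
		return FT_CLASSIC_PATH;
	e = ft_lookup(ft, &p->key);
	if (!e || p->ttl <= 1)  /* miss, or (assumption) TTL about to expire */
		return FT_CLASSIC_PATH;
	p->ttl--;               /* TTL is decremented before transmission */
	p->oif = e->oif;        /* the kernel would call neigh_xmit() here */
	return FT_FASTPATH;
}

static void ft_destroy(struct flowtable *ft)
{
	for (int b = 0; b < FT_BUCKETS; b++)
		while (ft->buckets[b]) {
			struct ft_entry *next = ft->buckets[b]->next;
			free(ft->buckets[b]);
			ft->buckets[b] = next;
		}
}
```

In this model ft_ingress() would run for every packet at the ingress hook, while ft_add() would only be called where the forward-chain rule matches an established flow. Because the input interface is part of the key, the same addresses and ports arriving on another device miss and take the classic path. The sketch stores a single direction of the flow and never expires entries.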