Re: [PATCH nf-next v4 4/5] netfilter: Introduce egress hook

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/22/21 9:47 AM, Lukas Wunner wrote:
Commit e687ad60af09 ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key") introduced the ability to
classify packets with netfilter on ingress.

Support the same on egress to satisfy user requirements such as:
* outbound security policies for containers (Laura)
* filtering and mangling intra-node Direct Server Return (DSR) traffic
   on a load balancer (Laura)
* filtering locally generated traffic coming in through AF_PACKET,
   such as local ARP traffic generated for clustering purposes or DHCP
   (Laura; the AF_PACKET plumbing is contained in a separate commit)
* L2 filtering from ingress and egress for AVB (Audio Video Bridging)
   and gPTP with nftables (Pablo)
* in the future: in-kernel NAT64/NAT46 (Pablo)

A patch for nftables to hook up egress rules from user space has been
submitted separately, so users may immediately take advantage of the
feature.

The hook is positioned after packet handling by traffic control.
Thus, if packets are redirected into and out of containers with tc,
the data path is:
ingress: host tc -> container tc -> container nft
egress:  container tc -> host tc -> host nft

This was done to address an objection from Daniel Borkmann:  If desired,
nft does not get into tc's way performance-wise.  The host is able to
firewall malicious packets coming out of a container, but only after tc
has done its duty.  An implication is that tc may set skb->mark on
egress for nft to act on it, but not the other way round.

If egress netfilter handling is not enabled on any interface, it is
patched out of the data path by way of a static_key and doesn't make a
performance difference that is discernible from noise:

[...]
@@ -4096,13 +4098,18 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
  	qdisc_pkt_len_init(skb);
  #ifdef CONFIG_NET_CLS_ACT
  	skb->tc_at_ingress = 0;
-# ifdef CONFIG_NET_EGRESS
+#endif
+#ifdef CONFIG_NET_EGRESS
  	if (static_branch_unlikely(&egress_needed_key)) {
  		skb = sch_handle_egress(skb, &rc, dev);
  		if (!skb)
  			goto out;
+		if (nf_hook_egress_active()) {
+			skb = nf_hook_egress(skb, &rc, dev);
+			if (!skb)
+				goto out;
+		}

This won't work unfortunately, the issue is that this now creates an asymmetric path, for
example: [tc ingress] -> [nf ingress] -> [host] -> [tc egress] -> [nf egress]. So any NAT
translation done on tc ingress + tc egress will break on the nf hooks given nf is not able to
correlate inbound with outbound traffic. Likewise for container-based traffic that is in its
own netns, one of the existing paths is: [tc ingress (phys,hostns)] -> [tc ingress (veth,podns)] ->
[reply] -> [tc egress (veth,podns)] -> [tc ingress (veth,hostns)] -> [nf egress (phys,hostns)*] ->
[tc egress (phys,hostns)]. As can be seen the [nf ingress] hook is not an issue at all given
everything operates in tc layer but the [nf egress*] one is in this case, as it touches in tc
layer where existing data planes will start to break upon rule injection. Wrt above objection,
what needs to be done for the minimum case is to i) fix the asymmetry problem from here, and
ii) add a flag for tc layer-redirected skbs to skip the nf egress hook *; with that in place
this concern should be resolved. Thanks!

  	}
-# endif
  #endif
  	/* If device/qdisc don't need skb->dst, release it right now while
  	 * its hot in this cpu cache.



[Index of Archives]     [Netfitler Users]     [Berkeley Packet Filter]     [LARTC]     [Bugtraq]     [Yosemite Forum]

  Powered by Linux