Re: Most optimal method to dump UDP conntrack entries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Antonio Ojea <antonio.ojea.garcia@xxxxxxxxx> wrote:
> We are not worried about being slow in the order of seconds, the
> system is eventually consistent so there can always be a reasonable
> latency.
> Since we only care about UDP, losing packets during that period is not
> desirable but is assumable.
> My main concern is if constantly dumping all the entries via netlink
> can cause any issue or increase resources consumption.

It doesn't need lots of extra memory on kernel side, just for the
some internal tracking ("where were we") and temporary netlink
buffer to assemble the dump chunks returned to userspace.

> > > Also both proc and netlink dumps can miss entries (albeit its rare),
> > > if parallel insertions/deletes happen (which is normal on busy system).
> 
> That is one of the reasons we want to implement this reconcile loop,
> so it can be resilient to this kind of errors, we keep the state on
> the API in the control plane, so we can always rebuild the state in
> the dataplane (recreating nftables rules and delete conntrack entries
> that does not match the current state)

OK, but wouldn't it be better to not need this at all?

The UDP tracker is old and refreshs the timeout for every packet, but
who says it needs to stay that way?

With very simple patch to only refresh in 'reply' dir:

    [NEW] udp      17 30 src=127.0.0.1 dst=127.0.0.1 sport=49819 dport=12345 [UNREPLIED] src=127.0.0.1 dst=127.0.0.1 sport=12345 dport=49819
[DESTROY] udp      17 src=127.0.0.1 dst=127.0.0.1 sport=49819 dport=12345 [UNREPLIED] src=127.0.0.1 dst=127.0.0.1 sport=12345 dport=49819
    [NEW] udp      17 30 src=127.0.0.1 dst=127.0.0.1 sport=49819 dport=12345 [UNREPLIED] src=127.0.0.1 dst=127.0.0.1 sport=12345 dport=49819
[DESTROY] udp      17 src=127.0.0.1 dst=127.0.0.1 sport=49819 dport=12345 [UNREPLIED] src=127.0.0.1 dst=127.0.0.1 sport=12345 dport=49819
    [NEW] udp      17 30 src=127.0.0.1 dst=127.0.0.1 sport=49819 dport=12345 [UNREPLIED] src=127.0.0.1 dst=127.0.0.1 sport=12345 dport=49819

This is with a unidirectional UDP stream.

diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c
--- a/net/netfilter/nf_conntrack_proto_udp.c
+++ b/net/netfilter/nf_conntrack_proto_udp.c
@@ -17,6 +17,7 @@
 #include <linux/netfilter.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter_ipv6.h>
+#include <net/netfilter/nf_conntrack_acct.h>
 #include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_ecache.h>
 #include <net/netfilter/nf_conntrack_timeout.h>
@@ -80,6 +81,35 @@ static bool udp_error(struct sk_buff *skb,
 	return false;
 }
 
+static void nf_ct_refresh_udp(struct nf_conn *ct,
+			      enum ip_conntrack_dir dir,
+			      u32 len, u32 extra_jiffies)
+{
+	/* Only update if this is not a fixed timeout */
+	if (test_bit(IPS_FIXED_TIMEOUT_BIT, &ct->status))
+		return nf_ct_acct_update(ct, dir, len);
+
+	switch (dir) {
+	case IP_CT_DIR_ORIGINAL:
+		/* do not refresh in original dir, else stale dnat mapping
+		 * can be kept alive indefinitely.
+		 */
+		if (nf_ct_is_confirmed(ct))
+			return nf_ct_acct_update(ct, dir, len);
+		break;
+	case IP_CT_DIR_REPLY:
+		extra_jiffies += nfct_time_stamp;
+		break;
+	case IP_CT_DIR_MAX: /* silence warning wrt. unhandled enum */
+		break;
+	}
+
+	if (READ_ONCE(ct->timeout) != extra_jiffies)
+		WRITE_ONCE(ct->timeout, extra_jiffies);
+
+	nf_ct_acct_update(ct, dir, len);
+}
+
 /* Returns verdict for packet, and may modify conntracktype */
 int nf_conntrack_udp_packet(struct nf_conn *ct,
 			    struct sk_buff *skb,
@@ -114,7 +144,7 @@ int nf_conntrack_udp_packet(struct nf_conn *ct,
 			stream = (status & IPS_ASSURED) == 0;
 		}
 
-		nf_ct_refresh_acct(ct, ctinfo, skb, extra);
+		nf_ct_refresh_udp(ct, CTINFO2DIR(ctinfo), skb->len, extra);
 
 		/* never set ASSURED for IPS_NAT_CLASH, they time out soon */
 		if (unlikely((status & IPS_NAT_CLASH)))
@@ -124,7 +154,7 @@ int nf_conntrack_udp_packet(struct nf_conn *ct,
 		if (stream && !test_and_set_bit(IPS_ASSURED_BIT, &ct->status))
 			nf_conntrack_event_cache(IPCT_ASSURED, ct);
 	} else {
-		nf_ct_refresh_acct(ct, ctinfo, skb, timeouts[UDP_CT_UNREPLIED]);
+		nf_ct_refresh_udp(ct, CTINFO2DIR(ctinfo), skb->len, timeouts[UDP_CT_UNREPLIED]);
 	}
 	return NF_ACCEPT;
 }

The obvious counter-argument is "but what if its the reply side that is
sending, and we have a stale SNAT mapping?".

Solution would be to track timestamp of last packet seen per direction,
and then stop refreshing the timeout if one side ceases sending.

I think thats much better than what we have now and there would be no
need to actively zap UDP flows, they would timeout automatically once
one side ceases to transmit.




[Index of Archives]     [Linux Netfilter Development]     [Linux Kernel Networking Development]     [Netem]     [Berkeley Packet Filter]     [Linux Kernel Development]     [Advanced Routing & Traffice Control]     [Bugtraq]

  Powered by Linux