This series allows conntrack to insert a duplicate conntrack entry
if the reply direction doesn't result in a clash with a different
original connection.

Background:

kubernetes creates load-balancing rules for DNS using -m statistic, e.g.:

-p udp --dport 53 -m statistic --mode random ... -j DNAT --to-destination x
-p udp --dport 53 -m statistic --mode random ... -j DNAT --to-destination y

When the resolver sends an A and an AAAA request back-to-back from
different threads on the same socket, there is a high chance of a
connection tracking clash at insertion time.

The clashing udp packet is then dropped, which in turn results in a
5 second DNS timeout.

The clash cannot be resolved with the current logic because the two
conntrack entries have different NAT transformations: the first one
from s:highport to x.53, the second from s:highport to y.53.

One solution is to change the rules to use a consistent mapping, e.g.
using -m cluster or the nftables 'jhash' expression. This would cause
the A and AAAA requests coming from the same socket to match the same
rule and thus share the same NAT information.

However, I do not believe this is a realistic course of action.

This change adds a second clash resolution/drop avoidance step:
A clashing entry will be added anyway provided the reply direction
is unique.

Because this results in duplicate conntrack entries for the original
direction, this comes with strings attached:

1. The clashed entry will only be around for 1 second
2. The clashed entry can only be found in reply direction
   (not inserted for ORIGINAL)
3. The clashed entry is auto-removed once the first reply comes in
4. The clashed entry is never assured and can thus be evicted
   if the conntrack table becomes full.

Major changes since RFC:

1. Do not insert the duplicate/clash in original dir.
2. This implicitly hides the entry from "conntrack -L".
3. Use an internal status bit to auto-remove the conntrack when the
   first reply comes in.
4. Extend the commit message of the last patch to include a summary of
   alternate proposals (and why they did not work out).

I'm sending this for nf rather than nf-next because I consider this
a bug fix, but I am fine if this is deferred for nf-next instead.

Florian Westphal (4):
  netfilter: conntrack: remove two args from resolve_clash
  netfilter: conntrack: place confirm-bit setting in a helper
  netfilter: conntrack: split resolve_clash function
  netfilter: conntrack: allow insertion of clashing entries

 include/linux/rculist_nulls.h                      |   7 +++++
 include/uapi/linux/netfilter/nf_conntrack_common.h |  12 ++++++++-
 net/netfilter/nf_conntrack_core.c                  | 192 ++++++++++++++++------
 net/netfilter/nf_conntrack_proto_udp.c             |  20 ++++++++++--
 4 files changed, 198 insertions(+), 33 deletions(-)
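
For readers who want the rule in the cover letter spelled out, the
following is a minimal userspace sketch of the decision described above:
an entry whose original tuple clashes is still added, in reply direction
only, as long as its reply tuple is unique, and it is removed again when
the first reply arrives. This is a toy model, not the code from the
patches: every name here (toy_tuple, toy_ct, toy_table, toy_confirm,
toy_reply_seen) is hypothetical, the table is a flat array instead of the
conntrack hash, and the 1 second timeout and eviction are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_tuple {
	uint32_t src, dst;
	uint16_t sport, dport;
};

struct toy_ct {
	struct toy_tuple orig;   /* client -> service address, before DNAT */
	struct toy_tuple reply;  /* backend -> client, after DNAT */
	bool in_orig_hash;       /* false for a reply-only (clashed) entry */
};

enum toy_verdict { TOY_INSERTED, TOY_INSERTED_REPLY_ONLY, TOY_NOT_INSERTED };

#define TOY_MAX 16
struct toy_table {
	struct toy_ct slots[TOY_MAX];
	size_t n;
};

static bool toy_tuple_eq(const struct toy_tuple *a, const struct toy_tuple *b)
{
	return a->src == b->src && a->dst == b->dst &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Try to add ct; on an original-direction clash, add it reply-only. */
enum toy_verdict toy_confirm(struct toy_table *t, const struct toy_ct *ct)
{
	bool orig_clash = false, reply_clash = false;

	for (size_t i = 0; i < t->n; i++) {
		const struct toy_ct *e = &t->slots[i];

		if (e->in_orig_hash && toy_tuple_eq(&e->orig, &ct->orig))
			orig_clash = true;
		if (toy_tuple_eq(&e->reply, &ct->reply))
			reply_clash = true;
	}

	/* Reply direction not unique: the new step does not apply.
	 * What happens to the packet then is not modelled here. */
	if (reply_clash)
		return TOY_NOT_INSERTED;

	assert(t->n < TOY_MAX);
	t->slots[t->n] = *ct;
	t->slots[t->n].in_orig_hash = !orig_clash;
	t->n++;
	return orig_clash ? TOY_INSERTED_REPLY_ONLY : TOY_INSERTED;
}

/* Look up a reply packet; a reply-only entry is removed on first match. */
bool toy_reply_seen(struct toy_table *t, const struct toy_tuple *reply)
{
	for (size_t i = 0; i < t->n; i++) {
		if (!toy_tuple_eq(&t->slots[i].reply, reply))
			continue;
		if (!t->slots[i].in_orig_hash) {
			t->n--;
			t->slots[i] = t->slots[t->n];
		}
		return true;
	}
	return false;
}
```

In this model, two entries with the same original tuple (s:highport to
the service address) but different reply tuples (from x.53 and from y.53)
both end up in the table: the first as a normal entry, the second
reply-only. A reply matching the second entry removes it, while the first
entry stays.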