On 4 July 2018 at 19:14, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: > On Mon, Jul 02, 2018 at 04:52:14PM +0200, Martynas Pumputis wrote: >> This patch enables the clash resolution for NAT (disabled in >> "590b52e10d41") if clashing conntracks match (i.e. both tuples are equal) >> and a protocol allows it. >> >> The clash might happen for a connections-less protocol (e.g. UDP) when >> two threads in parallel writes to the same socket and consequent calls >> to "get_unique_tuple" return the same tuples (incl. reply tuples). >> >> In this case it is safe to perform the resolution, as the losing CT >> describes the same mangling as the winning CT, so no modifications to >> the packet are needed, and the result of rules traversal for the loser's >> packet stays valid. >> >> Signed-off-by: Martynas Pumputis <martynas@weave.works> >> --- >> net/netfilter/nf_conntrack_core.c | 31 +++++++++++++++++++++++-------- >> 1 file changed, 23 insertions(+), 8 deletions(-) >> >> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c >> index 3465da2a98bd..cc5ef8c9d0da 100644 >> --- a/net/netfilter/nf_conntrack_core.c >> +++ b/net/netfilter/nf_conntrack_core.c >> @@ -502,6 +502,18 @@ nf_ct_key_equal(struct nf_conntrack_tuple_hash *h, >> net_eq(net, nf_ct_net(ct)); >> } >> >> +static inline bool >> +nf_ct_match(const struct nf_conn *ct1, const struct nf_conn *ct2) >> +{ >> + return nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_ORIGINAL].tuple, >> + &ct2->tuplehash[IP_CT_DIR_ORIGINAL].tuple) && >> + nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_REPLY].tuple, >> + &ct2->tuplehash[IP_CT_DIR_REPLY].tuple) && >> + nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_ORIGINAL) && >> + nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_REPLY) && >> + net_eq(nf_ct_net(ct1), nf_ct_net(ct2)); >> +} >> + >> /* caller must hold rcu readlock and none of the nf_conntrack_locks */ >> static void nf_ct_gc_expired(struct nf_conn *ct) >> { >> @@ -696,18 +708,21 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb, >> struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); >> const struct nf_conntrack_l4proto *l4proto; >> >> + enum ip_conntrack_info oldinfo; >> + struct nf_conn *loser_ct = nf_ct_get(skb, &oldinfo); >> + >> l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct)); >> if (l4proto->allow_clash && >> - ((ct->status & IPS_NAT_DONE_MASK) == 0) && >> !nf_ct_is_dying(ct) && >> atomic_inc_not_zero(&ct->ct_general.use)) { >> - enum ip_conntrack_info oldinfo; >> - struct nf_conn *loser_ct = nf_ct_get(skb, &oldinfo); >> - >> - nf_ct_acct_merge(ct, ctinfo, loser_ct); >> - nf_conntrack_put(&loser_ct->ct_general); >> - nf_ct_set(skb, ct, oldinfo); >> - return NF_ACCEPT; >> + if (((ct->status & IPS_NAT_DONE_MASK) == 0) || >> + nf_ct_match(ct, loser_ct)) { >> + nf_ct_acct_merge(ct, ctinfo, loser_ct); >> + nf_conntrack_put(&loser_ct->ct_general); >> + nf_ct_set(skb, ct, oldinfo); >> + return NF_ACCEPT; > > This will not work because the packet losing race will get mangled by > when NAT is performed. The idea of this patch is to resolve the conflict only among packets with the same mangling (= with matching tuples). The mangling happens before the confirmation. Please correct me if I'm missing something. > > Did you test this? Or this is some sort of RFC? I've tested it and put the steps for reproducing the issue and logs for different configurations at https://github.com/brb/conntrack-race. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html