On 9 July 2018 at 20:12, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: > On Sat, Jul 07, 2018 at 12:44:10PM +0200, Martynas Pumputis wrote: >> On 4 July 2018 at 19:14, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: >> > On Mon, Jul 02, 2018 at 04:52:14PM +0200, Martynas Pumputis wrote: >> >> This patch enables the clash resolution for NAT (disabled in >> >> "590b52e10d41") if clashing conntracks match (i.e. both tuples are equal) >> >> and a protocol allows it. >> >> >> >> The clash might happen for a connections-less protocol (e.g. UDP) when >> >> two threads in parallel writes to the same socket and consequent calls >> >> to "get_unique_tuple" return the same tuples (incl. reply tuples). >> >> >> >> In this case it is safe to perform the resolution, as the losing CT >> >> describes the same mangling as the winning CT, so no modifications to >> >> the packet are needed, and the result of rules traversal for the loser's >> >> packet stays valid. >> >> >> >> Signed-off-by: Martynas Pumputis <martynas@weave.works> >> >> --- >> >> net/netfilter/nf_conntrack_core.c | 31 +++++++++++++++++++++++-------- >> >> 1 file changed, 23 insertions(+), 8 deletions(-) >> >> >> >> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c >> >> index 3465da2a98bd..cc5ef8c9d0da 100644 >> >> --- a/net/netfilter/nf_conntrack_core.c >> >> +++ b/net/netfilter/nf_conntrack_core.c >> >> @@ -502,6 +502,18 @@ nf_ct_key_equal(struct nf_conntrack_tuple_hash *h, >> >> net_eq(net, nf_ct_net(ct)); >> >> } >> >> >> >> +static inline bool >> >> +nf_ct_match(const struct nf_conn *ct1, const struct nf_conn *ct2) >> >> +{ >> >> + return nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_ORIGINAL].tuple, >> >> + &ct2->tuplehash[IP_CT_DIR_ORIGINAL].tuple) && >> >> + nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_REPLY].tuple, >> >> + &ct2->tuplehash[IP_CT_DIR_REPLY].tuple) && >> >> + nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_ORIGINAL) && >> >> + nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_REPLY) && >> >> + net_eq(nf_ct_net(ct1), nf_ct_net(ct2)); >> >> +} >> >> + >> >> /* caller must hold rcu readlock and none of the nf_conntrack_locks */ >> >> static void nf_ct_gc_expired(struct nf_conn *ct) >> >> { >> >> @@ -696,18 +708,21 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb, >> >> struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h); >> >> const struct nf_conntrack_l4proto *l4proto; >> >> >> >> + enum ip_conntrack_info oldinfo; >> >> + struct nf_conn *loser_ct = nf_ct_get(skb, &oldinfo); >> >> + >> >> l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct)); >> >> if (l4proto->allow_clash && >> >> - ((ct->status & IPS_NAT_DONE_MASK) == 0) && >> >> !nf_ct_is_dying(ct) && >> >> atomic_inc_not_zero(&ct->ct_general.use)) { >> >> - enum ip_conntrack_info oldinfo; >> >> - struct nf_conn *loser_ct = nf_ct_get(skb, &oldinfo); >> >> - >> >> - nf_ct_acct_merge(ct, ctinfo, loser_ct); >> >> - nf_conntrack_put(&loser_ct->ct_general); >> >> - nf_ct_set(skb, ct, oldinfo); >> >> - return NF_ACCEPT; >> >> + if (((ct->status & IPS_NAT_DONE_MASK) == 0) || >> >> + nf_ct_match(ct, loser_ct)) { >> >> + nf_ct_acct_merge(ct, ctinfo, loser_ct); >> >> + nf_conntrack_put(&loser_ct->ct_general); >> >> + nf_ct_set(skb, ct, oldinfo); >> >> + return NF_ACCEPT; >> > >> > This will not work because the packet losing race will get mangled by >> > when NAT is performed. >> >> The idea of this patch is to resolve the conflict only among packets >> with the same mangling (= with matching tuples). The mangling happens >> before the confirmation. Please correct me if I'm missing something. > > Yes, the mangling happens before confirmation, but will mangle the > source port to make sure two makes don't get the same tuple. Hence, > the two packets will get different source port numbers, that's why we > need the code to undo the NAT mangling in > 368982cd7d1bd41cd39049c794990aca3770db44. If two packets get different source port numbers (by "get_unique_tuple"), then their CT tuples won't match (= "nf_ct_match" will return false) => the conflict won't be resolved. > > Have a look at tcpdump to make sure packets are mangled consistently, > not just going through / not dropped. Yes, I've checked traces from tcpdump to see whether manglings are OK. The logs I provided include the traces. > >> > Did you test this? Or this is some sort of RFC? >> >> I've tested it and put the steps for reproducing the issue and logs >> for different configurations at https://github.com/brb/conntrack-race. > > OK, are you watching for packets drops to make sure your fix work, I > mean, how are you doing the validation? I'm validating the fix by logging packets for which the conflict cannot be resolved and checking with tcpdump whether they have been dropped. > > Thanks. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html