> Martynas Pumputis <martynas@weave.works> wrote:
>> This patch enables the clash resolution for NAT (disabled in
>> "590b52e10d41") if clashing conntracks match (i.e. both tuples are
>> equal) and a protocol allows it.
>>
>> In this case it is safe to perform the resolution, as the losing CT
>> describes the same mangling as the winning CT, so no modifications to
>> the packet are needed, and the result of rules traversal stays valid.
>
> It would be nice to explain under which conditions this is happening.

Sure, I will add it to the commit message.

This is a problem for connection-less protocols, in my case UDP, when
multiple threads write to the same socket in parallel and thus might
race to confirm the same conntrack entry (depending on whether
get_unique_tuple returned the same tuples) - a typical glibc parallel
DNS resolver scenario.

I can easily reproduce the case on my machine with a simple C program -
a UDP listener and multiple client threads communicating over loopback.
I can provide the source code if needed.

> Is this for udp tunnels?

It can involve UDP tunnels as well.

> Does this involve nfqueue?

I haven't checked, but I believe the conntrack confirmation race in
nfqueue has been fixed in 368982cd7d1bd41cd39049c794990aca3770db44.

> I fail to see how this could ever be true, it would require
> two identical packets to be received on different interfaces
> at exactly the same time.
>
> Else NAT would cause port reallocation/rewrite of the 'losing'
> packet.
>> +static inline bool
>> +nf_ct_match(const struct nf_conn *ct1, const struct nf_conn *ct2)
>> +{
>> +	return nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
>> +				 &ct2->tuplehash[IP_CT_DIR_ORIGINAL].tuple) &&
>> +	       nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_REPLY].tuple,
>> +				 &ct2->tuplehash[IP_CT_DIR_REPLY].tuple) &&
>> +	       nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_ORIGINAL) &&
>> +	       nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_REPLY) &&
>> +	       net_eq(nf_ct_net(ct1), nf_ct_net(ct2));
>> +}
>> +
>>  /* caller must hold rcu readlock and none of the nf_conntrack_locks */
>>  static void nf_ct_gc_expired(struct nf_conn *ct)
>>  {
>> @@ -696,13 +708,15 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
>>  	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
>>  	const struct nf_conntrack_l4proto *l4proto;
>>
>> +	enum ip_conntrack_info oldinfo;
>> +	struct nf_conn *loser_ct = nf_ct_get(skb, &oldinfo);
>> +
>>  	l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
>>  	if (l4proto->allow_clash &&
>> -	    ((ct->status & IPS_NAT_DONE_MASK) == 0) &&
>> +	    (((ct->status & IPS_NAT_DONE_MASK) == 0) ||
>> +	     nf_ct_match(ct, loser_ct)) &&
>
> This compare can occur on a conntrack that might already be released
> (zero refcount), so the result isn't reliable.
>
>>  	    !nf_ct_is_dying(ct) &&
>>  	    atomic_inc_not_zero(&ct->ct_general.use)) {
>
> Once atomic_inc_not_zero() is done, such a conntrack can't
> go away and the tuple won't change.
>
> Either nf_ct_match() must be called after atomic_inc_not_zero(),
> or it has to be called again to make sure the conntrack didn't change
> in between.
>
> I suspect you need to call nf_ct_match again in
>     atomic_inc_not_zero(&ct->ct_general.use)) {
>     /* here */
>
> and nf_ct_put(ct) in case the entry wasn't identical after all.

Thanks for the pointers, makes sense.