Le vendredi 01 avril 2011 Ã 10:02 +0800, Changli Gao a Ãcrit : > On Thu, Mar 31, 2011 at 10:47 PM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote: > > > > I wonder if this is not hiding another bug. > > > > Adding an RCU grace period might reduce the probability window. > > What bug? This one? > Yes. I am saying your patch is a brown paper bag. Real bug is elsewhere, and we must find it, not make it happen less frequently. Maybe its a missing barrier, and adding a full RCU grace period is a waste (of cpu caches space, since call_rcu() fill a potential big list) > > > > By the time nf_conntrack_free(ct) is called, no other cpu/thread > > could/should use ct, or ct->ext ? > > nat->ct may refer it. > conntrack must have a real problem of refcounting then. Each time an object A has a reference to object B, we should increase B refcount, so B cannot disappear. nat->ct _cannot_ refer ct if we are freeing ct. Thats quite simple rule. > > > > Sure, another thread can find/pass_on ct in a lookup but should not use > > it, since its refcount (ct_general.use) should be 0. > > > > > > As SLAB_DESTROY_BY_RCU is used, we should validate ct->orig_tuple too. > > There is another minor problem. > minor or serious ? > nf_nat_core.c: > > 133 rcu_read_lock(); > 134 hlist_for_each_entry_rcu(nat, n, > &net->ipv4.nat_bysource[h], bysourc e) { > 135 ct = nat->ct; > 136 if (same_src(ct, tuple) && nf_ct_zone(ct) == zone) { > 137 /* Copy source part from reply tuple. */ > 138 nf_ct_invert_tuplepr(result, > 139 > &ct->tuplehash[IP_CT_DIR_REPLY].tuple ); > 140 result->dst = tuple->dst; > 141 > 142 if (in_range(result, range)) { > 143 rcu_read_unlock(); > 144 return 1; > 145 } > 146 } > 147 } > 148 rcu_read_unlock(); > > If the ct is reused, NAT mapping will be wrong. It isn't a serious > problem, and can't be fixed, even though we check the reference > counter before using it, but we can't validate it with the original > tuple. > I call this a serious problem. netfilter is not a fuzzy logic. > Maybe we can do it in this way > > ct = nat->ct; > if (!nf_ct_is_dying(ct) && > atomic_inc_not_zero(&ct->ct_general.use))) { > if (nf_ct_ext_find(ct, NF_CT_EXT_NAT) == nat) { > /* ct is valid, do sth... */ > } > nf_ct_put(ct); > } > > I think two additional atomic operations are expensive. It isn't a good idea. > It depends. This is better than taking the conntrack lock. SLAB_DESTROY_BY_RCU is not allowing for fuzzy logic. Rules are we _must_ take a reference on object after finding it, and _recheck_ validity of the object before using it. Another way to avoid atomic operations in find_appropriate_src() is to use a seqcount (or seqlock), and change the seqcount every time something is changed in ct. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html