Patrick McHardy wrote:
> Eric Dumazet wrote:
>> I have a little problem with __nf_conntrack_find() being exported.
>>
>> The problem is that with SLAB_DESTROY_BY_RCU we must take a reference on
>> the object to recheck it. So ideally only nf_conntrack_find_get() should
>> be used, or callers of __nf_conntrack_find() should lock nf_conntrack_lock
>> (as properly done, for example, in net/netfilter/nf_conntrack_netlink.c,
>> line 1292).
>>
>> Here is a preliminary patch for review (not tested at all, it's 4:50 am
>> here :) )
>>
>> Could you help me by checking the __nf_conntrack_find() use in
>> net/netfilter/xt_connlimit.c and at line 1246 of
>> net/netfilter/nf_conntrack_netlink.c?
>>
>> This part is a little bit gray for me. :)
>
> In the case of xt_connlimit, it seems fine to just take a reference.
> In the case of ctnetlink, keeping the unreferenced lookup under the
> lock seems fine. We unfortunately have to export some internals
> like nf_conntrack_lock for ctnetlink anyway, so I don't think
> it would be worth changing it to take references and unexporting
> the lookup function.
>
>> +/*
>> + * Warning :
>> + * - Caller must take a reference on returned object
>> + *   and recheck nf_ct_tuple_equal(tuple, &h->tuple)
>> + * OR
>> + * - Caller must lock nf_conntrack_lock before calling this function
>> + */
>>  struct nf_conntrack_tuple_hash *
>>  __nf_conntrack_find(struct net *net, const struct nf_conntrack_tuple *tuple)
>>  {
>>  	struct nf_conntrack_tuple_hash *h;
>> -	struct hlist_node *n;
>> +	struct hlist_nulls_node *n;
>>  	unsigned int hash = hash_conntrack(tuple);
>>
>>  	/* Disable BHs the entire time since we normally need to disable them
>>  	 * at least once for the stats anyway.
>>  	 */
>>  	local_bh_disable();
>> -	hlist_for_each_entry_rcu(h, n, &net->ct.hash[hash], hnode) {
>> +begin:
>> +	hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash], hnnode) {
>>  		if (nf_ct_tuple_equal(tuple, &h->tuple)) {
>>  			NF_CT_STAT_INC(net, found);
>>  			local_bh_enable();
>> @@ -261,6 +270,13 @@ __nf_conntrack_find(struct net *net, const struct nf_conntrack_tuple *tuple)
>>  		}
>>  		NF_CT_STAT_INC(net, searched);
>>  	}
>> +	/*
>> +	 * if the nulls value we got at the end of this lookup is
>> +	 * not the expected one, we must restart lookup.
>> +	 * We probably met an item that was moved to another chain.
>> +	 */
>> +	if (get_nulls_value(n) != hash)
>> +		goto begin;
>
> Are you sure this is enough? An entry might have been reused and added
> to the same chain I think, so I think we need to recheck the tuple.

Yes, done in the caller.

>
>>  	local_bh_enable();
>>
>>  	return NULL;
>> @@ -275,11 +291,18 @@ nf_conntrack_find_get(struct net *net, const struct nf_conntrack_tuple *tuple)
>>  	struct nf_conn *ct;
>>
>>  	rcu_read_lock();
>> +begin:
>>  	h = __nf_conntrack_find(net, tuple);
>>  	if (h) {
>>  		ct = nf_ct_tuplehash_to_ctrack(h);
>>  		if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
>>  			h = NULL;
>> +		else {
>> +			if (unlikely(!nf_ct_tuple_equal(tuple, &h->tuple))) {
>> +				nf_ct_put(ct);
>> +				goto begin;
>
> Ah I see, the hash comparison above is only an optimization?
>
>> +			}
>> +		}
>>  	}
>>  	rcu_read_unlock();
>>
>
> check net/ipv4/udp.c for an example (__udp4_lib_lookup())

In the case of UDP, the key check does not return true/false but a score,
so the UDP case is a little bit more complex than the conntrack case:
	rcu_read_lock();
begin:
	result = NULL;
	badness = -1;
	sk_nulls_for_each_rcu(sk, node, &hslot->head) {
		score = compute_score(sk, net, saddr, hnum,
				      sport, daddr, dport, dif);
		if (score > badness) {
			result = sk;
			badness = score;
		}
	}
	/*
	 * if the nulls value we got at the end of this lookup is
	 * not the expected one, we must restart lookup.
	 * We probably met an item that was moved to another chain.
	 */
	if (get_nulls_value(node) != hash)
		goto begin;

	if (result) {
		if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
			result = NULL;
		else if (unlikely(compute_score(result, net, saddr, hnum,
				sport, daddr, dport, dif) < badness)) {
			sock_put(result);
			goto begin;
		}
	}
	rcu_read_unlock();
	return result;
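
To summarize the common shape of both lookups, here is a rough sketch only
(my_obj, obj_key_equal(), obj_put(), table and key are made-up names standing
in for nf_ct_tuple_equal()/nf_ct_put() or compute_score()/sock_put(); they are
not actual kernel APIs):

	struct my_obj {
		struct hlist_nulls_node	node;
		atomic_t		refcnt;
		/* ... key fields ... */
	};

	struct my_obj *obj;
	struct hlist_nulls_node *n;

	rcu_read_lock();
begin:
	hlist_nulls_for_each_entry_rcu(obj, n, &table[hash], node) {
		if (!obj_key_equal(obj, key))
			continue;
		if (!atomic_inc_not_zero(&obj->refcnt))
			continue;	/* object is being freed, skip it */
		if (!obj_key_equal(obj, key)) {
			/* slab reused the memory for another object
			 * between the match and the refcount: drop the
			 * reference and restart the whole lookup.
			 */
			obj_put(obj);
			goto begin;
		}
		rcu_read_unlock();
		return obj;
	}
	/* chain ended on an unexpected nulls value: an object we were
	 * following was freed and reinserted into another chain, so we
	 * may have walked the wrong chain; restart.
	 */
	if (get_nulls_value(n) != hash)
		goto begin;
	rcu_read_unlock();
	return NULL;

The two rechecks (key after taking the reference, nulls value at the end of
the chain) are what keep the lockless lookup correct when objects can be
freed and reused under us.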