On Fri, Feb 24, 2012 at 11:45:49AM +0100, Jozsef Kadlecsik wrote: > On Fri, 24 Feb 2012, Jozsef Kadlecsik wrote: > > > OK, and I remove nf_conntrack_hash_insert then, and use a single ct > > argument because as you noted net and zone can be extracted from the ct. > > Here comes the second version of the patch, which assumes that the > previous one is reverted. > > Best regards, > Jozsef > > Marcell Zambo and Janos Farago noticed and reported that when > new conntrack entries are added via netlink and the conntrack table > gets full, soft lockup happens. This is because the nf_conntrack_lock > is held while nf_conntrack_alloc is called, which is in turn wants > to lock nf_conntrack_lock while evicting entries from the full table. > > The patch fixes the soft lockup with limiting the holding of the > nf_conntrack_lock to the minimum, where it's absolutely required. > It required to extend (and thus change) nf_conntrack_hash_insert > so that it makes sure conntrack and ctnetlink do not add the same entry > twice to the conntrack table. > > Signed-off-by: Jozsef Kadlecsik <kadlec@xxxxxxxxxxxxxxxxx> > --- > include/net/netfilter/nf_conntrack.h | 2 +- > net/netfilter/nf_conntrack_core.c | 38 ++++++++++++++++++++-- > net/netfilter/nf_conntrack_netlink.c | 58 +++++++++++++-------------------- > 3 files changed, 58 insertions(+), 40 deletions(-) > > diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h > index 8a2b0ae..ab86036 100644 > --- a/include/net/netfilter/nf_conntrack.h > +++ b/include/net/netfilter/nf_conntrack.h > @@ -209,7 +209,7 @@ extern struct nf_conntrack_tuple_hash * > __nf_conntrack_find(struct net *net, u16 zone, > const struct nf_conntrack_tuple *tuple); > > -extern void nf_conntrack_hash_insert(struct nf_conn *ct); > +extern int nf_conntrack_hash_check_insert(struct nf_conn *ct); > extern void nf_ct_delete_from_lists(struct nf_conn *ct); > extern void nf_ct_insert_dying_list(struct nf_conn *ct); > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c > index 76613f5..ed86a3b 100644 > --- a/net/netfilter/nf_conntrack_core.c > +++ b/net/netfilter/nf_conntrack_core.c > @@ -404,19 +404,49 @@ static void __nf_conntrack_hash_insert(struct nf_conn *ct, > &net->ct.hash[repl_hash]); > } > > -void nf_conntrack_hash_insert(struct nf_conn *ct) > +int > +nf_conntrack_hash_check_insert(struct nf_conn *ct) > { > struct net *net = nf_ct_net(ct); > unsigned int hash, repl_hash; > + struct nf_conntrack_tuple_hash *h; > + struct hlist_nulls_node *n; > u16 zone; > > zone = nf_ct_zone(ct); > - hash = hash_conntrack(net, zone, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); > - repl_hash = hash_conntrack(net, zone, &ct->tuplehash[IP_CT_DIR_REPLY].tuple); > + hash = hash_conntrack(net, zone, > + &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); > + repl_hash = hash_conntrack(net, zone, > + &ct->tuplehash[IP_CT_DIR_REPLY].tuple); > + > + spin_lock_bh(&nf_conntrack_lock); > > + /* See if there's one in the list already, including reverse */ > + hlist_nulls_for_each_entry(h, n, &net->ct.hash[hash], hnnode) > + if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple, > + &h->tuple) && > + zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h))) > + goto out; > + hlist_nulls_for_each_entry(h, n, &net->ct.hash[repl_hash], hnnode) > + if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_REPLY].tuple, > + &h->tuple) && > + zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h))) > + goto out; > + > + add_timer(&ct->timeout); > + nf_conntrack_get(&ct->ct_general); > __nf_conntrack_hash_insert(ct, hash, repl_hash); > + NF_CT_STAT_INC(net, insert); > + spin_unlock_bh(&nf_conntrack_lock); > + > + return 0; > + > +out: > + NF_CT_STAT_INC(net, insert_failed); > + spin_unlock_bh(&nf_conntrack_lock); > + return -EEXIST; > } > -EXPORT_SYMBOL_GPL(nf_conntrack_hash_insert); > +EXPORT_SYMBOL_GPL(nf_conntrack_hash_check_insert); > > /* Confirm a connection given skb; places it in hash table */ > int > diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c > index 9307b03..6d73501 100644 > --- a/net/netfilter/nf_conntrack_netlink.c > +++ b/net/netfilter/nf_conntrack_netlink.c > @@ -1345,7 +1345,7 @@ ctnetlink_create_conntrack(struct net *net, u16 zone, > struct nf_conntrack_helper *helper; > struct nf_conn_tstamp *tstamp; > > - ct = nf_conntrack_alloc(net, zone, otuple, rtuple, GFP_ATOMIC); > + ct = nf_conntrack_alloc(net, zone, otuple, rtuple, GFP_KERNEL); > if (IS_ERR(ct)) > return ERR_PTR(-ENOMEM); > Sorry, I was wrong. I didn't notice that this is called holding rcu_read_lock() inside nfnetlink.c and we cannot use GFP_KERNEL inside read-lock area. I'm going to take the patch and will fix this myself. Thanks Jozsef. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html