Le lundi 24 janvier 2011 Ã 07:41 +0100, Eric Dumazet a Ãcrit : > Le lundi 24 janvier 2011 Ã 00:16 +0100, Pablo Neira Ayuso a Ãcrit : > > In 13ee6ac579574a2a95e982b19920fd2495dce8cd, we recovered spinlocks > > to protect the dump of the conntrack table according to reports from > > Stephen and acknowledgments on the issue from Eric. > > > > When a commit is referred, always give its title, and you can use a > short id (12 digits are OK) > > In 13ee6ac57957 (netfilter: fix race in conntrack between dump_table and > destroy) > > > However, Stephen removed the refcount bump in that patch that allows > > to keep a reference to the current ct object we are interating over. > > That code avoids race conditions between ct object destruction and > > the iteration itself. This patch reintroduces these lines since the > > ct object may vanish between two recvmgs() invocations. > > > > This patch fixes ocasional crashes while dumping the conntrack table > > intensively. > > > > Cc: Stephen Hemminger <stephen.hemminger@xxxxxxxxxx> > > Thats not true, Stephen was not in CC on your mail > > I add him right now. > > > Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> > > --- > > net/netfilter/nf_conntrack_netlink.c | 8 ++++++-- > > 1 files changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c > > index 93297aa..519e245 100644 > > --- a/net/netfilter/nf_conntrack_netlink.c > > +++ b/net/netfilter/nf_conntrack_netlink.c > > @@ -654,14 +654,16 @@ restart: > > if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL) > > continue; > > ct = nf_ct_tuplehash_to_ctrack(h); > > + if (!atomic_inc_not_zero(&ct->ct_general.use)) > > + continue; > > /* Dump entries of a given L3 protocol number. > > * If it is not specified, ie. l3proto == 0, > > * then dump everything. */ > > if (l3proto && nf_ct_l3num(ct) != l3proto) > > - continue; > > + goto releasect; > > if (cb->args[1]) { > > if (ct != last) > > - continue; > > + goto releasect; > > cb->args[1] = 0; > > } > > if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid, > > @@ -679,6 +681,8 @@ restart: > > if (acct) > > memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX])); > > } > > +releasect: > > + nf_ct_put(ct); > > } > > if (cb->args[1]) { > > cb->args[1] = 0; > > > > > Hmm... > > Only the RCU lookup should need this extra check (atomic_inc_not_zero), > not a writer. (And a dumper is a "writer" since it holds nf_conntrack > spinlock) > > In Stephen commit, we switched from RCU lookup to traditional one > (spinlock protected), so, we should not need to touch individual objects > refcount ? > > I feel the right fix is not to increment refcount then decrement it. > > Only to _test_ it being zero, and not dumping the ct, eventually ? > > > diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c > index 93297aa..a977cc7 100644 > --- a/net/netfilter/nf_conntrack_netlink.c > +++ b/net/netfilter/nf_conntrack_netlink.c > @@ -654,6 +654,8 @@ restart: > if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL) > continue; > ct = nf_ct_tuplehash_to_ctrack(h); > + if (!atomic_read(&ct->ct_general.use)) > + continue; > /* Dump entries of a given L3 protocol number. > * If it is not specified, ie. l3proto == 0, > * then dump everything. */ > I read ea781f197d6a (netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu()) and can see previous code only had the spinlock, not the refcount inc/dec. Are you saying that my commit included a bug fix ? ;) -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html