Le lundi 24 janvier 2011 Ã 13:46 +0100, Patrick McHardy a Ãcrit : > On 24.01.2011 12:57, Pablo Neira Ayuso wrote: > > On 24/01/11 07:41, Eric Dumazet wrote: > >> Le lundi 24 janvier 2011 Ã 00:16 +0100, Pablo Neira Ayuso a Ãcrit : > >>> In 13ee6ac579574a2a95e982b19920fd2495dce8cd, we recovered spinlocks > >>> to protect the dump of the conntrack table according to reports from > >>> Stephen and acknowledgments on the issue from Eric. > >> > >> When a commit is referred, always give its title, and you can use a > >> short id (12 digits are OK) > > > > I'm using the id that git log --oneline shows, is it OK? > > > >> In 13ee6ac57957 (netfilter: fix race in conntrack between dump_table and > >> destroy) > >> > >>> However, Stephen removed the refcount bump in that patch that allows > >>> to keep a reference to the current ct object we are interating over. > >>> That code avoids race conditions between ct object destruction and > >>> the iteration itself. This patch reintroduces these lines since the > >>> ct object may vanish between two recvmgs() invocations. > >>> > >>> This patch fixes ocasional crashes while dumping the conntrack table > >>> intensively. > >>> > >>> Cc: Stephen Hemminger <stephen.hemminger@xxxxxxxxxx> > >> > >> Thats not true, Stephen was not in CC on your mail > > > > I'm sorry, I forgot to add --cc to stgit. > > > >> I add him right now. > >> > >>> Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> > >>> --- > >>> net/netfilter/nf_conntrack_netlink.c | 8 ++++++-- > >>> 1 files changed, 6 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c > >>> index 93297aa..519e245 100644 > >>> --- a/net/netfilter/nf_conntrack_netlink.c > >>> +++ b/net/netfilter/nf_conntrack_netlink.c > >>> @@ -654,14 +654,16 @@ restart: > >>> if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL) > >>> continue; > >>> ct = nf_ct_tuplehash_to_ctrack(h); > >>> + if (!atomic_inc_not_zero(&ct->ct_general.use)) > >>> + continue; > >>> /* Dump entries of a given L3 protocol number. > >>> * If it is not specified, ie. l3proto == 0, > >>> * then dump everything. */ > >>> if (l3proto && nf_ct_l3num(ct) != l3proto) > >>> - continue; > >>> + goto releasect; > >>> if (cb->args[1]) { > >>> if (ct != last) > >>> - continue; > >>> + goto releasect; > >>> cb->args[1] = 0; > >>> } > >>> if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid, > >>> @@ -679,6 +681,8 @@ restart: > >>> if (acct) > >>> memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX])); > >>> } > >>> +releasect: > >>> + nf_ct_put(ct); > >>> } > >>> if (cb->args[1]) { > >>> cb->args[1] = 0; > >>> > >> > >> > >> Hmm... > >> > >> Only the RCU lookup should need this extra check (atomic_inc_not_zero), > >> not a writer. (And a dumper is a "writer" since it holds nf_conntrack > >> spinlock) > >> > >> In Stephen commit, we switched from RCU lookup to traditional one > >> (spinlock protected), so, we should not need to touch individual objects > >> refcount ? > >> > >> I feel the right fix is not to increment refcount then decrement it. > >> > >> Only to _test_ it being zero, and not dumping the ct, eventually ? > >> > >> > >> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c > >> index 93297aa..a977cc7 100644 > >> --- a/net/netfilter/nf_conntrack_netlink.c > >> +++ b/net/netfilter/nf_conntrack_netlink.c > >> @@ -654,6 +654,8 @@ restart: > >> if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL) > >> continue; > >> ct = nf_ct_tuplehash_to_ctrack(h); > >> + if (!atomic_read(&ct->ct_general.use)) > >> + continue; > >> /* Dump entries of a given L3 protocol number. > >> * If it is not specified, ie. l3proto == 0, > >> * then dump everything. */ > > > > No, that won't work. > > > > Sorry, I didn't explain the problem appropriately. We still do a > > nf_ct_put() for last ct. In that patch, you forgot to add the refcount > > bump that we need to keep the reference to the object. > > > > The following patch attached is smaller, it also fixes the problem and > > we don't have to increase the refcount for each object. Now it looks > > similar to how it was before the RCU patches. > > This looks correct to me, we need to keep a reference for continuations. > Eric, do you agree with this patch? Taking a reference I agree for sure, this is the full atomic_inc_not_zero(&ct->ct_general.use) I disagree. Pablo, the refcount cannot be 0 at this point, or something is really wrong ? Why not using atomic_inc(&ct->ct_general.use); ? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html