On Tue, Jul 05, 2022 at 12:24:49PM +0100, Will Deacon wrote: > > > Sorry, but I have absolutely no context here. We have a handy document > > > describing the differences between atomic_t and refcount_t: > > > > > > Documentation/core-api/refcount-vs-atomic.rst > > > > > > What else do you need to know? > > > > Hmm, and I see a tonne of *_inc_not_zero() conversions in 719774377622 > > ("netfilter: conntrack: convert to refcount_t api") which mean that you > > no longer have ordering to subsequent reads in the absence of an address > > dependency. > > I think the patch above needs auditing with the relaxed behaviour in mind, > but for the specific crash reported here possibly something like the diff > below? > > Will > > --->8 > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c > index 082a2fd8d85b..5ad9fcc84269 100644 > --- a/net/netfilter/nf_conntrack_core.c > +++ b/net/netfilter/nf_conntrack_core.c > @@ -1394,6 +1394,7 @@ static unsigned int early_drop_list(struct net *net, > * already fired or someone else deleted it. Just drop ref > * and move to next entry. > */ > + smp_rmb(); /* XXX: Why? */ > if (net_eq(nf_ct_net(tmp), net) && > nf_ct_is_confirmed(tmp) && > nf_ct_delete(tmp, 0, 0)) > Just to follow up, I think you're right, the patch in question should be audited further for other missing memory barrier issues. While this one smp_rmb() helps a lot, ie lets the test run for at least an hour or two, an overnight 6 hour test still resulted in the same crash somewhere along the way so it looks like it's not the only one that's needed.