On Thu, Jun 25, 2009 at 11:29:13AM +0200, Jesper Dangaard Brouer wrote:
>
> On Wed, 2009-06-24 at 15:58 +0200, Patrick McHardy wrote:
> > Jesper Dangaard Brouer wrote:
> > > Adjusting SLAB_DESTROY_BY_RCU flags.
> > >
> > > kmem_cache_create("nf_conntrack", ...) does not need the
> > > SLAB_DESTROY_BY_RCU flag.
> >
> > It does need it. We're using it instead of call_rcu() for conntracks.
> >
> > > But the
> > > kmem_cache_create("nf_conntrack_expect", ...) should use the
> > > SLAB_DESTROY_BY_RCU flag, because it uses a call_rcu() callback to
> > > invoke kmem_cache_free().
> >
> > No, using call_rcu() means we don't need SLAB_DESTROY_BY_RCU.
> > Please see the note in include/linux/slab.h.
>
> Oh, I see.  The description is somewhat cryptic, but I think I got it,
> after reading through the code.
>
> BUT this still means that we need to do rcu_barrier() if the
> SLAB_DESTROY_BY_RCU is NOT set and we do call_rcu() ourselves.
>
> Look at: slab.c kmem_cache_destroy()
>
> void kmem_cache_destroy(struct kmem_cache *cachep)
> {
> ...<cut>...
> 	if (__cache_shrink(cachep)) {
> 		slab_error(cachep, "Can't free all objects");
> ...<cut>...
> 		return;
> 	}
>
> 	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
> 		synchronize_rcu();
>
> 	__kmem_cache_destroy(cachep);
> ...<cut>...
> }
>
> My understanding of the code is (please feel free to correct me): that
> if SLAB_DESTROY_BY_RCU _is_ set, then the __cache_shrink() call will
> call drain_freelist(), which calls slab_destroy().
>
> If SLAB_DESTROY_BY_RCU _is_ set, then slab_destroy() will then start a
> call_rcu() callback to kmem_rcu_free() which calls kmem_cache_free().
> Given that the callback code kmem_rcu_free() is not removed, we are not
> worried about unloading the module at this point.
>
> I'm a bit worried about what happens if __kmem_cache_destroy() is
> invoked and there are still callbacks for kmem_rcu_free() in flight?
> The synchronize_rcu() between __cache_shrink() and
> __kmem_cache_destroy() should perhaps be changed to rcu_barrier()?

It looks to me like it should, good catch!!!  I sent a proposed patch
to the maintainers.

							Thanx, Paul

> But I'm sure that the SLAB/MM guys will tell me that this case is
> handled (and something about it being unlinked from the appropriate
> lists)??? ;-)
>
>
> > > RCU barriers, rcu_barrier(), are inserted in two places.
> > >
> > > In nf_conntrack_expect.c nf_conntrack_expect_fini() before the
> > > kmem_cache_destroy(), despite the use of the SLAB_DESTROY_BY_RCU
> > > flag, because slub does not (currently) handle rcu sync correctly.
> >
> > I think that should be fixed in slub then.
>
> I don't think so, we are talking about the "nf_conntrack_expect" and not
> the "nf_conntrack" slab.  Clearly the slab "nf_conntrack" is handled
> correctly (according to the description above).
>
> We still need to make sure the callbacks for "nf_conntrack_expect" are
> done before unloading/removing the code they are about to call.
>
>
> > > And in nf_conntrack_extend.c nf_ct_extend_unregister(), in order to
> > > wait for completion of callbacks to __nf_ct_ext_free_rcu(), which is
> > > invoked by __nf_ct_ext_add().  It might be more efficient to call
> > > rcu_barrier() in nf_conntrack_core.c nf_conntrack_cleanup_net(), but
> > > that makes it more difficult to read the code (as the callback code
> > > is located in nf_conntrack_extend.c).
> >
> > This one looks fine.
>
> Should I make two different patches?
>
>
> > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > index 5f72b94..438ce84 100644
> > > --- a/net/netfilter/nf_conntrack_core.c
> > > +++ b/net/netfilter/nf_conntrack_core.c
> > > @@ -1242,7 +1242,7 @@ static int nf_conntrack_init_init_net(void)
> > >
> > >  	nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
> > >  						sizeof(struct nf_conn),
> > > -						0, SLAB_DESTROY_BY_RCU, NULL);
> > > +						0, 0, NULL);
> > >  	if (!nf_conntrack_cachep) {
> > >  		printk(KERN_ERR "Unable to create nf_conn slab cache\n");
> > >  		ret = -ENOMEM;
> > > diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
> > > index afde8f9..56227c2 100644
> > > --- a/net/netfilter/nf_conntrack_expect.c
> > > +++ b/net/netfilter/nf_conntrack_expect.c
> > > @@ -593,7 +593,7 @@ int nf_conntrack_expect_init(struct net *net)
> > >  	if (net_eq(net, &init_net)) {
> > >  		nf_ct_expect_cachep = kmem_cache_create("nf_conntrack_expect",
> > >  					sizeof(struct nf_conntrack_expect),
> > > -					0, 0, NULL);
> > > +					0, SLAB_DESTROY_BY_RCU, NULL);
> > >  		if (!nf_ct_expect_cachep)
> > >  			goto err2;
> > >  	}
> > > @@ -617,8 +617,15 @@ err1:
> > >  void nf_conntrack_expect_fini(struct net *net)
> > >  {
> > >  	exp_proc_remove(net);
> > > -	if (net_eq(net, &init_net))
> > > +	if (net_eq(net, &init_net)) {
> > > +		/* hawk@xxxxxxx 2009-06-24: The rcu_barrier() can be
> > > +		 * removed once the sl*b allocators has been fixed
> > > +		 * regarding handling the SLAB_DESTROY_BY_RCU flag
> > > +		 * correctly.
> > > +		 */
> > > +		rcu_barrier(); /* Wait for call_rcu() before destroy */
> > >  		kmem_cache_destroy(nf_ct_expect_cachep);
> > > +	}
> > >  	nf_ct_free_hashtable(net->ct.expect_hash, net->ct.expect_vmalloc,
> > >  			     nf_ct_expect_hsize);
> > >  }
> > > diff --git a/net/netfilter/nf_conntrack_extend.c b/net/netfilter/nf_conntrack_extend.c
> > > index 4b2c769..fef95be 100644
> > > --- a/net/netfilter/nf_conntrack_extend.c
> > > +++ b/net/netfilter/nf_conntrack_extend.c
> > > @@ -186,6 +186,6 @@ void nf_ct_extend_unregister(struct nf_ct_ext_type *type)
> > >  	rcu_assign_pointer(nf_ct_ext_types[type->id], NULL);
> > >  	update_alloc_size(type);
> > >  	mutex_unlock(&nf_ct_ext_type_mutex);
> > > -	synchronize_rcu();
> > > +	rcu_barrier(); /* Wait for completion of call_rcu()'s */
> > >  }
> > >  EXPORT_SYMBOL_GPL(nf_ct_extend_unregister);
> > >
> >
> --
> Med venlig hilsen / Best regards
>   Jesper Brouer
>   ComX Networks A/S
>   Linux Network developer
>   Cand. Scient Datalog / MSc.
>   Author of http://adsl-optimizer.dk
>   LinkedIn: http://www.linkedin.com/in/brouer
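
For anyone following the thread, the distinction at stake is that
synchronize_rcu() only waits for a grace period, while rcu_barrier()
waits until previously queued call_rcu() callbacks have actually run.
The sketch below is a toy, single-threaded userspace model of that
difference, not kernel code: every toy_* name is invented for
illustration, and the "grace period" is trivially empty because the
model has no readers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: these are NOT the kernel's RCU primitives. */
struct toy_cb {
	struct toy_cb *next;
	void (*func)(struct toy_cb *cb);
};

static struct toy_cb *toy_pending;	/* queued, not yet invoked */
static int toy_callbacks_in_flight;	/* callbacks that still have to run */

/* Models call_rcu(): queue a callback to be invoked some time later. */
static void toy_call_rcu(struct toy_cb *cb, void (*func)(struct toy_cb *cb))
{
	assert(cb != NULL && func != NULL);
	cb->func = func;
	cb->next = toy_pending;
	toy_pending = cb;
	toy_callbacks_in_flight++;
}

/*
 * Models synchronize_rcu(): waits for pre-existing readers only.
 * There are no readers here, so it returns at once, and queued
 * callbacks are left pending.
 */
static void toy_synchronize_rcu(void)
{
}

/* Models rcu_barrier(): returns only after every queued callback ran. */
static void toy_rcu_barrier(void)
{
	while (toy_pending != NULL) {
		struct toy_cb *cb = toy_pending;

		toy_pending = cb->next;
		cb->func(cb);
	}
}

/* Stand-in for a callback that hands an object back to its cache. */
static void toy_free_object(struct toy_cb *cb)
{
	toy_callbacks_in_flight--;
	free(cb);
}

/*
 * Queue nobjs deferred frees, then wait with either primitive.
 * Returns how many callbacks are still in flight at the point where
 * the cache would be destroyed (or the module unloaded).
 */
int toy_teardown(int nobjs, int use_barrier)
{
	int i, still_in_flight;

	toy_pending = NULL;
	toy_callbacks_in_flight = 0;

	for (i = 0; i < nobjs; i++) {
		struct toy_cb *cb = malloc(sizeof(*cb));

		if (cb == NULL)
			break;
		toy_call_rcu(cb, toy_free_object);
	}

	if (use_barrier)
		toy_rcu_barrier();
	else
		toy_synchronize_rcu();

	still_in_flight = toy_callbacks_in_flight;

	toy_rcu_barrier();	/* drain so the model itself does not leak */
	return still_in_flight;
}
```

In this model, toy_teardown(3, 0) reports three callbacks still in
flight at destroy time, while toy_teardown(3, 1) reports none. That is
the argument for calling rcu_barrier() rather than synchronize_rcu()
before kmem_cache_destroy() or before the callback code is unloaded.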