On Wed, Jun 12, 2024 at 04:39:15PM +0200, Linus Lüssing wrote: > On Wed, Jun 12, 2024 at 07:06:04AM -0700, Paul E. McKenney wrote: > > Let me make sure that I understand... > > > > You need rcu_barrier() to wait for any memory passed to kfree_rcu() > > to actually be freed? If so, please explain why you need this, as > > in what bad thing happens if the actual kfree() happens later. > > > > (I could imagine something involving OOM avoidance, but I need to > > hear your code's needs rather than my imaginations.) > > > > Thanx, Paul > > We have allocated a kmem-cache for some objects, which are like > batman-adv's version of a bridge's FDB entry. > > The very last thing we do before unloading the module is > free'ing/destroying this kmem-cache with a call to > kmem_cache_destroy(). > > As far as I understand before calling kmem_cache_destroy() > we need to ensure that all previously allocated objects on this > kmem-cache were free'd. At least we get this kernel splat > (from Slub?) otherwise. I'm not quite sure if any other bad things > other than this noise in dmesg would occur though. Other than a > stale, zero objects entry remaining in /proc/slabinfo maybe. Which > gets duplicated everytime we repeat loading+unloading the module. > At least these entries would be a memory leak I suppose? > > ``` > # after insmod/rmmod'ing batman-adv 6 times: > $ cat /proc/slabinfo | grep batadv_tl_cache > batadv_tl_cache 0 16 256 16 1 : tunables 0 0 0 : slabdata 1 1 0 > batadv_tl_cache 0 16 256 16 1 : tunables 0 0 0 : slabdata 1 1 0 > batadv_tl_cache 0 16 256 16 1 : tunables 0 0 0 : slabdata 1 1 0 > batadv_tl_cache 0 16 256 16 1 : tunables 0 0 0 : slabdata 1 1 0 > batadv_tl_cache 0 16 256 16 1 : tunables 0 0 0 : slabdata 1 1 0 > batadv_tl_cache 0 16 256 16 1 : tunables 0 0 0 : slabdata 1 1 0 > ``` > > That's why we added this rcu_barrier() call on module > shutdown in the batman-adv module __exit function right before the > kmem_cache_destroy() calls. Hoping that this would wait for all > call_rcu() / kfree_rcu() callbacks and their final kfree() to finish. > This worked when we were using call_rcu() with our own callback > with a kfree(). However for kfree_rcu() this somehow does not seem > to be the case anymore (- or more likely I'm missing something else, > some other bug within the batman-adv code?). It is quite possible that some of the recent energy-saving changes have caused rcu_barrier() to not wait for all kfree_rcu() memory to be freed. Which is timely, given a bunch of recently proposed changes that seemed like a good idea to me at the time. ;-) Thanx, Paul