Also in git:
https://git.kernel.org/vbabka/l/slab-kfree_rcu-destroy-v2r2

Since SLOB was removed, we have allowed kfree_rcu() for objects
allocated from any kmem_cache in addition to kmalloc().

Recently we have attempted to replace existing call_rcu() usage with
kfree_rcu() where the callback is a plain kmem_cache_free(), in a series
by Julia Lawall [1].

Jakub Kicinski pointed out [2] that this was already tried in batman-adv
but had to be reverted because kmem_cache_destroy() failed due to
objects remaining in the cache, despite rcu_barrier() being used.

Jason Donenfeld identified the culprit [3] as commit a35d16905efc
("rcu: Add basic support for kfree_rcu() batching"), which causes
rcu_barrier() to be insufficient.

This was never a problem for kfree_rcu() usage on kmalloc() objects as
the kmalloc caches are never destroyed, but arbitrary caches can be,
e.g. due to module unload.

Out of the possible solutions collected by Paul McKenney [4], the most
appealing to me is "kmem_cache_destroy() lingers for kfree_rcu()" as it
adds no additional concerns to kfree_rcu() users.

We already have a precedent in some parts of the kmem_cache cleanup
being done asynchronously for SLAB_TYPESAFE_BY_RCU caches. The v1 of
this RFC took the same approach for asynchronously waiting for pending
kfree_rcu(). Mateusz Guzik on IRC questioned this approach, and it turns
out the rcu_barrier() used to be synchronous before commit 657dc2f97220
("slab: remove synchronous rcu_barrier() call in memcg cache release
path"), and the motivation for that change is no longer applicable. So
instead in v2 the existing barrier is reverted to be synchronous, and
the new barrier for kfree_rcu() is also called synchronously.

The new kvfree_rcu_barrier() was provided by Uladzislau Rezki in a patch
[5] carried now by this series.

There is also a bunch of preliminary cleanup steps.
The potentially visible one is that sysfs and debugfs directories, as
well as the /proc/slabinfo record of the cache, are now removed
immediately during kmem_cache_destroy(). Previously this would be
delayed for SLAB_TYPESAFE_BY_RCU caches or left around forever if leaked
objects were detected. Even though we no longer have the delayed
removal, leaked objects should not prevent the cache from being
recreated, including its sysfs and debugfs directories, so it's better
to make this cleanup anyway. The immediate removal is the simplest
solution (compared to e.g. renaming the directories) and should not make
debugging harder: while it won't be possible to check debugfs for
allocation traces of leaked objects, they are listed with more detail in
dmesg anyway.

[1] https://lore.kernel.org/all/20240609082726.32742-1-Julia.Lawall@xxxxxxxx/
[2] https://lore.kernel.org/all/20240612143305.451abf58@xxxxxxxxxx/
[3] https://lore.kernel.org/all/Zmo9-YGraiCj5-MI@xxxxxxxxx/
[4] https://docs.google.com/document/d/1v0rcZLvvjVGejT3523W0rDy_sLFu2LWc_NR3fQItZaA/edit
[5] https://lore.kernel.org/all/20240801111039.79656-1-urezki@xxxxxxxxx/

To: Paul E. McKenney <paulmck@xxxxxxxxxx>
To: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
To: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
To: Boqun Feng <boqun.feng@xxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx>
Cc: Zqiang <qiang.zhang1211@xxxxxxxxx>
Cc: Julia Lawall <Julia.Lawall@xxxxxxxx>
Cc: Jakub Kicinski <kuba@xxxxxxxxxx>
Cc: Jason A. Donenfeld <Jason@xxxxxxxxx>
Cc: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
To: Christoph Lameter <cl@xxxxxxxxx>
To: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Roman Gushchin <roman.gushchin@xxxxxxxxx>
Cc: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
Cc: linux-mm@xxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: rcu@xxxxxxxxxxxxxxx
Cc: Alexander Potapenko <glider@xxxxxxxxxx>
Cc: Marco Elver <elver@xxxxxxxxxx>
Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Cc: kasan-dev@xxxxxxxxxxxxxxxx
Cc: Jann Horn <jannh@xxxxxxxxxx>
Cc: Mateusz Guzik <mjguzik@xxxxxxxxx>
Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
---
Changes in v2:
- Include the necessary barrier implementation (by Uladzislau Rezki)
- Switch to synchronous barriers (Mateusz Guzik)
- Moving of kfence_shutdown_cache() outside slab_mutex done in a
  separate step for review and bisectability.
- Additional kunit test for destroying a cache with a leaked object.
- Link to v1: https://lore.kernel.org/r/20240715-b4-slab-kfree_rcu-destroy-v1-0-46b2984c2205@xxxxxxx

---
Uladzislau Rezki (Sony) (1):
      rcu/kvfree: Add kvfree_rcu_barrier() API

Vlastimil Babka (6):
      mm, slab: dissolve shutdown_cache() into its caller
      mm, slab: unlink slabinfo, sysfs and debugfs immediately
      mm, slab: move kfence_shutdown_cache() outside slab_mutex
      mm, slab: reintroduce rcu_barrier() into kmem_cache_destroy()
      mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
      kunit, slub: add test_kfree_rcu() and test_leak_destroy()

 include/linux/rcutiny.h |   5 +++
 include/linux/rcutree.h |   1 +
 kernel/rcu/tree.c       | 103 ++++++++++++++++++++++++++++++++++++++++----
 lib/slub_kunit.c        |  31 ++++++++++++++
 mm/slab_common.c        | 111 ++++++++++++++----------------------------------
 5 files changed, 163 insertions(+), 88 deletions(-)
---
base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b
change-id: 20240715-b4-slab-kfree_rcu-destroy-85dd2b2ded92

Best regards,
-- 
Vlastimil Babka <vbabka@xxxxxxx>