Hi,

I'd like to propose a session about the SLUB allocator. Mainly I would like to discuss the addition of the sheaves caching layer, with the latest RFC posted at [1]. The goals of that work are to:

- Reduce fastpath overhead. The current freeing fastpath can only be used if the target slab is still the cpu slab, which can be expected only for very short-lived allocations (a toy sketch of the array-cache idea is appended after the reference below). Further improvements should come from the new local_trylock_t primitive.
- Improve the efficiency of users such as the maple tree, thanks to more efficient preallocation and kfree_rcu() batching and reuse.
- Hopefully also facilitate further changes needed for bpf allocations, again via local_trylock_t, which could be extended to other parts of the implementation as needed.

The controversial discussion points I expect with this approach are:

- Either sheaves will not support NUMA restrictions (as in the current RFC), or they bring back the alien cache flushing issues of SLAB (or is there a better idea?).
- Will it be possible to eventually enable sheaves for every cache and replace SLUB's current fastpaths with them? Arguably those fastpaths are also not very efficient when NUMA-restricted allocations are requested for varying NUMA nodes (the cpu slab is flushed if it's from the wrong node, in order to load a slab from the requested node).

Besides sheaves, I'd like to summarize the recent kfree_rcu() changes, and we could discuss further improvements to those. We can also discuss what's needed to support bpf allocations. I talked about that last year, but then focused on other things, so Alexei has been driving it recently (so far in the page allocator).

[1] https://lore.kernel.org/all/20250214-slub-percpu-caches-v2-0-88592ee0966a@xxxxxxx/

Thanks,
Vlastimil
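
P.S. To make the fastpath point concrete, below is a toy, single-threaded user-space sketch of the general array-cache idea: freeing into a small array does not depend on whether the object's slab is still the cpu slab. This is not code from the RFC; the names (sheaf_alloc, sheaf_free, SHEAF_CAPACITY) and sizes are made up for illustration, and the real sheaves are per-CPU caches that also have to handle preemption, IRQ contexts and the NUMA questions above.

/*
 * Toy, single-threaded user-space model of an array-based object cache
 * ("sheaf"). Purely illustrative; not the kernel implementation.
 */
#include <stdio.h>
#include <stdlib.h>

#define SHEAF_CAPACITY 32

struct sheaf {
	unsigned int size;               /* number of cached objects */
	void *objects[SHEAF_CAPACITY];   /* cached object pointers */
};

/* Allocation fastpath: pop a cached object if the sheaf is not empty. */
static void *sheaf_alloc(struct sheaf *s, size_t obj_size)
{
	if (s->size > 0)
		return s->objects[--s->size];
	/* Slowpath stand-in: fall back to the underlying allocator. */
	return malloc(obj_size);
}

/* Freeing fastpath: push the object back if the sheaf has room. */
static void sheaf_free(struct sheaf *s, void *obj)
{
	if (s->size < SHEAF_CAPACITY) {
		s->objects[s->size++] = obj;
		return;
	}
	/* Slowpath stand-in: a full cache would be handed back in bulk. */
	free(obj);
}

int main(void)
{
	struct sheaf cache = { 0 };
	void *a = sheaf_alloc(&cache, 64);
	void *b = sheaf_alloc(&cache, 64);

	sheaf_free(&cache, a);                /* cached, regardless of its origin */
	void *c = sheaf_alloc(&cache, 64);    /* fastpath hit: returns a again */

	printf("fastpath reuse worked: %s\n", c == a ? "yes" : "no");

	sheaf_free(&cache, b);
	sheaf_free(&cache, c);
	/* Drain the toy cache so the program exits without leaks. */
	while (cache.size)
		free(cache.objects[--cache.size]);
	return 0;
}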