Hi,

I'd like to propose a session about the SLUB allocator. Mainly I would like to discuss the addition of the sheaves caching layer, with the latest RFC posted at [1]. The goals of that work are to:

- Reduce fastpath overhead. The current freeing fastpath can only be used if the target slab is still the cpu slab, which can be expected only for very short-lived allocations (a toy sketch of the array-cache idea is appended after the reference below). Further improvements should come from the new local_trylock_t primitive.
- Improve the efficiency of users such as the maple tree, thanks to more efficient preallocation and kfree_rcu() batching and reuse.
- Hopefully also facilitate further changes needed for bpf allocations, again via local_trylock_t, which could be extended to other parts of the implementation as needed.

The controversial discussion points I expect with this approach are:

- Either sheaves will not support NUMA restrictions (as in the current RFC), or they bring back the alien cache flushing issues of SLAB (or is there a better idea?).
- Will it be possible to eventually enable sheaves for every cache and replace SLUB's current fastpaths with them? Arguably those fastpaths are also not very efficient when NUMA-restricted allocations are requested for varying NUMA nodes (the cpu slab is flushed if it's from the wrong node, in order to load a slab from the requested node).

Besides sheaves, I'd like to summarize the recent kfree_rcu() changes, and we could discuss further improvements to those. We can also discuss what's needed to support bpf allocations. I talked about that last year, but then focused on other things, so Alexei has been driving it recently (so far in the page allocator).

[1] https://lore.kernel.org/all/20250214-slub-percpu-caches-v2-0-88592ee0966a@xxxxxxx/

Thanks,
Vlastimil
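
P.S. To make the fastpath point concrete, below is a toy, single-threaded user-space sketch of the general array-cache idea: freeing into a small array does not depend on whether the object's slab is still the cpu slab. This is not code from the RFC; the names (sheaf_alloc, sheaf_free, SHEAF_CAPACITY) and sizes are made up for illustration, and the real sheaves are per-CPU caches that also have to handle preemption, IRQ contexts and the NUMA questions above.

/*
 * Toy, single-threaded user-space model of an array-based object cache
 * ("sheaf"). Purely illustrative; not the kernel implementation.
 */
#include <stdio.h>
#include <stdlib.h>

#define SHEAF_CAPACITY 32

struct sheaf {
	unsigned int size;               /* number of cached objects */
	void *objects[SHEAF_CAPACITY];   /* cached object pointers */
};

/* Allocation fastpath: pop a cached object if the sheaf is not empty. */
static void *sheaf_alloc(struct sheaf *s, size_t obj_size)
{
	if (s->size > 0)
		return s->objects[--s->size];
	/* Slowpath stand-in: fall back to the underlying allocator. */
	return malloc(obj_size);
}

/* Freeing fastpath: push the object back if the sheaf has room. */
static void sheaf_free(struct sheaf *s, void *obj)
{
	if (s->size < SHEAF_CAPACITY) {
		s->objects[s->size++] = obj;
		return;
	}
	/* Slowpath stand-in: a full cache would be handed back in bulk. */
	free(obj);
}

int main(void)
{
	struct sheaf cache = { 0 };
	void *a = sheaf_alloc(&cache, 64);
	void *b = sheaf_alloc(&cache, 64);

	sheaf_free(&cache, a);                /* cached, regardless of its origin */
	void *c = sheaf_alloc(&cache, 64);    /* fastpath hit: returns a again */

	printf("fastpath reuse worked: %s\n", c == a ? "yes" : "no");

	sheaf_free(&cache, b);
	sheaf_free(&cache, c);
	/* Drain the toy cache so the program exits without leaks. */
	while (cache.size)
		free(cache.objects[--cache.size]);
	return 0;
}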