On 2/24/25 19:02, Shakeel Butt wrote:
> On Mon, Feb 24, 2025 at 05:13:25PM +0100, Vlastimil Babka wrote:
>> Hi,
>>
>> I'd like to propose a session about the SLUB allocator.
>>
>> Mainly I would like to discuss the addition of the sheaves caching layer,
>> the latest RFC posted at [1].
>>
>> The goals of that work are to:
>>
>> - Reduce fastpath overhead. The current freeing fastpath can only be used
>>   if the target slab is still the cpu slab, which can be expected only for
>>   very short-lived allocations. Further improvements should come from the
>>   new local_trylock_t primitive.
>>
>> - Improve the efficiency of users such as the maple tree, thanks to more
>>   efficient preallocations and kfree_rcu batching/reuse.
>>
>> - Hopefully also facilitate further changes needed for bpf allocations,
>>   also via local_trylock_t, which could possibly extend to other parts of
>>   the implementation as needed.
>>
>> The controversial discussion points I expect about this approach are:
>>
>> - Either sheaves will not support NUMA restrictions (as in the current
>>   RFC), or they bring back the alien cache flushing issues of SLAB (or is
>>   there a better idea?)
>>
>> - Will it be possible to eventually have sheaves enabled for every cache
>>   and replace the current SLUB fastpaths with it? Arguably these are also
>>   not very efficient when NUMA-restricted allocations are requested for
>>   varying NUMA nodes (the cpu slab is flushed if it's from the wrong node,
>>   to load a slab from the requested node).
>>
>> Besides sheaves, I'd like to summarize the recent kfree_rcu() changes, and
>> we could discuss further improvements to that.
>>
>> Also, we can discuss what's needed to support bpf allocations. I talked
>> about it last year, but then focused on other things, so Alexei has been
>> driving that recently (so far in the page allocator).
>
> What about pre-memcg-charged sheaves? We had to disable memcg charging
> of some kernel allocations

You mean due to bad performance? Which ones, for example? Was the overhead
due to accounting of how much is charged, or due to associating memcgs with
objects?

> and I think sheaves can help in reenabling
> it.

You mean by having separate sheaves per memcg? Wouldn't that mean risking
that too many objects could be cached in them, so we'd have to flush them
eventually, e.g. the least recently used ones, etc.? Or do you mean some
other scheme?

Thanks,
Vlastimil