On Mon, Feb 24, 2025 at 07:15:16PM +0100, Vlastimil Babka wrote:
> On 2/24/25 19:02, Shakeel Butt wrote:
> > On Mon, Feb 24, 2025 at 05:13:25PM +0100, Vlastimil Babka wrote:
> >> Hi,
> >>
> >> I'd like to propose a session about the SLUB allocator.
> >>
> >> Mainly I would like to discuss the addition of the sheaves caching
> >> layer, the latest RFC for which was posted at [1].
> >>
> >> The goals of that work are to:
> >>
> >> - Reduce fastpath overhead. The current freeing fastpath can only be
> >> used if the target slab is still the cpu slab, which can be expected
> >> only for very short-lived allocations. Further improvements should
> >> come from the new local_trylock_t primitive.
> >>
> >> - Improve the efficiency of users such as the maple tree, thanks to
> >> more efficient preallocations and kfree_rcu() batching/reuse.
> >>
> >> - Hopefully also facilitate further changes needed for bpf
> >> allocations, again via local_trylock_t, which could possibly be
> >> extended to other parts of the implementation as needed.
> >>
> >> The controversial discussion points I expect with this approach are:
> >>
> >> - Either sheaves will not support NUMA restrictions (as in the
> >> current RFC), or they will bring back the alien cache flushing
> >> issues of SLAB (or is there a better idea?).
> >>
> >> - Will it be possible to eventually enable sheaves for every cache
> >> and replace the current SLUB fastpaths with them? Arguably those
> >> fastpaths are also not very efficient when NUMA-restricted
> >> allocations are requested for varying NUMA nodes (the cpu slab is
> >> flushed if it is from the wrong node, in order to load a slab from
> >> the requested node).
> >>
> >> Besides sheaves, I'd like to summarize the recent kfree_rcu()
> >> changes, and we could discuss further improvements to those.
> >>
> >> We can also discuss what's needed to support bpf allocations. I
> >> talked about it last year but then focused on other things, so
> >> Alexei has been driving that recently (so far in the page
> >> allocator).
> >
> > What about pre-memcg-charged sheaves? We had to disable memcg
> > charging of some kernel allocations
>
> You mean due to bad performance? Which ones, for example? Was the
> overhead due to accounting of how much is charged, or due to
> associating memcgs with objects?

I know of the following two cases, but we do frequently hear that kmemcg
accounting is not cheap (the pattern these reverts removed is sketched
at the end of this mail):

3754707bcc3e ("Revert "memcg: enable accounting for file lock caches"")
0bcfe68b8767 ("Revert "memcg: enable accounting for pollfd and select
bits arrays"")

> > and I think sheaves can help in reenabling it.
>
> You mean having separate sheaves per memcg? Wouldn't that risk too
> many objects being cached in them, so we would eventually have to
> flush, e.g., the least recently used ones? Or do you mean some other
> scheme?

As you pointed out, a simple scheme of separate sheaves per memcg might
not work. Maybe targeting specific kmem caches or allocation sites
would be a first step. I will need to think more on this.
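For reference, here is a minimal sketch of the opt-in accounting pattern
those two reverts removed, with made-up names rather than the actual
fs/locks.c and poll/select call sites: a slab cache created with
SLAB_ACCOUNT charges every object to the allocating task's memcg, and
one-off allocations opt in with GFP_KERNEL_ACCOUNT.

/* Hypothetical example; names are illustrative, not the reverted code. */
#include <linux/slab.h>
#include <linux/gfp.h>

struct example_obj {
	int data;
};

static struct kmem_cache *example_cachep;

static int example_cache_init(void)
{
	/*
	 * SLAB_ACCOUNT: every object allocated from this cache is
	 * charged to the allocating task's memcg and uncharged on free.
	 */
	example_cachep = kmem_cache_create("example_obj",
					   sizeof(struct example_obj), 0,
					   SLAB_ACCOUNT, NULL);
	return example_cachep ? 0 : -ENOMEM;
}

static void *example_alloc_buf(size_t len)
{
	/*
	 * One-off form, as used by the pollfd/select bit arrays:
	 * GFP_KERNEL_ACCOUNT is GFP_KERNEL | __GFP_ACCOUNT.
	 */
	return kmalloc(len, GFP_KERNEL_ACCOUNT);
}

The memcg charge/uncharge on each of these allocation and free paths is
the per-object overhead that motivated the reverts, and it is what
batching through pre-charged sheaves could potentially amortize.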
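And on the fastpath point in the proposal above, a rough sketch, under
my own assumptions, of how a per-CPU caching layer could be gated by the
new local_trylock_t primitive. All names here (pcpu_sheaf,
SHEAF_CAPACITY, sheaf_alloc_fast) are hypothetical and this is not the
RFC's code; the property it illustrates is that a trylock-based fastpath
fails safely when reentered (e.g. from NMI or a bpf program) instead of
deadlocking, so the caller can fall back to a slower path.

/* Hypothetical sketch; names and layout are not from the sheaves RFC. */
#include <linux/local_lock.h>
#include <linux/percpu.h>

#define SHEAF_CAPACITY 32

struct pcpu_sheaf {
	local_trylock_t lock;		/* protects count and objects */
	unsigned int count;
	void *objects[SHEAF_CAPACITY];
};

static DEFINE_PER_CPU(struct pcpu_sheaf, pcpu_sheaf) = {
	.lock = INIT_LOCAL_TRYLOCK(lock),
};

/*
 * Fastpath: pop a cached object, or return NULL so the caller can
 * fall back to the regular slow path.
 */
static void *sheaf_alloc_fast(void)
{
	struct pcpu_sheaf *s;
	void *obj = NULL;

	/*
	 * Unlike local_lock(), local_trylock() does not deadlock when
	 * this CPU already holds the lock (e.g. we interrupted the
	 * owner from NMI/bpf context); it simply fails.
	 */
	if (!local_trylock(&pcpu_sheaf.lock))
		return NULL;

	s = this_cpu_ptr(&pcpu_sheaf);
	if (s->count)
		obj = s->objects[--s->count];

	local_unlock(&pcpu_sheaf.lock);
	return obj;
}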