On Fri, Feb 14, 2025 at 05:27:37PM +0100, Vlastimil Babka wrote:
> Specifying a non-zero value for a new struct kmem_cache_args field
> sheaf_capacity will set up a caching layer of percpu arrays called
> sheaves of given capacity for the created cache.
>
> Allocations from the cache will allocate via the percpu sheaves (main or
> spare) as long as they have no NUMA node preference. Frees will also
> refill one of the sheaves.
>
> When both percpu sheaves are found empty during an allocation, an empty
> sheaf may be replaced with a full one from the per-node barn. If none
> are available and the allocation is allowed to block, an empty sheaf is
> refilled from slab(s) by an internal bulk alloc operation. When both
> percpu sheaves are full during freeing, the barn can replace a full one
> with an empty one, unless over a full sheaves limit. In that case a
> sheaf is flushed to slab(s) by an internal bulk free operation. Flushing
> sheaves and barns is also wired to the existing cpu flushing and cache
> shrinking operations.
>
> The sheaves do not distinguish NUMA locality of the cached objects. If
> an allocation is requested with kmem_cache_alloc_node() with a specific
> node (not NUMA_NO_NODE), sheaves are bypassed.
>
> The bulk operations exposed to slab users also try to utilize the
> sheaves as long as the necessary (full or empty) sheaves are available
> on the cpu or in the barn. Once depleted, they will fall back to bulk
> alloc/free to slabs directly to avoid double copying.
>
> Sysfs stat counters alloc_cpu_sheaf and free_cpu_sheaf count objects
> allocated or freed using the sheaves. Counters sheaf_refill,
> sheaf_flush_main and sheaf_flush_other count objects filled or flushed
> from or to slab pages, and can be used to assess how effective the
> caching is. The refill and flush operations will also count towards the
> usual alloc_fastpath/slowpath, free_fastpath/slowpath and other
> counters.
>
> Access to the percpu sheaves is protected by local_lock_irqsave()
> operations; each per-NUMA-node barn has a spin_lock.
>
> A current limitation is that when slub_debug is enabled for a cache with
> percpu sheaves, the objects in the array are considered as allocated from
> the slub_debug perspective, and the alloc/free debugging hooks occur
> when moving the objects between the array and slab pages. This means
> that e.g. a use-after-free that occurs for an object cached in the
> array is undetected. Collected alloc/free stacktraces might also be less
> useful. This limitation could be changed in the future.
>
> On the other hand, KASAN, kmemcg and other hooks are executed on actual
> allocations and frees by kmem_cache users even if those use the array,
> so their debugging or accounting accuracy should be unaffected.
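
A question on the interface, mostly to check my own understanding: a cache
user opts into sheaves with something like the below, right? (just a sketch;
"my_obj"/my_cache are made-up names, the rest is the existing
kmem_cache_args/kmem_cache_create() interface)

        struct my_obj {
                unsigned long data[4];  /* made-up example object */
        };

        struct kmem_cache_args args = {
                /* each percpu sheaf caches up to 32 objects */
                .sheaf_capacity = 32,
        };

        my_cache = kmem_cache_create("my_obj", sizeof(struct my_obj),
                                     &args, SLAB_HWCACHE_ALIGN);

and from then on plain kmem_cache_alloc()/kmem_cache_free() (and the bulk
variants) go through the sheaves transparently, while kmem_cache_alloc_node()
with a specific node bypasses them.
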
>
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
> ---
>  include/linux/slab.h |  34 ++
>  mm/slab.h            |   2 +
>  mm/slab_common.c     |   5 +-
>  mm/slub.c            | 982 ++++++++++++++++++++++++++++++++++++++++++++++++---
>  4 files changed, 973 insertions(+), 50 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index e8273f28656936c05d015c53923f8fe69cd161b2..c06734912972b799f537359f7fe6a750918ffe9e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
>
>  /********************************************************************
>  * Core slab cache functions
> +static void __pcs_flush_all_cpu(struct kmem_cache *s, unsigned int cpu)
> +{
> +        struct slub_percpu_sheaves *pcs;
> +
> +        pcs = per_cpu_ptr(s->cpu_sheaves, cpu);
> +
> +        if (pcs->spare) {
> +                sheaf_flush(s, pcs->spare);
> +                free_empty_sheaf(s, pcs->spare);
> +                pcs->spare = NULL;
> +        }
> +
> +        // TODO: handle rcu_free
> +        BUG_ON(pcs->rcu_free);
> +
> +        sheaf_flush_main(s);
> +}

+1 on what Suren mentioned.

> +static void barn_shrink(struct kmem_cache *s, struct node_barn *barn)
> +{
> +        struct list_head empty_list;
> +        struct list_head full_list;
> +        struct slab_sheaf *sheaf, *sheaf2;
> +        unsigned long flags;
> +
> +        INIT_LIST_HEAD(&empty_list);
> +        INIT_LIST_HEAD(&full_list);
> +
> +        spin_lock_irqsave(&barn->lock, flags);
> +
> +        list_splice_init(&barn->sheaves_full, &full_list);
> +        barn->nr_full = 0;
> +        list_splice_init(&barn->sheaves_empty, &empty_list);
> +        barn->nr_empty = 0;
> +
> +        spin_unlock_irqrestore(&barn->lock, flags);
> +
> +        list_for_each_entry_safe(sheaf, sheaf2, &full_list, barn_list) {
> +                sheaf_flush(s, sheaf);
> +                list_move(&sheaf->barn_list, &empty_list);
> +        }

nit: is this list_move() necessary? (rough idea in the P.S. below)

> +
> +        list_for_each_entry_safe(sheaf, sheaf2, &empty_list, barn_list)
> +                free_empty_sheaf(s, sheaf);
> +}

Otherwise looks good to me.

-- 
Cheers,
Harry
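
P.S. to spell out the list_move() nit: I had something like the following in
mind (untested sketch, and assuming free_empty_sheaf() doesn't mind that the
sheaf still sits on the on-stack full_list), i.e. flush and free the full
sheaves in one pass and keep the second loop only for the sheaves spliced
from barn->sheaves_empty:

        list_for_each_entry_safe(sheaf, sheaf2, &full_list, barn_list) {
                sheaf_flush(s, sheaf);
                free_empty_sheaf(s, sheaf);
        }

        list_for_each_entry_safe(sheaf, sheaf2, &empty_list, barn_list)
                free_empty_sheaf(s, sheaf);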