On Wed, 29 Nov 2023 at 10:53, Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> kmem_cache_setup_percpu_array() will allocate a per-cpu array for
> caching alloc/free objects of given size for the cache. The cache
> has to be created with the SLAB_NO_MERGE flag.
>
> When empty, half of the array is filled by an internal bulk alloc
> operation. When full, half of the array is flushed by an internal bulk
> free operation.
>
> The array does not distinguish NUMA locality of the cached objects. If
> an allocation is requested with kmem_cache_alloc_node() with a numa node
> not equal to NUMA_NO_NODE, the array is bypassed.
>
> The bulk operations exposed to slab users also try to utilize the array
> when possible, but leave the array empty or full and use the bulk
> alloc/free only to finish the operation itself. If kmemcg is enabled and
> active, bulk freeing skips the array completely as it would be less
> efficient to use it.
>
> The locking scheme is copied from the page allocator's pcplists, based
> on embedded spin locks. Interrupts are not disabled, only preemption
> (cpu migration on RT). A trylock is attempted to avoid deadlock due to an
> interrupt; trylock failure means the array is bypassed.
>
> Sysfs stat counters alloc_cpu_cache and free_cpu_cache count objects
> allocated or freed using the percpu array; counters cpu_cache_refill and
> cpu_cache_flush count objects refilled or flushed from the array.
>
> kmem_cache_prefill_percpu_array() can be called to fill the array on
> the current cpu to at least the given number of objects. However, this is
> only opportunistic as there's no cpu pinning between the prefill and
> usage, and trylocks may fail when the usage is in an irq handler.
> Therefore allocations cannot rely on the array for success even after
> the prefill. But misses should be rare enough that e.g. GFP_ATOMIC
> allocations should be acceptable after the prefill.
>
> When slub_debug is enabled for a cache with a percpu array, the objects in
> the array are considered allocated from the slub_debug perspective,
> and the alloc/free debugging hooks occur when moving the objects between
> the array and slab pages. This means that e.g. a use-after-free that
> occurs for an object cached in the array goes undetected. Collected
> alloc/free stacktraces might also be less useful. This limitation could
> be changed in the future.
>
> On the other hand, KASAN, kmemcg and other hooks are executed on actual
> allocations and frees by kmem_cache users even if those use the array,
> so their debugging or accounting accuracy should be unaffected.
>
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
> ---
>  include/linux/slab.h     |   4 +
>  include/linux/slub_def.h |  12 ++
>  mm/Kconfig               |   1 +
>  mm/slub.c                | 457 ++++++++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 468 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index d6d6ffeeb9a2..fe0c0981be59 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -197,6 +197,8 @@ struct kmem_cache *kmem_cache_create_usercopy(const char *name,
>  void kmem_cache_destroy(struct kmem_cache *s);
>  int kmem_cache_shrink(struct kmem_cache *s);
>
> +int kmem_cache_setup_percpu_array(struct kmem_cache *s, unsigned int count);
> +
>  /*
>   * Please use this macro to create slab caches. Simply specify the
>   * name of the structure and maybe some flags that are listed above.
> @@ -512,6 +514,8 @@ void kmem_cache_free(struct kmem_cache *s, void *objp);
>  void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
>  int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
>
> +int kmem_cache_prefill_percpu_array(struct kmem_cache *s, unsigned int count, gfp_t gfp);
> +
>  static __always_inline void kfree_bulk(size_t size, void **p)
>  {
>  	kmem_cache_free_bulk(NULL, size, p);
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index deb90cf4bffb..2083aa849766 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -13,8 +13,10 @@
>  #include <linux/local_lock.h>
>
>  enum stat_item {
> +	ALLOC_PCA,		/* Allocation from percpu array cache */
>  	ALLOC_FASTPATH,		/* Allocation from cpu slab */
>  	ALLOC_SLOWPATH,		/* Allocation by getting a new cpu slab */
> +	FREE_PCA,		/* Free to percpu array cache */
>  	FREE_FASTPATH,		/* Free to cpu slab */
>  	FREE_SLOWPATH,		/* Freeing not to cpu slab */
>  	FREE_FROZEN,		/* Freeing to frozen slab */
> @@ -39,6 +41,8 @@ enum stat_item {
>  	CPU_PARTIAL_FREE,	/* Refill cpu partial on free */
>  	CPU_PARTIAL_NODE,	/* Refill cpu partial from node partial */
>  	CPU_PARTIAL_DRAIN,	/* Drain cpu partial to node partial */
> +	PCA_REFILL,		/* Refilling empty percpu array cache */
> +	PCA_FLUSH,		/* Flushing full percpu array cache */
>  	NR_SLUB_STAT_ITEMS
>  };
>
> @@ -66,6 +70,13 @@ struct kmem_cache_cpu {
>  };
>  #endif /* CONFIG_SLUB_TINY */
>
> +struct slub_percpu_array {
> +	spinlock_t lock;
> +	unsigned int count;
> +	unsigned int used;
> +	void * objects[];

checkpatch complains: "foo * bar" should be "foo *bar"
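Also, for context, here is a minimal sketch of how I read the intended usage
from the changelog above. The cache name, object type, array size and prefill
count are made up for illustration, and as the changelog notes the prefill is
only opportunistic, so the GFP_ATOMIC allocation may still miss the array:

#include <linux/slab.h>

struct foo {
	unsigned long a, b;
};

static struct kmem_cache *foo_cache;

static int __init foo_cache_init(void)
{
	/* the percpu array requires an unmergeable cache */
	foo_cache = kmem_cache_create("foo", sizeof(struct foo), 0,
				      SLAB_NO_MERGE, NULL);
	if (!foo_cache)
		return -ENOMEM;

	/* cache up to 32 objects per cpu */
	return kmem_cache_setup_percpu_array(foo_cache, 32);
}

/* called from a sleepable context before entering the atomic section */
static int foo_prepare(void)
{
	return kmem_cache_prefill_percpu_array(foo_cache, 8, GFP_KERNEL);
}

/* called later, possibly under a spinlock or in an irq handler */
static struct foo *foo_alloc_atomic(void)
{
	/* likely served from the percpu array, but not guaranteed */
	return kmem_cache_alloc(foo_cache, GFP_ATOMIC);
}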