On Mon, Sep 20, 2021 at 03:48:16PM +0000, Hyeonggon Yoo wrote: > +#define KMEM_LOCKLESS_CACHE_QUEUE_SIZE 64 I would suggest that, to be nice to the percpu allocator, this be one less than 2^n. > +struct kmem_lockless_cache { > + void *queue[KMEM_LOCKLESS_CACHE_QUEUE_SIZE]; > + unsigned int size; > +}; I would also suggest that 'size' be first as it is going to be accessed every time, and then there's a reasonable chance that queue[size - 1] will be in the same cacheline. CPUs will tend to handle that better. > +/** > + * kmem_cache_alloc_cached - try to allocate from cache without lock > + * @s: slab cache > + * @flags: SLAB flags > + * > + * Try to allocate from cache without lock. If fails, fill the lockless cache > + * using bulk alloc API > + * > + * Be sure that there's no race condition. > + * Must create slab cache with SLAB_LOCKLESS_CACHE flag to use this function. > + * > + * Return: a pointer to free object on allocation success, NULL on failure. > + */ > +void *kmem_cache_alloc_cached(struct kmem_cache *s, gfp_t gfpflags) > +{ > + struct kmem_lockless_cache *cache = this_cpu_ptr(s->cache); > + > + BUG_ON(!(s->flags & SLAB_LOCKLESS_CACHE)); > + > + if (cache->size) /* fastpath without lock */ > + return cache->queue[--cache->size]; > + > + /* slowpath */ > + cache->size = kmem_cache_alloc_bulk(s, gfpflags, > + KMEM_LOCKLESS_CACHE_QUEUE_SIZE, cache->queue); Go back to the Bonwick paper and look at the magazine section again. You have to allocate _half_ the size of the queue, otherwise you get into pathological situations where you start to free and allocate every time. > +void kmem_cache_free_cached(struct kmem_cache *s, void *p) > +{ > + struct kmem_lockless_cache *cache = this_cpu_ptr(s->cache); > + > + BUG_ON(!(s->flags & SLAB_LOCKLESS_CACHE)); > + > + /* Is there better way to do this? */ > + if (cache->size == KMEM_LOCKLESS_CACHE_QUEUE_SIZE) > + kmem_cache_free(s, cache->queue[--cache->size]); Yes. if (cache->size == KMEM_LOCKLESS_CACHE_QUEUE_SIZE) { kmem_cache_free_bulk(s, KMEM_LOCKLESS_CACHE_QUEUE_SIZE / 2, &cache->queue[KMEM_LOCKLESS_CACHE_QUEUE_SIZE / 2)); cache->size = KMEM_LOCKLESS_CACHE_QUEUE_SIZE / 2; (check the maths on that; it might have some off-by-one)