Hello Matthew, there's good news.

In v3 (work in progress now), I fixed some bugs (I hate kernel panics!)
and, as a test, made NAPI use it. It works pretty well.

On Tue, Sep 21, 2021 at 05:17:02PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 21, 2021 at 03:42:39PM +0000, Hyeonggon Yoo wrote:
> > > > +	/* slowpath */
> > > > +	cache->size = kmem_cache_alloc_bulk(s, gfpflags,
> > > > +			KMEM_LOCKLESS_CACHE_QUEUE_SIZE, cache->queue);
> > >
> > > Go back to the Bonwick paper and look at the magazine section again.
> > > You have to allocate _half_ the size of the queue, otherwise you get
> > > into pathological situations where you start to free and allocate
> > > every time.
> >
> > I want to ask where the idea of allocating 'half' the size of the queue
> > came from. The paper you sent does not work with a single queue
> > (magazine); instead, it manages a pool of magazines.
> >
> > And after reading the paper, I see that managing a pool of magazines
> > (where M is a boot parameter) is a valid approach to reduce hitting the
> > slowpath.
>
> Bonwick uses two magazines per cpu; if both are empty, one is replaced
> with a full one.  If both are full, one is replaced with an empty one.
> Our slab implementation doesn't provide magazine allocation, but it does
> provide bulk allocation.  So translating the Bonwick implementation to
> our implementation, we need to bulk-allocate or bulk-free half of the
> array at any time.

Is there a reason that the number should be 'half'?

What about something like this:

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 884d3311cd8e..f32736302d53 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -455,12 +455,13 @@ void *kmem_cache_alloc_cached(struct kmem_cache *s, gfp_t gfpflags)
 	}
 
 	cache = get_cpu_ptr(s->cache);
-	if (cache->size) /* fastpath without lock */
+	if (cache->size) /* fastpath without lock */
 		p = cache->queue[--cache->size];
 	else {
 		/* slowpath */
-		cache->size = kmem_cache_alloc_bulk(s, gfpflags,
-				KMEM_LOCKLESS_CACHE_QUEUE_SIZE, cache->queue);
+		cache->size += kmem_cache_alloc_bulk(s, gfpflags,
+				KMEM_LOCKLESS_CACHE_BATCHCOUNT,
+				cache->queue);
 		if (cache->size)
 			p = cache->queue[--cache->size];
 		else
@@ -491,13 +492,13 @@ void kmem_cache_free_cached(struct kmem_cache *s, void *p)
 
 	cache = get_cpu_ptr(s->cache);
 	if (cache->size < KMEM_LOCKLESS_CACHE_QUEUE_SIZE) {
 		cache->queue[cache->size++] = p;
-		put_cpu_ptr(s->cache);
-		return ;
+	} else {
+		kmem_cache_free_bulk(s,
+				KMEM_LOCKLESS_CACHE_BATCHCOUNT,
+				cache->queue - KMEM_LOCKLESS_CACHE_BATCHCOUNT);
+		cache->size -= KMEM_LOCKLESS_CACHE_BATCHCOUNT;
 	}
 	put_cpu_ptr(s->cache);
-
-	/* Is there better way to do this? */
-	kmem_cache_free(s, p);
 }
 EXPORT_SYMBOL(kmem_cache_free_cached);
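
For reference, here is a minimal, self-contained sketch of the half-batch
scheme Matthew describes above (refill an empty queue with half its
capacity, flush half of a full one, so the queue never sits right at the
empty/full boundary). This is plain userspace C, not the kernel patch:
malloc()/free() stand in for the bulk slab calls, and QUEUE_SIZE, BATCH
and struct obj_cache are illustrative names, not identifiers from the
series.

#include <stdlib.h>

#define QUEUE_SIZE	16
#define BATCH		(QUEUE_SIZE / 2)

struct obj_cache {
	size_t size;			/* number of cached objects */
	void *queue[QUEUE_SIZE];	/* cached object pointers */
};

static void *cache_alloc(struct obj_cache *c, size_t obj_size)
{
	if (!c->size) {
		/* slowpath: refill half of the queue */
		for (size_t i = 0; i < BATCH; i++) {
			void *p = malloc(obj_size);

			if (!p)
				break;
			c->queue[c->size++] = p;
		}
		if (!c->size)
			return NULL;
	}
	/* fastpath: pop from the tail */
	return c->queue[--c->size];
}

static void cache_free(struct obj_cache *c, void *p)
{
	if (c->size == QUEUE_SIZE) {
		/* slowpath: flush half of the queue, from the tail */
		while (c->size > QUEUE_SIZE - BATCH)
			free(c->queue[--c->size]);
	}
	/* fastpath: push onto the tail */
	c->queue[c->size++] = p;
}

int main(void)
{
	struct obj_cache c = { 0 };
	void *p = cache_alloc(&c, 64);

	cache_free(&c, p);

	/* drain the queue before exiting */
	while (c.size)
		free(c.queue[--c.size]);
	return 0;
}

With BATCH set to half the queue size, any slow-path refill or flush is
followed by at least BATCH fast-path operations before the opposite slow
path can be hit again, which (as I understand it) is the property the
two-magazine scheme in the Bonwick paper is after.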