On Wed, Nov 29, 2023 at 2:37 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 8/21/23 16:57, Hyeonggon Yoo wrote:
> > Hi,
> >
> > On Fri, Aug 11, 2023 at 1:36 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> Oops, looks like I forgot to reply, sorry (preparing v3 now).

It's fine, you were busy removing SLAB :)
Thanks for replying.

> >
> >> /*
> >>  * Inlined fastpath so that allocation functions (kmalloc, kmem_cache_alloc)
> >>  * have the fastpath folded into their functions. So no function call
> >> @@ -3465,7 +3564,11 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
> >>         if (unlikely(object))
> >>                 goto out;
> >>
> >> -       object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);
> >> +       if (s->cpu_array)
> >> +               object = alloc_from_pca(s);
> >> +
> >> +       if (!object)
> >> +               object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);
> >>
> >>         maybe_wipe_obj_freeptr(s, object);
> >>         init = slab_want_init_on_alloc(gfpflags, s);
> >> @@ -3715,6 +3818,34 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
> >>                 discard_slab(s, slab);
> >> }
> >
> >> #ifndef CONFIG_SLUB_TINY
> >> /*
> >>  * Fastpath with forced inlining to produce a kfree and kmem_cache_free that
> >> @@ -3740,6 +3871,11 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
> >>         unsigned long tid;
> >>         void **freelist;
> >>
> >> +       if (s->cpu_array && cnt == 1) {
> >> +               if (free_to_pca(s, head))
> >> +                       return;
> >> +       }
> >> +
> >> redo:
> >>         /*
> >>          * Determine the currently cpus per cpu slab.
> >> @@ -3793,6 +3929,11 @@ static void do_slab_free(struct kmem_cache *s,
> >> {
> >>         void *tail_obj = tail ? : head;
> >>
> >> +       if (s->cpu_array && cnt == 1) {
> >> +               if (free_to_pca(s, head))
> >> +                       return;
> >> +       }
> >> +
> >>         __slab_free(s, slab, head, tail_obj, cnt, addr);
> >> }
> >> #endif /* CONFIG_SLUB_TINY */
> >
> > Is this functionality needed for SLUB_TINY?
>
> Due to the prefill semantics, I think it has to be there even in TINY, or we
> risk running out of memory reserves. Also later I want to investigate
> extending this approach for supporting allocations in very constrained
> contexts (NMI) so e.g. bpf doesn't have to reimplement the slab allocator,
> and that would also not be good to limit to !SLUB_TINY.

I've got the point, thanks for the explanation!

> >> @@ -4060,6 +4201,45 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> >> }
> >> EXPORT_SYMBOL(kmem_cache_alloc_bulk);
> >>
> >> +int kmem_cache_prefill_percpu_array(struct kmem_cache *s, unsigned int count,
> >> +               gfp_t gfp)
> >> +{
> >> +       struct slub_percpu_array *pca;
> >> +       void *objects[32];
> >> +       unsigned int used;
> >> +       unsigned int allocated;
> >> +
> >> +       if (!s->cpu_array)
> >> +               return -EINVAL;
> >> +
> >> +       /* racy but we don't care */
> >> +       pca = raw_cpu_ptr(s->cpu_array);
> >> +
> >> +       used = READ_ONCE(pca->used);
> >
> > Hmm for the prefill to be meaningful,
> > remote allocation should be possible, right?
>
> Remote in what sense?

TL;DR: what I wanted to ask was "how does pre-filling a number of objects
work when the pre-filled objects are not shared between CPUs?"

IIUC the prefill opportunistically fills the local per-cpu array, so the
caller (hopefully) expects some objects to be available in it later.

Let's say CPU X calls kmem_cache_prefill_percpu_array(32) and all 32
objects are filled into CPU X's array.
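
To make that concrete, here is a hypothetical, untested sketch of the
scenario I have in mind (the function name, "my_cache" and the gfp flags
are made up; kmem_cache_prefill_percpu_array() is the one added by this
patch):

  #include <linux/slab.h>

  /* Hypothetical illustration only, not part of the patch. */
  static int example_prefill_then_alloc(struct kmem_cache *my_cache)
  {
          void *obj;
          int ret;

          /*
           * Runs on CPU X and fills CPU X's per-cpu array with up to 32
           * objects. Return-value handling is guessed; the patch returns
           * -EINVAL when the cache has no cpu_array.
           */
          ret = kmem_cache_prefill_percpu_array(my_cache, 32, GFP_KERNEL);
          if (ret < 0)
                  return ret;

          /*
           * ... the task may then be migrated to CPU Y before it consumes
           * the prefilled objects ...
           *
           * IIUC this allocation only consults CPU Y's array; if that is
           * empty, alloc_from_pca() finds nothing and we fall back to
           * __slab_alloc_node(), so the prefill done on CPU X does not
           * help here.
           */
          obj = kmem_cache_alloc(my_cache, GFP_NOWAIT);
          if (!obj)
                  return -ENOMEM;

          kmem_cache_free(my_cache, obj);
          return 0;
  }
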
But if CPU Y can't allocate from CPU X's array (which is what I referred to
as "remote allocation"), the semantics differ from the maple tree's
perspective, because the preallocated objects used to be shared between
CPUs, but now they are not?

Thanks!

--
Hyeonggon