On 11/19/24 03:29, Hyeonggon Yoo wrote:
> On Mon, Nov 18, 2024 at 11:26 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>>
>> On 11/18/24 14:13, Hyeonggon Yoo wrote:
>> > On Wed, Nov 13, 2024 at 1:39 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>> >> +
>> >> +/*
>> >> + * Allocate from a sheaf obtained by kmem_cache_prefill_sheaf()
>> >> + *
>> >> + * Guaranteed not to fail as many allocations as was the requested count.
>> >> + * After the sheaf is emptied, it fails - no fallback to the slab cache itself.
>> >> + *
>> >> + * The gfp parameter is meant only to specify __GFP_ZERO or __GFP_ACCOUNT
>> >> + * memcg charging is forced over limit if necessary, to avoid failure.
>> >> + */
>> >> +void *
>> >> +kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
>> >> +				   struct slab_sheaf *sheaf)
>> >> +{
>> >> +	void *ret = NULL;
>> >> +	bool init;
>> >> +
>> >> +	if (sheaf->size == 0)
>> >> +		goto out;
>> >> +
>> >> +	ret = sheaf->objects[--sheaf->size];
>> >> +
>> >> +	init = slab_want_init_on_alloc(gfp, s);
>> >> +
>> >> +	/* add __GFP_NOFAIL to force successful memcg charging */
>> >> +	slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, init, s->object_size);
>> >
>> > Maybe I'm missing something, but how can this be used for non-sleepable contexts
>> > if __GFP_NOFAIL is used? I think we have to charge them when the sheaf
>>
>> AFAIK it forces memcg to simply charge even if allocated memory goes over
>> the memcg limit. So there's no issue with a non-sleepable context, there
>> shouldn't be memcg reclaim happening in that case.
>
> Ok, but I am still worried about mem alloc profiling/memcg trying to
> allocate some memory
> with __GFP_NOFAIL flag and eventually passing it to the buddy allocator,
> which does not want __GFP_NOFAIL without __GFP_DIRECT_RECLAIM?
> e.g.) memcg hook calls
> alloc_slab_obj_exts()->kcalloc_node()->....->alloc_pages()

alloc_slab_obj_exts() removes __GFP_NOFAIL via OBJCGS_CLEAR_MASK, so that's
fine.
I think kmemleak_alloc_recursive() is also fine, as it ends up in
mem_pool_alloc(), which clears __GFP_NOFAIL via gfp_nested_mask().

Hope I'm not missing something else.

>> > is returned
>> > via kmem_cache_prefill_sheaf(), just like users of bulk alloc/free?
>>
>> That would be very costly to charge/uncharge if most of the objects are not
>> actually used - it's what we want to avoid here.
>> Going over the memcgs limit a bit in a very rare case isn't considered such
>> an issue, for example Linus advocated such approach too in another context.
>
> Thanks for the explanation! That was a point I was missing.
>
>> > Best,
>> > Hyeonggon
>> >
>> >> +out:
>> >> +	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);
>> >> +
>> >> +	return ret;
>> >> +}
>> >> +
>> >>  /*
>> >>   * To avoid unnecessary overhead, we pass through large allocation requests
>> >>   * directly to the page allocator. We use __GFP_COMP, because we will need to
>> >>
>> >> --
>> >> 2.47.0