On Wed, Nov 08, 2023 at 10:37:00PM -0800, Shakeel Butt wrote:
> On Wed, Nov 8, 2023 at 2:33 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Tue 07-11-23 10:05:24, Roman Gushchin wrote:
> > > On Mon, Nov 06, 2023 at 06:57:05PM -0800, Christoph Lameter wrote:
> > > > Right.. Well, let's add the cgroup folks to this.
> > >
> > > Hello!
> > >
> > > I think it's the best thing we can do now. Thoughts?
> > >
> > > From 5ed3e88f4f052b6ce8dbec0545dfc80eb7534a1a Mon Sep 17 00:00:00 2001
> > > From: Roman Gushchin <roman.gushchin@xxxxxxxxx>
> > > Date: Tue, 7 Nov 2023 09:18:02 -0800
> > > Subject: [PATCH] mm: kmem: drop __GFP_NOFAIL when allocating objcg vectors
> > >
> > > Objcg vectors attached to slab pages to store slab object ownership
> > > information are allocated using the gfp flags of the original slab
> > > allocation. Depending on the slab page order and the size of slab
> > > objects, the objcg vector can take several pages.
> > >
> > > If the original allocation was done with the __GFP_NOFAIL flag, it
> > > triggered a warning in the page allocation code. Indeed, order > 1
> > > pages should not be allocated with the __GFP_NOFAIL flag.
> > >
> > > Fix this by simply dropping the __GFP_NOFAIL flag when allocating
> > > the objcg vector. It effectively allows the accounting of a single
> > > slab object to be skipped under heavy memory pressure.
> >
> > It would be really good to describe what happens if the memcg metadata
> > allocation fails. AFAICS both callers of memcg_alloc_slab_cgroups -
> > memcg_slab_post_alloc_hook and account_slab - will simply skip the
> > accounting, which is rather curious but probably tolerable (does this
> > allow a runaway from memcg limits?). If that is intended, it should be
> > documented so that new users do not get it wrong. We do not want the
> > error to ever propagate down to the allocator caller, which doesn't
> > expect it.
>
> The memcg metadata allocation failure is a situation kind of similar
> to how we used to have per-memcg kmem caches for accounting slab
> memory. The first allocation from a memcg triggered kmem cache
> creation and let the allocation pass through.
>
> >
> > Btw. if the large allocation is really necessary, which hasn't been
> > explained so far AFAIK, would a vmalloc fallback be an option?
> >
>
> For this specific scenario, a large allocation is kind of unexpected,
> e.g. a large (multi-order) slab holding tiny objects. Roman, do you
> know the slab settings where this failure occurs?

No, I hope Christoph will shed some light here.

> Anyways, I think kvmalloc is a better option. Most of the time we
> should have an order-0 allocation here, and for weird settings we
> fall back to vmalloc.

I'm not sure about kvmalloc, because it's not fast.

I think the better option would be to force the slab allocator to fall
back to order-0 pages. Theoretically, we don't even need to free and
re-allocate slab objects; we could break the slab folio into pages and
release all but the first page.

But I'd like to learn more about the use case before committing any
time to this effort.

Thanks!
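
As a rough illustration of the approach the patch above describes, here is
a minimal sketch. It assumes the objcg vector is allocated in
memcg_alloc_slab_cgroups() via kcalloc_node() and that the incoming slab
gfp flags are already filtered through an OBJCGS_CLEAR_MASK, as in
mm/memcontrol.c; the exact mask contents and the function body below are
assumptions for illustration, not the committed patch.

    /*
     * Sketch only (not the committed patch): drop __GFP_NOFAIL together
     * with the flags that are already cleared before allocating the
     * objcg vector, so a multi-page vector allocation may fail instead
     * of tripping the order > 1 __GFP_NOFAIL warning in the page
     * allocator. Helper names (objs_per_slab(), slab_nid()) are the
     * mm/slab.h ones; the mask contents and signature are assumptions.
     */
    #define OBJCGS_CLEAR_MASK	(__GFP_DMA | __GFP_RECLAIMABLE | \
				 __GFP_ACCOUNT | __GFP_NOFAIL)

    int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
				 gfp_t gfp, bool new_slab)
    {
	unsigned int objects = objs_per_slab(s, slab);
	void *vec;

	gfp &= ~OBJCGS_CLEAR_MASK;
	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
			   slab_nid(slab));
	if (!vec) {
		/*
		 * Both callers (memcg_slab_post_alloc_hook() and
		 * account_slab()) tolerate the failure: the slab is
		 * simply left unaccounted and the original allocation
		 * still succeeds.
		 */
		return -ENOMEM;
	}

	/* ... attach vec to the slab's memcg_data as before ... */
	return 0;
    }

Whether the flag is masked out via OBJCGS_CLEAR_MASK or cleared inline
right before kcalloc_node() is an implementation detail; the point is that
the vector allocation becomes failable and the callers then skip
accounting for that slab, as discussed above.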