Re: cgroups: warning for metadata allocation with GFP_NOFAIL (was Re: folio_alloc_buffers() doing allocations > order 1 with GFP_NOFAIL)

Shakeel Butt <shakeelb@xxxxxxxxxx> · Wed, 8 Nov 2023 22:37:00 -0800

On Wed, Nov 8, 2023 at 2:33 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 07-11-23 10:05:24, Roman Gushchin wrote:
> > On Mon, Nov 06, 2023 at 06:57:05PM -0800, Christoph Lameter wrote:
> > > Right. Well, let's add the cgroup folks to this.
> >
> > Hello!
> >
> > I think it's the best thing we can do now. Thoughts?
> >
> > From 5ed3e88f4f052b6ce8dbec0545dfc80eb7534a1a Mon Sep 17 00:00:00 2001
> > From: Roman Gushchin <roman.gushchin@xxxxxxxxx>
> > Date: Tue, 7 Nov 2023 09:18:02 -0800
> > Subject: [PATCH] mm: kmem: drop __GFP_NOFAIL when allocating objcg vectors
> >
> > Objcg vectors attached to slab pages to store slab object ownership
> > information are allocated using gfp flags for the original slab
> > allocation. Depending on slab page order and the size of slab objects,
> > the objcg vector can take several pages.
> >
> > If the original allocation was done with the __GFP_NOFAIL flag, it
> > triggered a warning in the page allocation code. Indeed, order > 1
> > pages should not be allocated with the __GFP_NOFAIL flag.
> >
> > Fix this by simply dropping the __GFP_NOFAIL flag when allocating
> > the objcg vector. It effectively allows skipping the accounting of a
> > single slab object under heavy memory pressure.
>
> It would be really good to describe what happens if the memcg metadata
> allocation fails. AFAICS both callers of memcg_alloc_slab_cgroups,
> memcg_slab_post_alloc_hook and account_slab, will simply skip the
> accounting, which is rather curious but probably tolerable (does this
> allow a runaway from memcg limits?). If that is intended then it should
> be documented so that new users do not get it wrong. We do not want the
> error to ever propagate down to the allocator caller, which doesn't
> expect it.

A memcg metadata allocation failure is similar to the situation we
had with per-memcg kmem caches for accounting slab memory: the first
allocation from a memcg triggered kmem cache creation and the
allocation itself was let through.

>
> Btw. if the large allocation is really necessary, which hasn't been
> explained so far AFAIK, would vmalloc fallback be an option?
>

For this specific scenario, a large allocation is rather unexpected;
it would take something like a large (multi-order) slab holding tiny
objects. Roman, do you know the slab settings where this failure occurs?
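
To put rough numbers on that, here is a userspace back-of-the-envelope
sketch, not kernel code. It assumes 4 KiB pages, 8-byte pointers, one
ownership pointer per object, and no per-object slab metadata. The
helper names (objcg_vec_bytes, order_for_bytes) and the SKETCH_*
constants are made up for illustration.

```c
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB pages and 8-byte pointers. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PTR_SIZE  8UL

/*
 * Bytes needed for a vector holding one ownership pointer per object,
 * for a slab of the given page order filled with obj_size-byte objects.
 * Ignores any per-object metadata the real allocator may add.
 */
static unsigned long objcg_vec_bytes(unsigned int slab_order,
				     unsigned long obj_size)
{
	unsigned long slab_bytes = SKETCH_PAGE_SIZE << slab_order;
	unsigned long nr_objs = slab_bytes / obj_size;

	return nr_objs * SKETCH_PTR_SIZE;
}

/* Smallest page order whose size covers the given number of bytes. */
static unsigned int order_for_bytes(unsigned long bytes)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < bytes)
		order++;
	return order;
}
```

Under those assumptions, an order-0 slab of 64-byte objects needs a
512-byte vector (order 0), while an order-3 slab of 16-byte objects
holds 2048 objects and needs a 16 KiB vector, i.e. an order-2
allocation, which is the kind of case that would trip the warning.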

Anyway, I think kvmalloc is a better option. Most of the time we
should have an order-0 allocation here, and for unusual settings we
fall back to vmalloc.