cgroups: warning for metadata allocation with GFP_NOFAIL (was Re: folio_alloc_buffers() doing allocations > order 1 with GFP_NOFAIL)

Christoph Lameter <cl@xxxxxxxxx> · Mon, 6 Nov 2023 18:57:05 -0800 (PST)

Right. Well, let's add the cgroup folks to this.

This is the code that simply passes GFP_NOFAIL through when allocating
cgroup metadata with an order > 1:

int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
				 gfp_t gfp, bool new_slab)
{
	unsigned int objects = objs_per_slab(s, slab);
	unsigned long memcg_data;
	void *vec;

	gfp &= ~OBJCGS_CLEAR_MASK;
	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
			   slab_nid(slab));
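
To make the size argument concrete, here is a rough user-space sketch of the arithmetic, not kernel code. It assumes the vector is one pointer per object, that a large enough vector goes straight to the page allocator (as the __kmalloc_large_node frame in the trace suggests), and that the page allocator warns on __GFP_NOFAIL requests above order 1. All model_* names are made up for illustration, and the page size is a parameter because the thread does not state it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */

/* Smallest order such that (page_size << order) >= size. */
static unsigned int model_get_order(size_t size, size_t page_size)
{
	unsigned int order = 0;

	assert(page_size > 0 && size > 0);
	while ((page_size << order) < size)
		order++;
	return order;
}

/* Bytes needed for a per-slab vector holding one pointer per object. */
static size_t model_objcg_vec_size(unsigned int objects)
{
	return (size_t)objects * sizeof(void *);
}

/*
 * Models the condition under discussion: a no-fail request that
 * needs more than an order-1 allocation triggers the warning.
 */
static bool model_nofail_warns(bool nofail, unsigned int order)
{
	return nofail && order > 1;
}
```

With a 4096-byte page, for example, model_get_order() stays at order 1 up to 8192 bytes and moves to order 2 from 8193 bytes on, which is the point where model_nofail_warns() starts returning true for a no-fail request.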

On Wed, 1 Nov 2023, Matthew Wilcox wrote:

On Tue, Oct 31, 2023 at 05:13:57PM -0700, Christoph Lameter (Ampere) wrote:
Hi Matthew,

There is a strange warning on bootup related to folios. Seen it a couple of
times before. Why does this occur?

Filesystems generally can't cope with failing to allocate a buffer head.
So the buffer head code sets __GFP_NOFAIL.  That's better than trying
to implement __GFP_NOFAIL semantics in the fs code, right?

[   20.878110] Call trace:
[   20.878111]  get_page_from_freelist+0x214/0x17f8
[   20.878116]  __alloc_pages+0x17c/0xe08
[   20.878120]  __kmalloc_large_node+0xa0/0x170
[   20.878123]  __kmalloc_node+0x120/0x1d0
[   20.878125]  memcg_alloc_slab_cgroups+0x48/0xc0

Oho.  It's not buffer's fault, specifically.  memcg is allocating
its own metadata for the slab.  I decree this Not My Fault.

[   20.878128]  memcg_slab_post_alloc_hook+0xa8/0x1c8
[   20.878132]  kmem_cache_alloc+0x18c/0x338
[   20.878135]  alloc_buffer_head+0x28/0xa0
[   20.878138]  folio_alloc_buffers+0xe8/0x1c0
[   20.878141]  folio_create_empty_buffers+0x2c/0x1e8
[   20.878143]  folio_create_buffers+0x58/0x80
[   20.878145]  block_read_full_folio+0x80/0x450
[   20.878148]  blkdev_read_folio+0x24/0x38
[   20.956921]  filemap_read_folio+0x60/0x138
[   20.956925]  do_read_cache_folio+0x180/0x298
[   20.965270]  read_cache_page+0x24/0x90
[   20.965273]  __arm64_sys_swapon+0x2e0/0x1208
[   20.965277]  invoke_syscall+0x78/0x108
[   20.965282]  el0_svc_common.constprop.0+0x48/0xf0
[   20.981702]  do_el0_svc+0x24/0x38
[   20.993773]  el0t_64_sync_handler+0x100/0x130
[   20.993776]  el0t_64_sync+0x190/0x198
[   20.993779] ---[ end trace 0000000000000000 ]---
[   20.999972] Adding 999420k swap on /dev/mapper/eng07sys--r113--vg-swap_1. Priority:-2 extents:1 across:999420k SS

This is due to folio_alloc_buffers() setting GFP_NOFAIL:

struct buffer_head *folio_alloc_buffers(struct folio *folio,
                                        unsigned long size, bool retry)
{
        struct buffer_head *bh, *head;
        gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
        long offset;
        struct mem_cgroup *memcg, *old_memcg;

        if (retry)
                gfp |= __GFP_NOFAIL;

This isn't new.  It was introduced by 640ab98fb362 in 2017.
It seems reasonable to be able to kmalloc(512, GFP_NOFAIL).  It's the
memcg code which is having problems here.
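
As a sketch of how the flag ends up in the memcg allocation, the following user-space model mirrors the two snippets quoted in this thread. The bit values and the contents of the clear mask are assumptions chosen for illustration (the real OBJCGS_CLEAR_MASK is not shown here); the only point is that a mask which does not include the no-fail bit leaves it set for the metadata allocation.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int model_gfp_t;

/* Hypothetical bit values, for illustration only; not the kernel's. */
#define MODEL_GFP_NOFS     0x01u
#define MODEL_GFP_ACCOUNT  0x02u
#define MODEL_GFP_NOFAIL   0x04u

/* Assumed clear mask: strips accounting, does not touch NOFAIL. */
#define MODEL_CLEAR_MASK   (MODEL_GFP_ACCOUNT)

/* Mirrors the flag setup quoted from folio_alloc_buffers(). */
static model_gfp_t model_buffer_gfp(bool retry)
{
	model_gfp_t gfp = MODEL_GFP_NOFS | MODEL_GFP_ACCOUNT;

	if (retry)
		gfp |= MODEL_GFP_NOFAIL;
	return gfp;
}

/* Mirrors "gfp &= ~OBJCGS_CLEAR_MASK" before the vector allocation. */
static model_gfp_t model_metadata_gfp(model_gfp_t gfp)
{
	return gfp & ~MODEL_CLEAR_MASK;
}
```

Under these assumptions, model_metadata_gfp(model_buffer_gfp(true)) still carries MODEL_GFP_NOFAIL, so the metadata vector is requested as no-fail whatever its size.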