Re: cgroups: warning for metadata allocation with GFP_NOFAIL (was Re: folio_alloc_buffers() doing allocations > order 1 with GFP_NOFAIL)

Roman Gushchin <roman.gushchin@xxxxxxxxx> · Tue, 7 Nov 2023 13:33:41 -0800

On Tue, Nov 07, 2023 at 07:24:08PM +0000, Matthew Wilcox wrote:
> On Mon, Nov 06, 2023 at 06:57:05PM -0800, Christoph Lameter wrote:
> > Right. Well, let's add the cgroup folks to this.
> > 
> > The code that simply uses the GFP_NOFAIL to allocate cgroup metadata using
> > an order > 1:
> > 
> > int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
> > 				 gfp_t gfp, bool new_slab)
> > {
> > 	unsigned int objects = objs_per_slab(s, slab);
> > 	unsigned long memcg_data;
> > 	void *vec;
> > 
> > 	gfp &= ~OBJCGS_CLEAR_MASK;
> > 	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
> > 			   slab_nid(slab));
> 
> But, but but, why does this incur an allocation larger than PAGE_SIZE?
> 
> sizeof(void *) is 8.  We have N objects allocated from the slab.  I
> happen to know this is used for buffer_head, so:
> 
> buffer_head         1369   1560    104   39    1 : tunables    0    0    0 : slabdata     40     40      0
> 
> we get 39 objects per slab, and we're only allocating one page per slab.
> 39 * 8 is only 312.
> 
> Maybe Christoph is playing with min_slab_order or something, so we're
> getting 8 pages per slab.  That's still only 2496 bytes.  Why are we
> calling into the large kmalloc path?  What's really going on here?

Good question, and I *guess* it's something related to Christoph's hardware
(64k pages or something like that) - otherwise we would have seen it sooner.
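
For reference, the arithmetic in the quoted mail can be sketched in C. This
is only an illustration: it assumes 8-byte pointers (a 64-bit kernel), and
the helper names below are made up for this mail, not kernel functions.

```c
#include <stddef.h>

/* Assumption: 64-bit kernel, so sizeof(struct obj_cgroup *) is 8 bytes. */
#define OBJCG_PTR_SIZE 8UL

/*
 * Bytes requested by
 *   kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp, nid)
 * for a slab holding 'objects' objects (hypothetical helper).
 */
static unsigned long objcg_vec_bytes(unsigned long objects)
{
	return objects * OBJCG_PTR_SIZE;
}

/*
 * Smallest objects-per-slab count whose pointer vector is strictly
 * larger than 'limit' bytes (hypothetical helper).
 */
static unsigned long min_objects_exceeding(unsigned long limit)
{
	return limit / OBJCG_PTR_SIZE + 1;
}
```

objcg_vec_bytes(39) gives the 312 bytes above and objcg_vec_bytes(39 * 8)
gives 2496. Under the same assumption, min_objects_exceeding(4096) is 513,
so the vector only grows past a single 4k page once a slab holds more than
512 objects.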

I'd like to have the answer too.

Thanks!