On Tue, Nov 07, 2023 at 01:33:41PM -0800, Roman Gushchin wrote:
> On Tue, Nov 07, 2023 at 07:24:08PM +0000, Matthew Wilcox wrote:
> > On Mon, Nov 06, 2023 at 06:57:05PM -0800, Christoph Lameter wrote:
> > > Right.. Well let's add the cgroup folks to this.
> > >
> > > The code simply uses GFP_NOFAIL to allocate cgroup metadata using
> > > an order > 1:
> > >
> > > int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
> > > 				 gfp_t gfp, bool new_slab)
> > > {
> > > 	unsigned int objects = objs_per_slab(s, slab);
> > > 	unsigned long memcg_data;
> > > 	void *vec;
> > >
> > > 	gfp &= ~OBJCGS_CLEAR_MASK;
> > > 	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
> > > 			   slab_nid(slab));
> >
> > But, but but, why does this incur an allocation larger than PAGE_SIZE?
> >
> > sizeof(void *) is 8.  We have N objects allocated from the slab.  I
> > happen to know this is used for buffer_head, so:
> >
> > buffer_head  1369  1560  104  39  1 : tunables 0 0 0 : slabdata  40  40  0
> >
> > we get 39 objects per slab, and we're only allocating one page per slab.
> > 39 * 8 is only 312.
> >
> > Maybe Christoph is playing with min_slab_order or something, so we're
> > getting 8 pages per slab.  That's still only 2496 bytes.  Why are we
> > calling into the large kmalloc path?  What's really going on here?
>
> Good question, and I *guess* it's something related to Christoph's hardware
> (64k pages or something like this) - otherwise we would have seen it sooner.

I was wondering about that, and obviously it'd make N scale up.  But then,
we'd be able to fit more pointers in a page too.  At the end of the day,
8 < 104.  Even if we go to order-3, 64 < 104.  If Christoph is playing
with min_slab_order=4, we'd see it ... but that's a really big change,
and I don't think it would justify this patch, let alone cc'ing stable.
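
To make the arithmetic concrete, here's a quick userspace sketch, not
kernel code: the 4 KiB page size, the 104-byte buffer_head object size
taken from the slabinfo line above, and 8-byte pointers are all assumed
values for a typical 64-bit box, and the objects-per-slab calculation
ignores slab metadata overhead:

	#include <stdio.h>

	#define PAGE_SIZE	4096UL	/* assumed; 64k pages change N but not the ratio */
	#define OBJ_SIZE	104UL	/* buffer_head size, per the slabinfo line */
	#define PTR_SIZE	8UL	/* sizeof(struct obj_cgroup *) on 64-bit */

	int main(void)
	{
		for (unsigned int order = 0; order <= 4; order++) {
			unsigned long slab_bytes = PAGE_SIZE << order;
			/* rough objs_per_slab(): ignores metadata overhead */
			unsigned long objects = slab_bytes / OBJ_SIZE;
			/* size kcalloc_node() is asked for: one pointer per object */
			unsigned long vec_bytes = objects * PTR_SIZE;

			printf("order %u: %lu objects, vector %lu bytes (%s PAGE_SIZE)\n",
			       order, objects, vec_bytes,
			       vec_bytes > PAGE_SIZE ? ">" : "<=");
		}
		return 0;
	}

At order-0 this reproduces the 39-object / 312-byte numbers above, and
the vector only crosses PAGE_SIZE at order-4, where 2^4 * 8 = 128 finally
exceeds the 104-byte object size - consistent with only being able to
trigger the large kmalloc path with something like min_slab_order=4.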