On Tue, Nov 07, 2023 at 01:33:41PM -0800, Roman Gushchin wrote:
> On Tue, Nov 07, 2023 at 07:24:08PM +0000, Matthew Wilcox wrote:
> > On Mon, Nov 06, 2023 at 06:57:05PM -0800, Christoph Lameter wrote:
> > > Right.. Well let's add the cgroup folks to this.
> > >
> > > The code simply uses GFP_NOFAIL to allocate cgroup metadata using
> > > an order > 1:
> > >
> > > int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
> > > 				 gfp_t gfp, bool new_slab)
> > > {
> > > 	unsigned int objects = objs_per_slab(s, slab);
> > > 	unsigned long memcg_data;
> > > 	void *vec;
> > >
> > > 	gfp &= ~OBJCGS_CLEAR_MASK;
> > > 	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
> > > 			   slab_nid(slab));
> >
> > But, but but, why does this incur an allocation larger than PAGE_SIZE?
> >
> > sizeof(void *) is 8.  We have N objects allocated from the slab.  I
> > happen to know this is used for buffer_head, so:
> >
> > buffer_head  1369  1560  104  39  1 : tunables 0 0 0 : slabdata  40  40  0
> >
> > we get 39 objects per slab, and we're only allocating one page per slab.
> > 39 * 8 is only 312.
> >
> > Maybe Christoph is playing with min_slab_order or something, so we're
> > getting 8 pages per slab.  That's still only 2496 bytes.  Why are we
> > calling into the large kmalloc path?  What's really going on here?
>
> Good question, and I *guess* it's something related to Christoph's hardware
> (64k pages or something like this) - otherwise we would have seen it sooner.

I was wondering about that, and obviously it'd make N scale up.  But then,
we'd be able to fit more pointers in a page too.  At the end of the day,
8 < 104.  Even if we go to order-3, 64 < 104.  If Christoph is playing
with min_slab_order=4, we'd see it ... but that's a really big change,
and I don't think it would justify this patch, let alone cc'ing stable.
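
To make the arithmetic concrete, here's a quick userspace sketch, not
kernel code: the 4 KiB page size, the 104-byte buffer_head object size
taken from the slabinfo line above, and 8-byte pointers are all assumed
values for a typical 64-bit box, and the objects-per-slab calculation
ignores slab metadata overhead:

	#include <stdio.h>

	#define PAGE_SIZE	4096UL	/* assumed; 64k pages change N but not the ratio */
	#define OBJ_SIZE	104UL	/* buffer_head size, per the slabinfo line */
	#define PTR_SIZE	8UL	/* sizeof(struct obj_cgroup *) on 64-bit */

	int main(void)
	{
		for (unsigned int order = 0; order <= 4; order++) {
			unsigned long slab_bytes = PAGE_SIZE << order;
			/* rough objs_per_slab(): ignores metadata overhead */
			unsigned long objects = slab_bytes / OBJ_SIZE;
			/* size kcalloc_node() is asked for: one pointer per object */
			unsigned long vec_bytes = objects * PTR_SIZE;

			printf("order %u: %lu objects, vector %lu bytes (%s PAGE_SIZE)\n",
			       order, objects, vec_bytes,
			       vec_bytes > PAGE_SIZE ? ">" : "<=");
		}
		return 0;
	}

At order-0 this reproduces the 39-object / 312-byte numbers above, and
the vector only crosses PAGE_SIZE at order-4, where 2^4 * 8 = 128 finally
exceeds the 104-byte object size - consistent with only being able to
trigger the large kmalloc path with something like min_slab_order=4.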