On Thu, May 23, 2024 at 02:31:05PM +0100, Matthew Wilcox wrote:
> On Tue, May 21, 2024 at 12:29:39PM -0700, Shakeel Butt wrote:
> > On Tue, May 21, 2024 at 03:44:21PM +0100, Matthew Wilcox wrote:
> > > The memcg should not be attached to the individual pages that make up a
> > > vmalloc allocation. Rather, it should be managed by the vmalloc
> > > allocation itself. I don't have the knowledge to poke around inside
> > > vmalloc right now, but maybe somebody else could take that on.
> >
> > Are you concerned about accessing just memcg or any field of the
> > sub-page? There are drivers accessing fields of pages allocated through
> > vmalloc. Some details at 3b8000ae185c ("mm/vmalloc: huge vmalloc backing
> > pages should be split rather than compound").
>
> Thanks for the pointer, and fb_deferred_io_fault() is already on my
> hitlist for abusing struct page.
>
> My primary concern is that we should track the entire allocation as a
> single object rather than tracking each page individually. That means
> assigning the vmalloc allocation to a memcg rather than assigning each
> page to a memcg. It's a lot less overhead to increment the counter once
> per allocation rather than once per page in the allocation!
>
> But secondarily, yes, pages allocated by vmalloc probably don't need
> any per-page state, other than tracking the vmalloc allocation they're
> assigned to. We'll see how that theory turns out.

I think the tricky part would be vmalloc having pages spanning multiple
nodes which is not an issue for MEMCG_VMALLOC stat but the vmap based
kernel stack (CONFIG_VMAP_STACK) metric NR_KERNEL_STACK_KB cares about
that information.
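
To make the trade-off concrete, here is a rough userspace sketch, not
kernel code. Every name in it (toy_memcg, toy_valloc, charge_per_page,
charge_per_alloc, account_stack_per_node) is hypothetical, and it assumes
4 KiB pages and at most four nodes. It only models how many counter
updates each scheme needs: a memcg-wide counter can be bumped once per
allocation, while a per-node counter still needs one update per node that
backs the allocation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4   /* hypothetical node count for this toy model */
#define PAGE_KB   4   /* assumes 4 KiB pages */

/* Toy stand-in for a memcg: one global counter plus per-node counters. */
struct toy_memcg {
	long vmalloc_pages;         /* models a memcg-wide stat like MEMCG_VMALLOC */
	long kstack_kb[MAX_NODES];  /* models a per-node stat like NR_KERNEL_STACK_KB */
	long updates;               /* how many counter updates were performed */
};

/* Toy stand-in for a vmalloc allocation: the node id of each backing page. */
struct toy_valloc {
	size_t nr_pages;
	const int *page_nid;
};

/* Per-page scheme: one counter update for every page in the allocation. */
static void charge_per_page(struct toy_memcg *cg, const struct toy_valloc *va)
{
	for (size_t i = 0; i < va->nr_pages; i++) {
		cg->vmalloc_pages += 1;
		cg->updates++;
	}
}

/* Per-allocation scheme: a single update covering the whole allocation. */
static void charge_per_alloc(struct toy_memcg *cg, const struct toy_valloc *va)
{
	cg->vmalloc_pages += (long)va->nr_pages;
	cg->updates++;
}

/*
 * Per-node stat: the pages may sit on different nodes, so the allocation
 * has to be broken down by node first. This costs one update per node
 * that actually backs the allocation, not one per page.
 */
static void account_stack_per_node(struct toy_memcg *cg,
				   const struct toy_valloc *va)
{
	long kb[MAX_NODES] = { 0 };

	for (size_t i = 0; i < va->nr_pages; i++) {
		int nid = va->page_nid[i];

		assert(nid >= 0 && nid < MAX_NODES);
		kb[nid] += PAGE_KB;
	}

	for (int nid = 0; nid < MAX_NODES; nid++) {
		if (kb[nid]) {
			cg->kstack_kb[nid] += kb[nid];
			cg->updates++;
		}
	}
}
```

For a four-page allocation with two pages on node 0 and two on node 1,
the per-page scheme in this model does four updates, the per-allocation
scheme does one, and the per-node breakdown does two. Whether the real
vmalloc code can cheaply keep such a per-node breakdown is exactly the
open question.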