Re: [PATCH 2/2] vmalloc: Account memcg per vmalloc

Shakeel Butt <shakeel.butt@xxxxxxxxx> · Wed, 11 Dec 2024 11:32:13 -0800

On Wed, Dec 11, 2024 at 04:50:39PM +0000, Matthew Wilcox wrote:
> On Wed, Dec 11, 2024 at 11:09:56AM -0500, Johannes Weiner wrote:
> > This would work, but it seems somewhat complicated. The atomics in
> > memcg charging and the vmstat updates are batched, and the per-page
> > overhead is for the most part cheap per-cpu ops. Not an issue per se.
> 
> OK, fair enough, I hadn't realised it was a percpu-refcount.  Still,
> we might consume several batches (batch size of 64) when we could do it
> all in one shot.
> 
> Perhaps you'd be more persuaded by:
> 
> (a) If we clear __GFP_ACCOUNT then alloc_pages_bulk() will work, and
> that's a pretty significant performance win over calling alloc_pages()
> in a loop.
> 
> (b) Once we get to memdescs, calling alloc_pages() with __GFP_ACCOUNT
> set is going to require allocating a memdesc to store the obj_cgroup
> in, so in the future we'll save an allocation.
> 
> Your proposed alternative will work and is way less churn.  But it's
> not preparing us for memdescs ;-)

We can make alloc_pages_bulk() work with __GFP_ACCOUNT but your second
argument is more compelling.

I am trying to think of what we will miss if we remove this per-page
memcg metadata. One thing I can think of is debugging a live system
or kdump where I need to track where a given page came from. I think
memory profiling will still be useful in combination with going through
all vmalloc regions where this page is mapped (is there an easy way to
tell if a page is from a vmalloc region?). So, for now I think we will
have an alternative way to extract the useful information.

I think we can go with Johannes' solution for stable and discuss the
future direction separately.
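
For reference, the "several batches vs. one shot" point from the quoted
text can be modelled in userspace. This is only a rough sketch, not kernel
code: struct charge_model, model_charge() and shared_ops_for() are
hypothetical names, and the model assumes a single CPU-local stock that is
refilled from a shared counter in units of 64 pages (the batch size
mentioned above). It counts how often the shared counter would be touched.

```c
#include <stddef.h>

#define BATCH 64UL /* batch size quoted in the thread; the rest is assumed */

/* Hypothetical model: a local stock in front of a shared counter. */
struct charge_model {
	unsigned long stock;      /* pages already charged, held locally */
	unsigned long shared_ops; /* number of updates to the shared counter */
};

/* Charge nr_pages, touching the shared counter only when the stock runs out. */
static void model_charge(struct charge_model *m, unsigned long nr_pages)
{
	unsigned long want;

	if (m->stock >= nr_pages) {
		m->stock -= nr_pages; /* cheap local path */
		return;
	}

	/* Slow path: charge at least one full batch to the shared counter. */
	want = nr_pages > BATCH ? nr_pages : BATCH;
	m->shared_ops++;
	m->stock += want - nr_pages; /* keep the surplus for later charges */
}

/* Charge `total` pages in chunks of `chunk`; return shared-counter updates. */
static unsigned long shared_ops_for(unsigned long total, unsigned long chunk)
{
	struct charge_model m = { 0, 0 };

	while (total > 0) {
		unsigned long n = chunk < total ? chunk : total;

		model_charge(&m, n);
		total -= n;
	}
	return m.shared_ops;
}
```

Under these assumptions, shared_ops_for(1000, 1) models charging 1000 pages
one at a time and touches the shared counter 16 times, while
shared_ops_for(1000, 1000) models a single charge for the whole allocation
and touches it once.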