On Wed, Dec 11, 2024 at 08:20:36PM +0000, Matthew Wilcox wrote: > On Wed, Dec 11, 2024 at 11:32:13AM -0800, Shakeel Butt wrote: > > On Wed, Dec 11, 2024 at 04:50:39PM +0000, Matthew Wilcox wrote: > > > Perhaps you'd be more persuaded by: > > > > > > (a) If we clear __GFP_ACCOUNT then alloc_pages_bulk() will work, and > > > that's a pretty significant performance win over calling alloc_pages() > > > in a loop. > > > > > > (b) Once we get to memdescs, calling alloc_pages() with __GFP_ACCOUNT > > > set is going to require allocating a memdesc to store the obj_cgroup > > > in, so in the future we'll save an allocation. > > > > > > Your proposed alternative will work and is way less churn. But it's > > > not preparing us for memdescs ;-) > > > > We can make alloc_pages_bulk() work with __GFP_ACCOUNT but your second > > argument is more compelling. > > > > I am trying to think of what will we miss if we remove this per-page > > memcg metadata. One thing I can think of is debugging a live system > > or kdump where I need to track where a given page came from. I think > > Umm, I don't think you know which vmalloc allocation a page came from > today? I've sent patches to add that information before, but they were > rejected. Do you have a link handy for that discussion? > In fact, I don't think we know even _that_ a page belongs to > vmalloc today, do we? Yes, we know that the page is accounted, and > which memcg it belongs to ... but nothing more. Yes you are correct. At the moment it is a guesswork and exhaustive search into multiple sources. > > I actually want to improve this, without adding additional overhead. > What I'm working on right now (before I got waylaid by this bug) is: > > +struct choir { > + struct kref refcount; > + unsigned int nr; > + struct page *pages[] __counted_by(nr); > +}; > > and rewriting vmalloc to be based on choirs instead of its own pages. > One thing I've come to realise today is that the obj_cgroup pointer > needs to be in the choir and not in the vm_struct so that we uncharge the > allocation when the choir refcount drops to 0, not when the allocation > is unmapped. What/who else can take a reference on a choir? > > A regular choir allocation will (today) mark the pages in it as being > allocated to a choir (and thus not having their own refcount / mapcount), > but I'll give vmalloc a way to mark the pages as specifically being > from vmalloc. This sounds good. Thanks for the awesome work.