On Fri, Mar 14, 2025 at 05:27:33PM +0900, Harry Yoo wrote:
> On Thu, Mar 13, 2025 at 12:23:55PM +0100, Mateusz Guzik wrote:
> > On Thu, Mar 13, 2025 at 9:59 AM Harry Yoo <harry.yoo@xxxxxxxxxx> wrote:
> > > > There is other expensive work which can also be modified this way.
> > >
> > > Not sure what you're referring to?
> >
> > There are pgd_alloc/free calls which end up taking the global pgd_list
> > lock. Instead, this could be patched to let the mm hang out on the list
> > and get skipped by the pgd list iterating machinery as needed.
>
> Okay, IIUC, you want to remove it from the list in the destructor
> and not remove it from the list when freeing it, and the code iterating
> over the list needs to verify whether it is still allocated.
>
> Would this be with SLAB_TYPESAFE_BY_RCU, or by taking a lock within
> the destructor?

Nah, slipped my mind. A lock needs to be acquired even with
SLAB_TYPESAFE_BY_RCU.

> The RFC patch [1] that removes the destructor states that "taking a spinlock
> in a destructor is a bit risky since the slab allocators may run the
> destructors anytime they decide a slab is no longer needed",
> but I need to think a bit more about how risky that actually is.
>
> [1] https://lore.kernel.org/linux-kernel/Pine.LNX.4.64.0705101156190.10663@xxxxxxxxxxxxxxxxxxxxxxxxx/#t
>
> > > > there may be spurious mm_structs hanging out and eating pcpu resources.
> > > > Something can be added to reclaim those by the pcpu allocator.
> > >
> > > Not sure if I follow. What do you mean by spurious mm_struct, and how
> > > does the pcpu allocator reclaim that?
> >
> > Suppose a workload was run which created tons of mm_structs. The
> > workload is done and they can be reclaimed, but hang out just in case.
> >
> > Another workload showed up, but one which wants to do many percpu
> > allocs and is not mm_struct-heavy.
> >
> > In case of resource shortage it would be good if the percpu allocator
> > knew how to reclaim the known cached-but-not-used memory instead of
> > grabbing new pages.
> >
> > As for how to get there, it so happens the primary consumer (percpu
> > counters) already has a global list of all allocated objects. The
> > allocator could walk it and reclaim as needed.
>
> You mean reclaiming per-cpu objects along with the slab objects that use them?
> That sounds like a new slab shrinker for mm_struct?
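A rough sketch of what such a shrinker could look like is below. Everything
in it is an assumption rather than existing code: mm_cachep is currently
static to kernel/fork.c, mm_cached_objects is made-up bookkeeping, and the
whole thing only pays off once a dtor exists, so that releasing empty slabs
via kmem_cache_shrink() actually gives the percpu memory back.

#include <linux/atomic.h>
#include <linux/init.h>
#include <linux/minmax.h>
#include <linux/shrinker.h>
#include <linux/slab.h>

extern struct kmem_cache *mm_cachep;    /* assumed to be made visible here */
static atomic_long_t mm_cached_objects; /* hypothetical: free objects still holding pcpu memory */

static unsigned long mm_pcpu_count(struct shrinker *s,
                                   struct shrink_control *sc)
{
        /* Returning 0 tells the shrinker core there is nothing to do. */
        return atomic_long_read(&mm_cached_objects);
}

static unsigned long mm_pcpu_scan(struct shrinker *s,
                                  struct shrink_control *sc)
{
        unsigned long cached = atomic_long_read(&mm_cached_objects);

        /*
         * Releasing empty slabs is what would run the (hypothetical) dtor
         * and return the percpu counters held by cached-but-unused objects.
         */
        kmem_cache_shrink(mm_cachep);

        /* Rough accounting; a real version would report what was freed. */
        return min(cached, sc->nr_to_scan);
}

static int __init mm_pcpu_shrinker_init(void)
{
        struct shrinker *s = shrinker_alloc(0, "mm-pcpu");

        if (!s)
                return -ENOMEM;

        s->count_objects = mm_pcpu_count;
        s->scan_objects = mm_pcpu_scan;
        s->seeks = DEFAULT_SEEKS;
        shrinker_register(s);
        return 0;
}
late_initcall(mm_pcpu_shrinker_init);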
> > > > So that's it for making the case. As for the APIs, I think it would be
> > > > best if both dtor and ctor accepted a batch of objects to operate on,
> > > > but that's a lot of extra churn due to pre-existing ctor users.
> > >
> > > Why do you want to pass a batch of objects, instead of calling one by one
> > > for each object when a slab folio is allocated/freed?
> > >
> > > Is it solely to reduce the overhead of extra function calls when
> > > allocating or freeing a slab folio?
> >
> > The single-threaded overhead is one thing, some contention is another.
> > Back-to-back acquires are a degenerate case from a scalability
> > standpoint.
> >
> > While these codepaths should be executing rarely, there is still no
> > need to get spikes when they do.
> >
> > Even so, that's a minor thing which can be patched up later -- even a
> > ctor/dtor pair which operates on one obj at a time is almost the entirety
> > of the improvement.
>
> Hmm, but I think even a ctor/dtor pair that operates on one object at
> a time requires changing the semantics of how the ctor works :'(
>
> For now the constructor is not expected to fail, but looking at
> mm_init()... it can fail. There is no way to let the slab allocator know
> that it failed.
>
> Looks like some churn is needed anyway?
>
> > > > ACHTUNG: I think this particular usage would still want some buy-in
> > > > from the mm folk and at least Dennis (the percpu allocator
> > > > maintainer), but one has to start somewhere. There were 2 different
> > > > patchsets posted to move rss counters away from the current pcpu
> > > > scheme, but both had different tradeoffs and ultimately died off.
> > > >
> > > > Should someone(tm) commit to sorting this out, I'll handle the percpu
> > > > thing. There are some other tweaks warranted here (e.g., depessimizing
> > > > the rss counter validation loop at exit).
> > > >
> > > > So what do you think?
> > >
> > > I'd love to take the project and work on it; it makes sense to revive
> > > the destructor feature if that turns out to be beneficial.
> > >
> > > I'll do the slab part.
> >
> > Cool, thanks.
>
> --
> Cheers,
> Harry

--
Cheers,
Harry
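To make the slab part a bit more concrete, a rough sketch of the ctor/dtor
pair under discussion could look like the following. kmem_cache_create_dtor()
and the int-returning ctor are made up here, since today's ctor returns void
and cannot fail (which is exactly the semantic change mentioned above);
percpu_counter_init_many(), percpu_counter_destroy_many() and the rss_stat
array are the real thing.

#include <linux/init.h>
#include <linux/mm_types.h>
#include <linux/percpu_counter.h>
#include <linux/slab.h>

#ifndef ARCH_MIN_MMSTRUCT_ALIGN         /* fallback, as in kernel/fork.c */
#define ARCH_MIN_MMSTRUCT_ALIGN 0
#endif

static struct kmem_cache *mm_cachep;    /* the real one lives in kernel/fork.c */

/* Assumed: a ctor that may fail, so the slab allocator has to cope with that. */
static int mm_struct_ctor(void *object)
{
        struct mm_struct *mm = object;

        /*
         * Done once per slab object instead of in mm_init() on every fork();
         * whether GFP_KERNEL_ACCOUNT is usable in this context is part of
         * the churn being discussed.
         */
        return percpu_counter_init_many(mm->rss_stat, 0, GFP_KERNEL_ACCOUNT,
                                        NR_MM_COUNTERS);
}

/* Assumed: a dtor, run only when the backing slab itself is released. */
static void mm_struct_dtor(void *object)
{
        struct mm_struct *mm = object;

        percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
}

void __init mm_cache_ctor_dtor_sketch(void)
{
        /* kmem_cache_create_dtor() does not exist; shown for shape only. */
        mm_cachep = kmem_cache_create_dtor("mm_struct",
                                           sizeof(struct mm_struct),
                                           ARCH_MIN_MMSTRUCT_ALIGN,
                                           SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT,
                                           mm_struct_ctor, mm_struct_dtor);
}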