On Fri, Mar 14, 2025 at 05:27:33PM +0900, Harry Yoo wrote:
> On Thu, Mar 13, 2025 at 12:23:55PM +0100, Mateusz Guzik wrote:
> > On Thu, Mar 13, 2025 at 9:59 AM Harry Yoo <harry.yoo@xxxxxxxxxx> wrote:
> > > > There is other expensive work which can also be modified this way.
> > >
> > > Not sure what you're referring to?
> >
> > There are pgd_alloc/free calls which end up taking the global pgd_list
> > lock. Instead, this could be patched to let the mm hang out on the list
> > and get skipped by the pgd list iterating machinery as needed.
>
> Okay, IIUC, you want to remove it from the list in the destructor
> and not remove it from the list when freeing it, and the code iterating
> over the list needs to verify whether it is still allocated.
>
> Would this be with SLAB_TYPESAFE_BY_RCU, or by taking a lock within
> the destructor?

Nah, slipped my mind. A lock needs to be acquired even with
SLAB_TYPESAFE_BY_RCU.

> The RFC patch [1] that removes the destructor states that "taking a spinlock
> in a destructor is a bit risky since the slab allocators may run the
> destructors anytime they decide a slab is no longer needed",
> but I need to think a bit more about how risky that actually is.
>
> [1] https://lore.kernel.org/linux-kernel/Pine.LNX.4.64.0705101156190.10663@xxxxxxxxxxxxxxxxxxxxxxxxx/#t
>
> > > > there may be spurious mm_structs hanging out and eating pcpu resources.
> > > > Something can be added to reclaim those by the pcpu allocator.
> > >
> > > Not sure if I follow. What do you mean by spurious mm_struct, and how
> > > does the pcpu allocator reclaim that?
> >
> > Suppose a workload was run which created tons of mm_structs. The
> > workload is done and they can be reclaimed, but hang out just in case.
> >
> > Another workload showed up, but one which wants to do many percpu
> > allocs and is not mm_struct-heavy.
> >
> > In case of resource shortage it would be good if the percpu allocator
> > knew how to reclaim the known cached-but-not-used memory instead of
> > grabbing new pages.
> >
> > As for how to get there, it so happens the primary consumer (percpu
> > counters) already has a global list of all allocated objects. The
> > allocator could walk it and reclaim as needed.
>
> You mean reclaiming per-cpu objects along with the slab objects that use them?
> That sounds like a new slab shrinker for mm_struct?
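A rough sketch of what such a shrinker could look like is below. Everything
in it is an assumption rather than existing code: mm_cachep is currently
static to kernel/fork.c, mm_cached_objects is made-up bookkeeping, and the
whole thing only pays off once a dtor exists, so that releasing empty slabs
via kmem_cache_shrink() actually gives the percpu memory back.

#include <linux/atomic.h>
#include <linux/init.h>
#include <linux/minmax.h>
#include <linux/shrinker.h>
#include <linux/slab.h>

extern struct kmem_cache *mm_cachep;    /* assumed to be made visible here */
static atomic_long_t mm_cached_objects; /* hypothetical: free objects still holding pcpu memory */

static unsigned long mm_pcpu_count(struct shrinker *s,
                                   struct shrink_control *sc)
{
        /* Returning 0 tells the shrinker core there is nothing to do. */
        return atomic_long_read(&mm_cached_objects);
}

static unsigned long mm_pcpu_scan(struct shrinker *s,
                                  struct shrink_control *sc)
{
        unsigned long cached = atomic_long_read(&mm_cached_objects);

        /*
         * Releasing empty slabs is what would run the (hypothetical) dtor
         * and return the percpu counters held by cached-but-unused objects.
         */
        kmem_cache_shrink(mm_cachep);

        /* Rough accounting; a real version would report what was freed. */
        return min(cached, sc->nr_to_scan);
}

static int __init mm_pcpu_shrinker_init(void)
{
        struct shrinker *s = shrinker_alloc(0, "mm-pcpu");

        if (!s)
                return -ENOMEM;

        s->count_objects = mm_pcpu_count;
        s->scan_objects = mm_pcpu_scan;
        s->seeks = DEFAULT_SEEKS;
        shrinker_register(s);
        return 0;
}
late_initcall(mm_pcpu_shrinker_init);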
> > > > So that's it for making the case. As for the APIs, I think it would be
> > > > best if both dtor and ctor accepted a batch of objects to operate on,
> > > > but that's a lot of extra churn due to pre-existing ctor users.
> > >
> > > Why do you want to pass a batch of objects, instead of calling one by one
> > > for each object when a slab folio is allocated/freed?
> > >
> > > Is it solely to reduce the overhead of extra function calls when
> > > allocating or freeing a slab folio?
> >
> > The single-threaded overhead is one thing, some contention is another.
> > Back-to-back acquires are a degenerate case from a scalability
> > standpoint.
> >
> > While these codepaths should be executing rarely, there is still no
> > need to get spikes when they do.
> >
> > Even so, that's a minor thing which can be patched up later -- even a
> > ctor/dtor pair which operates on one obj at a time is almost the entirety
> > of the improvement.
>
> Hmm, but I think even a ctor/dtor pair that operates on one object at
> a time requires changing the semantics of how the ctor works :'(
>
> For now the constructor is not expected to fail, but looking at
> mm_init()... it can fail. There is no way to let the slab allocator know
> that it failed.
>
> Looks like some churn is needed anyway?
>
> > > > ACHTUNG: I think this particular usage would still want some buy-in
> > > > from the mm folk and at least Dennis (the percpu allocator
> > > > maintainer), but one has to start somewhere. There were 2 different
> > > > patchsets posted to move rss counters away from the current pcpu
> > > > scheme, but both had different tradeoffs and ultimately died off.
> > > >
> > > > Should someone(tm) commit to sorting this out, I'll handle the percpu
> > > > thing. There are some other tweaks warranted here (e.g., depessimizing
> > > > the rss counter validation loop at exit).
> > > >
> > > > So what do you think?
> > >
> > > I'd love to take the project and work on it; it makes sense to revive
> > > the destructor feature if that turns out to be beneficial.
> > >
> > > I'll do the slab part.
> >
> > Cool, thanks.
>
> --
> Cheers,
> Harry

--
Cheers,
Harry
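To make the slab part a bit more concrete, a rough sketch of the ctor/dtor
pair under discussion could look like the following. kmem_cache_create_dtor()
and the int-returning ctor are made up here, since today's ctor returns void
and cannot fail (which is exactly the semantic change mentioned above);
percpu_counter_init_many(), percpu_counter_destroy_many() and the rss_stat
array are the real thing.

#include <linux/init.h>
#include <linux/mm_types.h>
#include <linux/percpu_counter.h>
#include <linux/slab.h>

#ifndef ARCH_MIN_MMSTRUCT_ALIGN         /* fallback, as in kernel/fork.c */
#define ARCH_MIN_MMSTRUCT_ALIGN 0
#endif

static struct kmem_cache *mm_cachep;    /* the real one lives in kernel/fork.c */

/* Assumed: a ctor that may fail, so the slab allocator has to cope with that. */
static int mm_struct_ctor(void *object)
{
        struct mm_struct *mm = object;

        /*
         * Done once per slab object instead of in mm_init() on every fork();
         * whether GFP_KERNEL_ACCOUNT is usable in this context is part of
         * the churn being discussed.
         */
        return percpu_counter_init_many(mm->rss_stat, 0, GFP_KERNEL_ACCOUNT,
                                        NR_MM_COUNTERS);
}

/* Assumed: a dtor, run only when the backing slab itself is released. */
static void mm_struct_dtor(void *object)
{
        struct mm_struct *mm = object;

        percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
}

void __init mm_cache_ctor_dtor_sketch(void)
{
        /* kmem_cache_create_dtor() does not exist; shown for shape only. */
        mm_cachep = kmem_cache_create_dtor("mm_struct",
                                           sizeof(struct mm_struct),
                                           ARCH_MIN_MMSTRUCT_ALIGN,
                                           SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT,
                                           mm_struct_ctor, mm_struct_dtor);
}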