Re: a case for a destructor for slub: mm_struct

On Thu, Mar 13, 2025 at 12:23:55PM +0100, Mateusz Guzik wrote:
> On Thu, Mar 13, 2025 at 9:59 AM Harry Yoo <harry.yoo@xxxxxxxxxx> wrote:
> > > There is other expensive work which can also be modified this way.
> >
> > Not sure what you're referring to?
> >
> 
> There are pgd_alloc/free calls which end up taking the global pgd_list
> lock. Instead, this could be patched to let the mm hang out on the list
> and get skipped by the pgd-list-iterating machinery as needed.

Okay, IIUC, you want to remove it from the list in the destructor rather
than when the mm itself is freed, and have the code iterating over the
list verify whether each entry is still allocated.

Would this be with SLAB_TYPESAFE_BY_RCU, or by taking a lock within
the destructor?
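
If it's the latter, I'm imagining something along these lines -- a very
rough sketch just to check my understanding, with made-up names
(mm_list_ctor/mm_list_dtor, mm->pgd_list_entry, mm_in_use()); pgd_lock
and pgd_list stand in for whatever lock/list actually gets used:

	static void mm_list_ctor(void *obj)
	{
		struct mm_struct *mm = obj;

		/* paid once, when the slab folio is populated */
		spin_lock(&pgd_lock);
		list_add(&mm->pgd_list_entry, &pgd_list);
		spin_unlock(&pgd_lock);
	}

	static void mm_list_dtor(void *obj)
	{
		struct mm_struct *mm = obj;

		/* only runs when the slab folio itself is torn down */
		spin_lock(&pgd_lock);
		list_del(&mm->pgd_list_entry);
		spin_unlock(&pgd_lock);
	}

	/* iterators skip entries for which mm_in_use(mm) is false */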

The RFC patch [1] that removes the destructor states that "taking a spinlock
in a destructor is a bit risky since the slab allocators may run the
destructors anytime they decide a slab is no longer needed",
but I need to think a bit more about how risky that actually is.

[1] https://lore.kernel.org/linux-kernel/Pine.LNX.4.64.0705101156190.10663@xxxxxxxxxxxxxxxxxxxxxxxxx/#t

> > > there may be spurious mm_struct's hanging out and eating pcpu resources.
> > > Something can be added to reclaim those by the pcpu allocator.
> >
> > Not sure if I follow. What do you mean by spurious mm_struct, and how
> > does the pcpu allocator reclaim that?
> >
> 
> Suppose a workload was run which created tons of mm_structs. The
> workload is done and they can be reclaimed, but hang out just in case.
>
> Another workload showed up, but one which wants to do many percpu
> allocs and is not mm_struct-heavy.
> 
> In case of resource shortage it would be good if the percpu allocator
> knew how to reclaim the known cached-but-not-used memory instead of
> grabbing new pages.
> 
> As for how to get there, so happens the primary consumer (percpu
> counters) already has a global list of all allocated objects. The
> allocator could walk it and reclaim as needed.

You mean reclaiming per-cpu objects along with the slab objects that use
them? That sounds like a new slab shrinker for mm_struct?
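
If so, something like the below is roughly what I have in mind -- just a
sketch; nr_cached_mm_structs() and reclaim_cached_mm_structs() are
hypothetical helpers the slab/percpu side would have to provide, and the
registration call differs depending on whether the tree still has
register_shrinker() or the newer shrinker_alloc()/shrinker_register():

	static unsigned long mm_shrink_count(struct shrinker *s,
					     struct shrink_control *sc)
	{
		/* how many cached-but-unused mm_structs could we give back? */
		return nr_cached_mm_structs();
	}

	static unsigned long mm_shrink_scan(struct shrinker *s,
					    struct shrink_control *sc)
	{
		/* dropping them also releases their percpu rss counters */
		return reclaim_cached_mm_structs(sc->nr_to_scan);
	}

	static struct shrinker mm_shrinker = {
		.count_objects	= mm_shrink_count,
		.scan_objects	= mm_shrink_scan,
		.seeks		= DEFAULT_SEEKS,
	};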

> > > So that's it for making the case, as for the APIs, I think it would be
> > > best if both dtor and ctor accepted a batch of objects to operate on,
> > > but that's a lot of extra churn due to pre-existing ctor users.
> >
> > Why do you want to pass a batch of objects, instead of calling one by one
> > for each object when a slab folio is allocated/freed?
> >
> > Is it solely to reduce the overhead of extra function calls when
> > allocating or freeing a slab folio?
> >
> 
> The single-threaded overhead is one thing, some contention is another.
> Back-to-back acquires are a degenerate case from a scalability
> standpoint.
> 
> While these codepaths should be executing rarely, there is still no
> need to get spikes when they do.
> 
> Even so, that's a minor thing which can be patched up later -- even a
> ctor/dtor pair which operates on one obj at a time is almost the
> entirety of the improvement.

Hmm, but I think even a ctor/dtor pair that operates on one object at
a time requires changing the semantics of how the ctor works :'(

For now the constructor is not expected to fail, but looking at
mm_init()... it can fail. There is no way to let the slab allocator know
that it failed.

Looks like some churn is needed anyway?
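
To illustrate (the int-returning variant below is hypothetical, not an
existing interface): today kmem_cache_create() takes

	void (*ctor)(void *);

so something like percpu_counter_init() failing inside the constructor
has nowhere to report the error, whereas a revived ctor/dtor pair would
presumably need

	int (*ctor)(void *);	/* hypothetical: return -ENOMEM etc. */
	void (*dtor)(void *);

and the slab side would have to unwind the already-constructed objects
in the folio when a later one fails.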

> > > ACHTUNG: I think this particular usage would still want some buy in
> > > from the mm folk and at least Dennis (the percpu allocator
> > > maintainer), but one has to start somewhere. There were 2 different
> > > patchsets posted to move rss counters away from the current pcpu
> > > scheme, but both had different tradeoffs and ultimately died off.
> > >
> > > Should someone(tm) commit to sorting this out, I'll handle the percpu
> > > thing. There are some other tweaks warranted here (e.g., depessimizing
> > > the rss counter validation loop at exit).
> > >
> > > So what do you think?
> >
> > I'd love to take the project and work on it, it makes sense to revive
> > the destructor feature if that turns out to be beneficial.
> >
> > I'll do the slab part.
> 
> Cool, thanks.

-- 
Cheers,
Harry



