I'm looking for someone(tm) willing to implement a destructor for SLUB.

Currently SLUB only supports a constructor, a callback to use when first creating an object, but there is no matching callback for getting rid of it. The pair would come in handy when a frequently allocated and freed object performs the same expensive work each time.

The specific usage I have in mind is mm_struct -- it gets allocated on each fork and exec and suffers global serialization several times.

The primary thing I'm looking to handle this way is cid and percpu counter allocation, both going down to the percpu allocator, which only has a global lock. The problem is exacerbated as it happens back-to-back, so that's 4 acquires per lifetime cycle (alloc and free).

There is other expensive work which can also be modified this way.

I recognize something like this would pose a tradeoff in terms of memory usage, but I don't believe it's a big deal. If you have a mm_struct hanging out, you are going to need to have the percpu memory up for grabs to make any use of it anyway. Granted, there may be spurious mm_structs hanging out and eating pcpu resources. Something can be added to let the pcpu allocator reclaim those.

So that's it for making the case. As for the APIs, I think it would be best if both dtor and ctor accepted a batch of objects to operate on, but that's a lot of extra churn due to pre-existing ctor users.

ACHTUNG: I think this particular usage would still want some buy-in from the mm folk and at least Dennis (the percpu allocator maintainer), but one has to start somewhere. There were 2 different patchsets posted to move rss counters away from the current pcpu scheme, but both had different tradeoffs and ultimately died off.

Should someone(tm) commit to sorting this out, I'll handle the percpu thing. There are some other tweaks warranted here (e.g., depessimizing the rss counter validation loop at exit).

So what do you think?

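To make the ctor/dtor pairing concrete, here is a minimal user-space sketch, not SLUB code. Every name in it (struct obj, struct obj_cache, cache_alloc, cache_free, cache_shrink, demo_cycles) is hypothetical, calloc() merely stands in for the percpu allocation, and the int-returning ctor is an assumption made so that the setup is allowed to fail. The point is only that the expensive setup runs when an object is first created and the teardown runs when the cache actually gives the memory back, not on every alloc/free cycle.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for mm_struct: owns a resource that is costly to set up. */
struct obj {
    long *counters;          /* stands in for percpu counter storage */
    struct obj *next_free;   /* link used while the object sits idle in the cache */
};

/* Hypothetical object cache; none of these names are real SLUB API. */
struct obj_cache {
    int (*ctor)(struct obj *);   /* runs once, when an object is first created */
    void (*dtor)(struct obj *);  /* runs once, when the backing memory is released */
    struct obj *free_list;       /* constructed but currently unused objects */
};

static int expensive_setups;     /* how many times the costly setup path ran */
static int expensive_teardowns;  /* how many times the costly teardown path ran */

static int obj_ctor(struct obj *o)
{
    o->counters = calloc(4, sizeof(*o->counters));
    if (!o->counters)
        return -1;
    expensive_setups++;
    return 0;
}

static void obj_dtor(struct obj *o)
{
    free(o->counters);
    expensive_teardowns++;
}

/* Prefer an already constructed object; construct a new one only on a miss. */
static struct obj *cache_alloc(struct obj_cache *c)
{
    struct obj *o = c->free_list;

    if (o) {
        c->free_list = o->next_free;
        return o;
    }
    o = malloc(sizeof(*o));
    if (!o)
        return NULL;
    if (c->ctor(o)) {
        free(o);
        return NULL;
    }
    return o;
}

/* Freeing keeps the constructed state; the dtor is deliberately not called. */
static void cache_free(struct obj_cache *c, struct obj *o)
{
    o->next_free = c->free_list;
    c->free_list = o;
}

/* Reclaim path: the only place where the dtor runs. */
static void cache_shrink(struct obj_cache *c)
{
    struct obj *o;

    while ((o = c->free_list)) {
        c->free_list = o->next_free;
        c->dtor(o);
        free(o);
    }
}

/* Run n alloc/free cycles and report how often the expensive setup ran. */
static int demo_cycles(int n)
{
    struct obj_cache c = { obj_ctor, obj_dtor, NULL };

    expensive_setups = 0;
    expensive_teardowns = 0;
    for (int i = 0; i < n; i++) {
        struct obj *o = cache_alloc(&c);

        if (!o)
            return -1;
        o->counters[0] = 0;  /* cheap per-use reset instead of a fresh allocation */
        cache_free(&c, o);
    }
    cache_shrink(&c);
    return expensive_setups;
}
```

Under these assumptions, demo_cycles(1000) pays for the expensive setup once rather than 1000 times, and the teardown only happens when the cache is shrunk. The flip side is the memory tradeoff mentioned above: idle objects keep holding their resources until something reclaims them.
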
To benchmark this yourself, you can grab code from here:

    http://apollo.backplane.com/DFlyMisc/doexec.c

    $ cc -static -O2 -o static-doexec doexec.c
    $ ./static-doexec $(nproc)

I check spinlock problems with:

    bpftrace -e 'kprobe:__pv_queued_spin_lock_slowpath { @[kstack()] = count(); }'

-- 
Mateusz Guzik <mjguzik gmail.com>