On Thu, 14 Sep 2023, Feng Tang wrote:
> One reason I wanted to revisit the MIN_PARTIAL is, it was changed from 2 to 5 in 2007 by Christoph, in commit 76be895001f2 ("SLUB: Improve hackbench speed"), the system has been much huger since then. Currently while a per-cpu partial can already have 5 or more slabs, the limit for a node with possible 100+ CPU could be reconsidered.
Well, the trick that I keep using on large systems with lots of memory is to use huge-page-sized page allocations for the slabs. The applications on those systems are already using the same page size. Doing so usually removes a lot of overhead and speeds things up significantly.
Try booting with "slab_min_order=9"
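
For reference, a rough sketch of the arithmetic behind that value: an order-N allocation is 2^N contiguous base pages, so with 4 KiB base pages (an assumption, typical for x86-64) order 9 comes out at 2 MiB, which is the usual huge page size there. The helper name slab_bytes is made up for illustration. Depending on the kernel version the parameter may be spelled "slub_min_order" instead.

```python
import os


def slab_bytes(order: int, page_size: int) -> int:
    """Size in bytes of one order-`order` allocation (2**order base pages)."""
    return page_size << order


if __name__ == "__main__":
    # Base page size of the machine this runs on (Unix only).
    page_size = os.sysconf("SC_PAGE_SIZE")
    for order in (0, 3, 9):
        kib = slab_bytes(order, page_size) // 1024
        print(f"order {order}: {kib} KiB per slab page allocation")
```

On a machine with a different base page size (for example 64 KiB on some arm64 or ppc64 configs) order 9 would be much larger than the huge page size, so the order would need to be picked accordingly.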