Hello folks,

This series is motivated by the kernel test bot report [1] on Jay's patch
that modifies slab order. While the patch was not merged and was not in
its final form, I think it was a good lesson that changing slab order has
more impact on performance than we expected.

While inspecting the report, I found some potential points to improve
SLUB. [2] It's _potential_ because it shows no improvement on hackbench,
but I believe more realistic workloads would benefit from this. Due to a
lack of resources and my limited understanding of *realistic* workloads,
I am asking you to help evaluate this together.

It only consists of two patches. Patch #1 addresses an inaccuracy in
SLUB's heuristic, which can negatively affect workloads' performance when
large folios are not available from buddy.

Patch #2 changes SLUB's behavior when there are no slabs available on the
local node's partial slab list, increasing NUMA locality when there is
memory available (without reclamation) on the local node from buddy.

This is at an early stage, but I think it's good enough to start a
discussion. Any feedback and ideas are welcome. Thank you in advance!

Hyeonggon

https://lore.kernel.org/linux-mm/202307172140.3b34825a-oliver.sang@xxxxxxxxx [1]
https://lore.kernel.org/linux-mm/CAB=+i9S6Ykp90+4N1kCE=hiTJTE4wzJDi8k5pBjjO_3sf0aeqg@xxxxxxxxxxxxxx [2]

Hyeonggon Yoo (2):
  Revert "mm, slub: change percpu partial accounting from objects to
    pages"
  mm/slub: prefer NUMA locality over slight memory saving on NUMA
    machines

 include/linux/slub_def.h |  2 --
 mm/slab.h                |  6 ++++
 mm/slub.c                | 76 ++++++++++++++++++++++++++--------------
 3 files changed, 55 insertions(+), 29 deletions(-)

-- 
2.41.0