I am posting a summary of the NUMA balancing improvements tried out.
(The intention is an RFC, revisiting these in the future when someone sees
potential benefits with PATCH1 and PATCH2.) PATCH3 has more potential for
workloads that need aggressive scanning, but may need migration
rate-limiting.

Patchset details:
=================

PATCH 1. Increase the number of access PID (information of tasks accessing
a VMA) history windows from 2 to 4.

Based on PeterZ's suggestion/patch.

Rationale:
- Increases the depth of the historical access information of tasks.
- Gives a better view of hot VMAs.
- Gives a better view of VMAs that are widely shared amongst tasks.

With that we can make better decisions when choosing the VMAs that need to
be scanned for introducing PROT_NONE.

PATCH 2. Increase the number of bits used to map tasks accessing a VMA
from 64 to 128.

Based on a suggestion by Ingo.

Rationale: Decreases the number of collisions (false positives) while the
whole information still fits in a cacheline. This is potentially helpful
when a workload involves more threads, which otherwise
- unnecessarily do VMA scans, and
- create contention in the scan path.

PATCH 3. Change the notion of the 256MB-per-scan limit to a 64k PTE scan
(for 4k pages). Extend the same logic to hugepages / THP.

Based on a suggestion by Mel.

Rationale: This helps to cover more memory, especially when THP or
hugepages are involved.

PS: Please note all 3 are independent patches. Apologies in advance if the
patchset confuses any patching script. More comments/details will be added
for patches of interest.

Summary of results:
===================

PATCH1 and PATCH2 give a benefit in some of the cases I ran, but they
still need a more convincing usecase / results (as of the 6.9+ kernel).

PATCH3: Some benchmarks, such as XSBench and Hashjoin, benefit from more
scanning. But microbenchmarks (such as allocate on one node, then fault
from the other node to see how fast migration happens) suffer because of
the aggressive migration overhead.
Overall, this could be addressed if we combine rate-limiting of migration
(similar to CXL) or tune down the scan rate when it is not necessary to
scan (for example, I still see that VMA scanning does not slow down even
when the rate of migration has slowed or all migrations have completed).

Change stat for each of the patches
===================================

PATCH 1:

Raghavendra K T (1):
  sched/numa: Hot VMA and shared VMA optimization

 include/linux/mm.h       | 12 ++++++---
 include/linux/mm_types.h | 11 +++++---
 kernel/sched/fair.c      | 58 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 69 insertions(+), 12 deletions(-)

base-commit: b0546776ad3f332e215cebc0b063ba4351971cca

===================================

PATCH 2:

Raghavendra K T (1):
  sched/numa: Increase the VMA accessing PID bits

 include/linux/mm.h       | 29 ++++++++++++++++++++++++++---
 include/linux/mm_types.h |  7 ++++++-
 kernel/sched/fair.c      | 21 ++++++++++++++++-----
 3 files changed, 48 insertions(+), 9 deletions(-)

base-commit: b0546776ad3f332e215cebc0b063ba4351971cca

===================================

PATCH 3:

Raghavendra K T (1):
  sched/numa: Convert 256MB VMA scan limit notion

 include/linux/hugetlb.h |  3 +-
 include/linux/mm.h      | 16 +++++++-
 kernel/sched/fair.c     | 15 ++---
 mm/hugetlb.c            |  9 +++++
 mm/mempolicy.c          | 11 +++++-
 mm/mprotect.c           | 87 +++++++++++++++++++++++++++++++++--------
 6 files changed, 115 insertions(+), 26 deletions(-)

base-commit: b0546776ad3f332e215cebc0b063ba4351971cca

--
2.34.1