On 6/25/2024 7:50 PM, Chen Yu wrote:
Hi Raghavendra,
On 2024-03-22 at 19:11:12 +0530, Raghavendra K T wrote:
Optimizations are based on history of PIDs accessing VMA.
- Increase tasks' access history windows (PeterZ) from 2 to 4.
(This patch is from Peter Zijlstra <peterz@xxxxxxxxxxxxx>.)
Idea: A task is allowed to scan a VMA if either of the following holds:
- The VMA was very recently accessed, as indicated by the latest
access PIDs information (hot VMA).
- The VMA is shared by more than 2 tasks. Here the whole history of the
VMA's access PIDs is considered, using bitmap_weight().
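The two conditions above can be sketched in userspace C roughly as follows. This is only an illustration, not the patch itself: struct vma_hist, record_access(), advance_window(), vma_is_hot(), vma_is_shared() and may_scan() are hypothetical names, one 64-bit word per window is an assumption, the PID-to-bit mapping is a plain modulo rather than whatever hash the kernel uses, and "hot" is read here as "any PID bit set in the newest window".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_ACCESS_PID_HIST 4	/* number of history windows (2 -> 4 in the patch) */
#define PID_BITS 64		/* assumption: one 64-bit bitmap per window */

/* Hypothetical stand-in for the per-VMA access state. */
struct vma_hist {
	uint64_t pids[NR_ACCESS_PID_HIST];	/* one PID bitmap per window */
	unsigned int cur;			/* index of the newest window */
};

/* Portable popcount, standing in for the kernel's bitmap_weight(). */
static unsigned int weight64(uint64_t w)
{
	unsigned int n = 0;

	for (; w; w &= w - 1)
		n++;
	return n;
}

/* Assumption: a PID maps to a bit by simple modulo. */
static void record_access(struct vma_hist *h, unsigned int pid)
{
	h->pids[h->cur] |= UINT64_C(1) << (pid % PID_BITS);
}

/* Open a new window; the oldest one is overwritten. */
static void advance_window(struct vma_hist *h)
{
	h->cur = (h->cur + 1) % NR_ACCESS_PID_HIST;
	h->pids[h->cur] = 0;
}

/* Hot VMA: something was recorded in the newest window. */
static bool vma_is_hot(const struct vma_hist *h)
{
	assert(h->cur < NR_ACCESS_PID_HIST);
	return h->pids[h->cur] != 0;
}

/* Shared VMA: more than 2 distinct PID bits across the whole history. */
static bool vma_is_shared(const struct vma_hist *h)
{
	uint64_t all = 0;

	for (int i = 0; i < NR_ACCESS_PID_HIST; i++)
		all |= h->pids[i];
	return weight64(all) > 2;
}

static bool may_scan(const struct vma_hist *h)
{
	return vma_is_hot(h) || vma_is_shared(h);
}
```

Since PIDs are folded into a fixed number of bits, two tasks can collide on one bit, so the weight only approximates the number of distinct tasks. A longer history keeps old accessors visible for more windows, which is what makes the "shared" check depend on NR_ACCESS_PID_HIST.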
Signed-off-by: Raghavendra K T <raghavendra.kt@xxxxxxx>
---
I will split the patchset and post it if we find this patchset useful
going forward. The first patch is from PeterZ.
This is a good direction, I think. We did an initial test using autonumabench
THREADLOCAL on a 2-node system with 240 CPUs. The patch does not seem to
show an obvious difference, but it gives a more stable result (less run-to-run
variance). We'll enable Sub-NUMA Clustering to see if there is any difference.
My understanding is that if we extend NR_ACCESS_PID_HIST further,
THREADLOCAL could see more benefit, as each thread has its own VMA. Alternatively,
making the length of the VMA access history adaptive (rather than a fixed 4)
could be more flexible.
                                      numa_scan_orig      numa_scan_4_history
Min        syst-NUMA01_THREADLOCAL    388.47 (  0.00%)    397.43 ( -2.31%)
Min        elsp-NUMA01_THREADLOCAL     40.27 (  0.00%)     38.94 (  3.30%)
Amean      syst-NUMA01_THREADLOCAL    467.62 (  0.00%)    459.10 (  1.82%)
Amean      elsp-NUMA01_THREADLOCAL     42.20 (  0.00%)     44.84 ( -6.26%)
Stddev     syst-NUMA01_THREADLOCAL     74.11 (  0.00%)     60.90 ( 17.81%)
CoeffVar   syst-NUMA01_THREADLOCAL     15.85 (  0.00%)     13.27 ( 16.29%)
Max        syst-NUMA01_THREADLOCAL    535.36 (  0.00%)    519.21 (  3.02%)
Max        elsp-NUMA01_THREADLOCAL     43.96 (  0.00%)     56.33 (-28.14%)
BAmean-50  syst-NUMA01_THREADLOCAL    388.47 (  0.00%)    397.43 ( -2.31%)
BAmean-50  elsp-NUMA01_THREADLOCAL     40.27 (  0.00%)     38.94 (  3.30%)
BAmean-95  syst-NUMA01_THREADLOCAL    433.75 (  0.00%)    429.05 (  1.08%)
BAmean-95  elsp-NUMA01_THREADLOCAL     41.31 (  0.00%)     39.09 (  5.39%)
BAmean-99  syst-NUMA01_THREADLOCAL    433.75 (  0.00%)    429.05 (  1.08%)
BAmean-99  elsp-NUMA01_THREADLOCAL     41.31 (  0.00%)     39.09 (  5.39%)
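For reading the table: the CoeffVar rows appear to be Stddev divided by Amean, as a percentage, and the bracketed figures appear to be the change relative to numa_scan_orig with positive meaning lower (better). A small sketch of that arithmetic, with coeff_var_pct() and gain_pct() as hypothetical helper names and the sign convention taken as an assumption from the rows above:

```c
#include <assert.h>

/* Coefficient of variation in percent: 100 * stddev / mean. */
static double coeff_var_pct(double stddev, double mean)
{
	assert(mean != 0.0);
	return 100.0 * stddev / mean;
}

/*
 * Relative change against the baseline in percent.
 * Assumption: lower is better, so a positive result is an improvement.
 */
static double gain_pct(double base, double patched)
{
	assert(base != 0.0);
	return 100.0 * (base - patched) / base;
}
```

With the syst numbers above, coeff_var_pct(74.11, 467.62) and coeff_var_pct(60.90, 459.10) come out at about 15.85 and 13.27, and gain_pct(467.62, 459.10) at about 1.82, agreeing with the table to within rounding of the printed values.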
Thanks for the test and report. I will split the patches and also test
with N=6,8
(perhaps on top of your patch, to make sure we see further benefits).
Making it adaptive may be a little difficult. How to assess dynamically
which size is doing better is a little hard for me to imagine. (/me
needs to think here.)
Thanks and Regards
- Raghu