Re: [RFC PATCH V3 1/1] sched/numa: Fix disjoint set vma scan regression

Raghavendra K T <raghavendra.kt@xxxxxxx> · Mon, 17 Jul 2023 11:53:32 +0530

On 7/16/2023 7:47 PM, Oliver Sang wrote:
hi, Raghavendra K T,

On Wed, Jul 05, 2023 at 11:18:37AM +0530, Raghavendra K T wrote:
On 5/31/2023 9:55 AM, Raghavendra K T wrote:
   With the numa scan enhancements [1], only the threads which had previously
accessed vma are allowed to scan.

While this significantly reduced system time overhead, there were corner
cases which genuinely need some relaxation. For example:

1) Concern raised by PeterZ, where if there are N partition sets of vmas
belonging to tasks, then unfairness in allowing these threads to scan could
potentially amplify the side effect of some of the vmas being left
unscanned.

2) The reports below of an LKP numa01 benchmark regression.

Currently this is handled by allowing the first two scans unconditionally,
as indicated by mm->numa_scan_seq. This is imprecise, since for some
benchmarks vma scanning might itself start at numa_scan_seq > 2.

Solution:
Allow unconditional scanning of vmas of tasks depending on vma size. This
is achieved by maintaining a per vma scan counter, where

f(allowed_to_scan) = f(scan_counter <  vma_size / scan_size)
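
A minimal userspace sketch of the check above, read literally. The struct,
its field names and both helper functions are hypothetical and used only for
illustration; they are not taken from the patch or from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-vma scan state; not the kernel's struct. */
struct vma_scan_state {
	size_t vma_size;           /* size of the vma, in bytes */
	unsigned int scan_counter; /* unconditional scans done so far */
};

/* Scanning is allowed while scan_counter < vma_size / scan_size. */
static bool allowed_to_scan(const struct vma_scan_state *s, size_t scan_size)
{
	assert(scan_size > 0);
	return s->scan_counter < s->vma_size / scan_size;
}

/*
 * Take one unconditional scan if the budget allows it.
 * Returns true and bumps the counter on success, false otherwise.
 */
static bool try_unconditional_scan(struct vma_scan_state *s, size_t scan_size)
{
	if (!allowed_to_scan(s, scan_size))
		return false;
	s->scan_counter++;
	return true;
}
```

Under this reading, a vma four times the scan size gets four unconditional
scans, and a vma smaller than the scan size gets none from this rule alone.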

Result:
numa01_THREAD_ALLOC result on 6.4.0-rc2 (that has numascan enhancement)
                        base-numascan   base            base+fix
real                    1m1.507s        1m23.259s       1m2.632s
user                    213m51.336s     251m46.363s     220m35.528s
sys                     3m3.397s        0m12.492s       2m41.393s

numa_hit                5615517         4560123         4963875
numa_local              5615505         4560024         4963700
numa_other              12              99              175
numa_pte_updates        1822797         493             1559111
numa_hint_faults        1307113         523             1469031
numa_hint_faults_local  612617          488             884829
numa_pages_migrated     694370          35              584202

Summary: Regression in base is recovered by allowing scanning as required.

[1] https://lore.kernel.org/lkml/cover.1677672277.git.raghavendra.kt@xxxxxxx/T/#t

Fixes: fc137c0ddab2 ("sched/numa: enhance vma scanning logic")
regression.
Reported-by: Aithal Srikanth <sraithal@xxxxxxx>
Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
Closes: https://lore.kernel.org/lkml/db995c11-08ba-9abf-812f-01407f70a5d4@xxxxxxx/T/
Signed-off-by: Raghavendra K T <raghavendra.kt@xxxxxxx>

Hello kernel test robot,

Gentle ping to check if the patch has helped your regression report.

Sorry for the late reply.

Previously we found a 118.9% regression of autonuma-benchmark.numa01.seconds
on a Cascade Lake machine, which has since been repurposed for other tests,
so we cannot test your patch on it again.

However, we also found a 39.3% regression on a Sapphire Rapids test machine:

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
   gcc-11/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa02_SMT/autonuma-benchmark

ef6a22b70f6d9044 fc137c0ddab29b591db6a091dc6
---------------- ---------------------------
          %stddev     %change         %stddev
              \          |                \
     193.14           +39.2%     268.84        autonuma-benchmark.numa01.seconds
       8.14            -0.7%       8.09        autonuma-benchmark.numa02.seconds

We have now tested your v3 patch on it and found the regression mostly recovered
(55fd15913b18d6a790c17d947df is just [RFC PATCH V3 1/1] sched/numa: Fix disjoint set vma scan regression):

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
   gcc-11/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark

ef6a22b70f6d9044 55fd15913b18d6a790c17d947df
---------------- ---------------------------
          %stddev     %change         %stddev
              \          |                \
     193.14            +5.8%     204.37 ±  3%  autonuma-benchmark.numa01.seconds
       8.14            -0.9%       8.06        autonuma-benchmark.numa02.seconds
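
For reference, the %change column for numa01.seconds can be reproduced from
the two tables with the small sketch below. The helper names are
hypothetical; the inputs are the numa01.seconds values reported above, and
the "share recovered" figure is a derived number, not one from the report.

```c
#include <assert.h>

/* Percent change of value relative to base, as in the %change column. */
static double percent_change(double base, double value)
{
	assert(base != 0.0);
	return (value - base) / base * 100.0;
}

/*
 * Share of the regression (base -> regressed) that the fixed
 * result wins back, in percent.
 */
static double percent_recovered(double base, double regressed, double fixed)
{
	assert(regressed != base);
	return (regressed - fixed) / (regressed - base) * 100.0;
}
```

With base 193.14, regressed 268.84 and fixed 204.37, percent_change() gives
roughly +39.2% and +5.8%, and percent_recovered() gives roughly 85%.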

Detailed comparison below:

Thank you for the confirmation. So it looks like the patch recovers most of
the regression for the numa01_THREAD_ALLOC case.

Andrew, could you please help by picking up this patch, unless
Mel or PeterZ have any concerns about the patch / direction?

(Note that it brings back a little system time overhead by allowing
some scanning.)

Thanks and Regards
- Raghu