Re: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic

Raghavendra K T <raghavendra.kt@xxxxxxx> · Wed, 13 Sep 2023 11:51:53 +0530

On 9/12/2023 1:20 PM, kernel test robot wrote:

Hello,

kernel test robot noticed a -11.9% improvement of autonuma-benchmark.numa01_THREAD_ALLOC.seconds on:

commit: 1ef5cbb92bdb320c5eb9fdee1a811d22ee9e19fe ("[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic")
url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805
patch link: https://lore.kernel.org/all/87e3c08bd1770dd3e6eee099c01e595f14c76fc3.1693287931.git.raghavendra.kt@xxxxxxx/
patch subject: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic

testcase: autonuma-benchmark
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	iterations: 4x
	test: numa01_THREAD_ALLOC
	cpufreq_governor: performance

Hi Raghu,

The reason there is a separate report for this commit, besides
https://lore.kernel.org/all/202309102311.84b42068-oliver.sang@xxxxxxxxx/
is the nature of bisection: so far, one auto-bisect can capture only
one commit per performance change.

This auto-bisect ran on another test machine (Sapphire Rapids) and
happened to choose autonuma-benchmark.numa01_THREAD_ALLOC.seconds as the
indicator for the bisection; it finally captured
"[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional"

From
https://lore.kernel.org/all/acf254e9-0207-7030-131f-8a3f520c657b@xxxxxxx/
I noticed you care more about the performance impact of the whole patch set,
so here is a summary table.

First, let me restate how we applied your patch set:

68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs
af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned
167773d1ddb5f sched/numa: Increase tasks' access history
fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq
1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic
2a806eab1c2e1 sched/numa: Move up the access pid reset logic
2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well

We have the data below from this test machine
(the full table is very large; please let me know if you want it):

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
   gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
   2f88c8e802 ("(tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well")
   2a806eab1c ("sched/numa: Move up the access pid reset logic")
   1ef5cbb92b ("sched/numa: Add disjoint vma unconditional scan logic")
   68cfe9439a ("sched/numa: Allow scanning of shared VMAs")

2f88c8e802c8b128 2a806eab1c2e1c9f0ae39dc0307 1ef5cbb92bdb320c5eb9fdee1a8 68cfe9439a1baa642e05883fa64
---------------- --------------------------- --------------------------- ---------------------------
          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
              \          |                \          |                \          |                \
     271.01            +0.8%     273.24            -0.7%     269.00           -26.4%     199.49 ±  3%  autonuma-benchmark.numa01.seconds
      76.28            +0.2%      76.44           -11.7%      67.36 ±  6%     -46.9%      40.49 ±  5%  autonuma-benchmark.numa01_THREAD_ALLOC.seconds
       8.11            -0.9%       8.04            -0.7%       8.05            -0.1%       8.10        autonuma-benchmark.numa02.seconds
       1425            +0.7%       1434            -3.1%       1381           -30.1%     996.02 ±  2%  autonuma-benchmark.time.elapsed_time

Thanks for this summary too.

I think the slight additional time overhead from the first patch
(2a806eab1c: +0.8% on numa01, +0.2% on numa01_THREAD_ALLOC) comes from
the additional logic that gets executed before we return from the
is_vma_accessed() check, as expected.

Regards
- Raghu