On 9/12/2023 1:20 PM, kernel test robot wrote:
Hello,

kernel test robot noticed a -11.9% improvement of autonuma-benchmark.numa01_THREAD_ALLOC.seconds on:

commit: 1ef5cbb92bdb320c5eb9fdee1a811d22ee9e19fe ("[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic")
url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805
patch link: https://lore.kernel.org/all/87e3c08bd1770dd3e6eee099c01e595f14c76fc3.1693287931.git.raghavendra.kt@xxxxxxx/
patch subject: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic

testcase: autonuma-benchmark
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	iterations: 4x
	test: numa01_THREAD_ALLOC
	cpufreq_governor: performance

hi, Raghu,

The reason there is a separate report for this commit, besides
https://lore.kernel.org/all/202309102311.84b42068-oliver.sang@xxxxxxxxx/
is the nature of bisection: so far, one auto-bisect can only capture one
commit for a performance change.

This auto-bisect ran on another test machine (Sapphire Rapids), and it
happened to choose autonuma-benchmark.numa01_THREAD_ALLOC.seconds as the
indicator for the bisect. It finally captured
"[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional"

From
https://lore.kernel.org/all/acf254e9-0207-7030-131f-8a3f520c657b@xxxxxxx/
I noticed you care more about the performance impact of the whole patch set,
so let me give a summary table below.
First, here again is how we applied your patches:

68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs
af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned
167773d1ddb5f sched/numa: Increase tasks' access history
fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq
1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic
2a806eab1c2e1 sched/numa: Move up the access pid reset logic
2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well

We have the data below on this test machine (the full table is very large;
if you want it, please let me know):

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
  2f88c8e802 ("(tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well")
  2a806eab1c ("sched/numa: Move up the access pid reset logic")
  1ef5cbb92b ("sched/numa: Add disjoint vma unconditional scan logic")
  68cfe9439a ("sched/numa: Allow scanning of shared VMAs")

2f88c8e802c8b128 2a806eab1c2e1c9f0ae39dc0307 1ef5cbb92bdb320c5eb9fdee1a8 68cfe9439a1baa642e05883fa64
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
    271.01            +0.8%     273.24            -0.7%     269.00           -26.4%     199.49 ±  3%  autonuma-benchmark.numa01.seconds
     76.28            +0.2%      76.44           -11.7%      67.36 ±  6%     -46.9%      40.49 ±  5%  autonuma-benchmark.numa01_THREAD_ALLOC.seconds
      8.11            -0.9%       8.04            -0.7%       8.05            -0.1%       8.10        autonuma-benchmark.numa02.seconds
      1425            +0.7%       1434            -3.1%       1381           -30.1%     996.02 ±  2%  autonuma-benchmark.time.elapsed_time
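For reading the table above: the %change columns appear to be measured against the base commit 2f88c8e802, not against the immediately preceding commit. Below is a minimal sketch that checks this, assuming the usual relative-difference formula. The helper name pct_change is hypothetical and is not part of the LKP tooling; the numbers are copied from the table.

```python
# Sketch only: assumes %change = (value - base) / base * 100, rounded to
# one decimal place. pct_change is a hypothetical helper, not an LKP API.

def pct_change(base: float, value: float) -> float:
    """Relative change of `value` against `base`, in percent."""
    return round((value - base) / base * 100.0, 1)


# Seconds per benchmark, copied from the table: base commit first, then
# the values measured at 1ef5cbb92b and 68cfe9439a.
rows = {
    "numa01.seconds": (271.01, 269.00, 199.49),
    "numa01_THREAD_ALLOC.seconds": (76.28, 67.36, 40.49),
}

for name, (base, after_patch2, after_series) in rows.items():
    print(
        f"{name}: patch 2/6 {pct_change(base, after_patch2):+.1f}%, "
        f"whole series {pct_change(base, after_series):+.1f}%"
    )
```

With these inputs the computed values match the -0.7%/-26.4% and -11.7%/-46.9% entries in the table, which is consistent with every column being compared to the base commit.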
Thanks for this summary too.

I think the slight additional time overhead from the first patch comes, as
expected, from the additional logic that is executed before we return from
the is_vma_accessed() check.

Regards
- Raghu