Hello,

kernel test robot noticed a 5.7% regression of will-it-scale.per_process_ops on:

commit: 3b7734aa8458b62ecbfd785ca7918e831565006e ("[PATCH mm-unstable v3 6/6] mm/mglru: rework workingset protection")
url: https://github.com/intel-lab-lkp/linux/commits/Yu-Zhao/mm-mglru-clean-up-workingset/20241208-061714
base: v6.13-rc1
patch link: https://lore.kernel.org/all/20241207221522.2250311-7-yuzhao@xxxxxxxxxx/
patch subject: [PATCH mm-unstable v3 6/6] mm/mglru: rework workingset protection

testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

	nr_task: 100%
	mode: process
	test: pread2
	cpufreq_governor: performance

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202412231601.f1eb8f84-lkp@xxxxxxxxx

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241223/202412231601.f1eb8f84-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/pread2/will-it-scale

commit:
  4a202aca7c ("mm/mglru: rework refault detection")
  3b7734aa84 ("mm/mglru: rework workingset protection")

4a202aca7c7d9f99 3b7734aa8458b62ecbfd785ca79
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      1.03 ±  3%      -0.1        0.92 ±  5%  mpstat.cpu.all.usr%
      0.29 ± 14%     +20.8%      0.35 ±  7%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      1.02 ± 21%     +50.7%      1.54 ± 23%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ± 50%     -66.9%      0.00 ± 82%  perf-stat.i.major-faults
      0.01 ± 50%     -73.6%      0.00 ±112%  perf-stat.ps.major-faults
    335982           -60.7%     132060 ± 15%  proc-vmstat.nr_active_anon
    335982           -60.7%     132060 ± 15%  proc-vmstat.nr_zone_active_anon
   1343709           -60.7%     528460 ± 15%  meminfo.Active
   1343709           -60.7%     528460 ± 15%  meminfo.Active(anon)
    259.96         +3.2e+05%    821511 ± 11%  meminfo.Inactive
   1401961            -5.7%    1321692 ±  2%  will-it-scale.104.processes
     13479            -5.7%      12708 ±  2%  will-it-scale.per_process_ops
   1401961            -5.7%    1321692 ±  2%  will-it-scale.workload
    138691 ± 43%     -75.8%      33574 ± 55%  numa-vmstat.node0.nr_active_anon
    138691 ± 43%     -75.8%      33574 ± 55%  numa-vmstat.node0.nr_zone_active_anon
    197311 ± 30%     -50.1%      98494 ± 18%  numa-vmstat.node1.nr_active_anon
    197311 ± 30%     -50.1%      98494 ± 18%  numa-vmstat.node1.nr_zone_active_anon
    554600 ± 43%     -75.8%     134360 ± 55%  numa-meminfo.node0.Active
    554600 ± 43%     -75.8%     134360 ± 55%  numa-meminfo.node0.Active(anon)
    173.31 ± 70%   +1.4e+05%    247821 ± 50%  numa-meminfo.node0.Inactive
    789291 ± 30%     -50.1%     394029 ± 18%  numa-meminfo.node1.Active
    789291 ± 30%     -50.1%     394029 ± 18%  numa-meminfo.node1.Active(anon)
     86.66 ±141%   +6.6e+05%    573998 ± 27%  numa-meminfo.node1.Inactive
     38.95            -0.9       38.09        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_wait_bit_common.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read
     38.83            -0.9       37.97        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_wait_bit_common.shmem_get_folio_gfp.shmem_file_read_iter
     39.70            -0.8       38.86        perf-profile.calltrace.cycles-pp.folio_wait_bit_common.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64
     41.03            -0.8       40.26        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
      0.91            +0.0        0.95        perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64
     53.14            +0.5       53.66        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_wake_bit.shmem_file_read_iter.vfs_read
     53.24            +0.5       53.76        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_wake_bit.shmem_file_read_iter.vfs_read.__x64_sys_pread64
     53.84            +0.5       54.38        perf-profile.calltrace.cycles-pp.folio_wake_bit.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
     38.96            -0.9       38.09        perf-profile.children.cycles-pp._raw_spin_lock_irq
     39.71            -0.8       38.87        perf-profile.children.cycles-pp.folio_wait_bit_common
     41.04            -0.8       40.26        perf-profile.children.cycles-pp.shmem_get_folio_gfp
     92.00            -0.3       91.67        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      0.22            -0.0        0.18 ±  3%  perf-profile.children.cycles-pp._copy_to_iter
      0.22 ±  2%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.copy_page_to_iter
      0.20 ±  2%      -0.0        0.16 ±  4%  perf-profile.children.cycles-pp.rep_movs_alternative
      0.91            +0.0        0.96        perf-profile.children.cycles-pp.filemap_get_entry
      0.00            +0.3        0.35        perf-profile.children.cycles-pp.folio_mark_accessed
     53.27            +0.5       53.80        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     53.86            +0.5       54.40        perf-profile.children.cycles-pp.folio_wake_bit
     92.00            -0.3       91.67        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.19            -0.0        0.16 ±  3%  perf-profile.self.cycles-pp.rep_movs_alternative
      0.41            +0.0        0.44        perf-profile.self.cycles-pp.shmem_get_folio_gfp
      0.37 ±  2%      +0.0        0.40        perf-profile.self.cycles-pp.folio_wait_bit_common
      0.90            +0.0        0.94        perf-profile.self.cycles-pp.filemap_get_entry
      0.61            +0.1        0.68        perf-profile.self.cycles-pp.shmem_file_read_iter
      0.00            +0.3        0.34 ±  2%  perf-profile.self.cycles-pp.folio_mark_accessed


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki