On Fri, Oct 20, 2023 at 10:21:28PM +0800, Sang, Oliver wrote:
>
>
> Hello,
>
> kernel test robot noticed a -3.7% regression of will-it-scale.per_process_ops on:

I was surprised to see this initially, as I know this patch only affects
the order of a few slabs in a certain size range, and 0Day has already
enabled 64-byte alignment for function addresses.

The only big difference in the perf hot spots is

> 19.62 +1.9 21.54 perf-profile.self.cycles-pp.__fget_light

but its code flow and data don't have much to do with the commit.

I ran the test case manually and, checking 'slabtop', didn't see the
affected slabs being actively used.

Then I hacked the build to move slub.c to a very late position when
linking the kernel image, so that the alignment of very few other kernel
modules is affected, and the regression is gone.

So this seems to be another strange perf change caused by text (code)
alignment changes, similar to another recent case with an MCE patch:
https://lore.kernel.org/lkml/202310111637.dee70328-oliver.sang@xxxxxxxxx/

Thanks,
Feng

>
>
> commit: 5886fc82b6e3166dd1ba876809888fc39028d626 ("mm/slub: attempt to find layouts up to 1/2 waste in calculate_order()")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> testcase: will-it-scale
> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> parameters:
>
>	nr_task: 50%
>	mode: process
>	test: poll2
>	cpufreq_governor: performance
>
>
> If you fix the issue in a separate patch/commit (i.e.
> not just a new version of the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202310202221.fdbcbe56-oliver.sang@xxxxxxxxx
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20231020/202310202221.fdbcbe56-oliver.sang@xxxxxxxxx
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/poll2/will-it-scale
>
> commit:
>   0fe2735d5e ("mm/slub: remove min_objects loop from calculate_order()")
>   5886fc82b6 ("mm/slub: attempt to find layouts up to 1/2 waste in calculate_order()")
>
> 0fe2735d5e2e0060 5886fc82b6e3166dd1ba8768098
> ---------------- ---------------------------
>          %stddev   %change          %stddev
>             \         |                \
>      28.08           +1.1%      28.40        boot-time.dhcp
>       6.17 ± 10%    -15.4%       5.22 ± 10%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>       6.17 ± 10%    -15.4%       5.22 ± 10%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>   98376568           -3.7%   94713387        will-it-scale.112.processes
>     878361           -3.7%     845654        will-it-scale.per_process_ops
>   98376568           -3.7%   94713387        will-it-scale.workload
>      81444           +4.8%      85370        proc-vmstat.nr_active_anon
>      85071           +4.8%      89137        proc-vmstat.nr_shmem
>      81444           +4.8%      85370        proc-vmstat.nr_zone_active_anon
>      79205           +3.8%      82205        proc-vmstat.pgactivate
>       5.18           -0.4        4.79 ±  2%  perf-profile.calltrace.cycles-pp.__fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
>       2.18           -0.2        2.03 ±  2%  perf-profile.calltrace.cycles-pp.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       2.29           -0.1        2.19        perf-profile.calltrace.cycles-pp.__entry_text_start.__poll
>       0.83           -0.1        0.76 ±  3%  perf-profile.calltrace.cycles-pp.__check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64
>       0.90           -0.1        0.84 ±  2%  perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64
>       0.66 ±  2%     -0.1        0.61 ±  2%  perf-profile.calltrace.cycles-pp.__virt_addr_valid.check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll
>       0.66           -0.0        0.61 ±  3%  perf-profile.calltrace.cycles-pp.kfree.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      47.75           +1.3       49.07        perf-profile.calltrace.cycles-pp.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      22.63           +2.1       24.74        perf-profile.calltrace.cycles-pp.__fget_light.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
>       5.17           -0.4        4.78 ±  2%  perf-profile.children.cycles-pp.__fdget
>       2.35           -0.2        2.18 ±  2%  perf-profile.children.cycles-pp.__check_object_size
>       0.84           -0.1        0.77 ±  3%  perf-profile.children.cycles-pp.__check_heap_object
>       1.48           -0.1        1.41        perf-profile.children.cycles-pp.__entry_text_start
>       0.94           -0.1        0.87 ±  2%  perf-profile.children.cycles-pp.check_heap_object
>       1.57           -0.1        1.51 ±  2%  perf-profile.children.cycles-pp.__kmalloc
>       0.68 ±  2%     -0.1        0.63        perf-profile.children.cycles-pp.__virt_addr_valid
>       0.66           -0.0        0.61 ±  3%  perf-profile.children.cycles-pp.kfree
>       0.83           -0.0        0.79 ±  2%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>      22.29           +1.7       24.01        perf-profile.children.cycles-pp.__fget_light
>      48.12           +1.7       49.84        perf-profile.children.cycles-pp.do_poll
>       7.66           -0.4        7.22        perf-profile.self.cycles-pp.do_sys_poll
>       2.58 ±  2%     -0.2        2.38 ±  2%  perf-profile.self.cycles-pp.__fdget
>       2.23           -0.1        2.12 ±  2%  perf-profile.self.cycles-pp._copy_from_user
>       1.07 ±  3%     -0.1        0.98 ±  2%  perf-profile.self.cycles-pp.__poll
>       0.84           -0.1        0.77 ±  2%  perf-profile.self.cycles-pp.__check_heap_object
>       0.66 ±  2%     -0.1        0.61 ±  2%  perf-profile.self.cycles-pp.__virt_addr_valid
>       0.65           -0.0        0.61 ±  3%  perf-profile.self.cycles-pp.kfree
>       0.80           -0.0        0.76 ±  2%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.67 ±  2%     -0.0        0.64        perf-profile.self.cycles-pp.__entry_text_start
>      19.62           +1.9       21.54        perf-profile.self.cycles-pp.__fget_light
>  2.225e+11           -3.7%  2.143e+11        perf-stat.i.branch-instructions
>  5.573e+08           -3.2%  5.393e+08        perf-stat.i.branch-misses
>    2332742 ±  2%     -6.6%    2179079        perf-stat.i.cache-misses
>   13799351           -3.9%   13256775        perf-stat.i.cache-references
>       0.32           +5.0%       0.34        perf-stat.i.cpi
>  3.863e+11           +1.2%  3.908e+11        perf-stat.i.cpu-cycles
>     174616 ±  3%     +9.1%     190529 ±  2%  perf-stat.i.cycles-between-cache-misses
>  2.777e+11           -3.7%  2.675e+11        perf-stat.i.dTLB-loads
>  1.689e+11           -3.7%  1.627e+11        perf-stat.i.dTLB-stores
>   50719249           -2.8%   49295350        perf-stat.i.iTLB-load-misses
>    2674672          -14.5%    2285560        perf-stat.i.iTLB-loads
>  1.206e+12           -3.7%  1.161e+12        perf-stat.i.instructions
>       3.12           -4.8%       2.97        perf-stat.i.ipc
>       1.24           -4.0%       1.19        perf-stat.i.metric.G/sec
>       1.72           +1.1%       1.74        perf-stat.i.metric.GHz
>      76.66           -5.6%      72.34        perf-stat.i.metric.K/sec
>       1743           -3.5%       1683        perf-stat.i.metric.M/sec
>     594324           -2.9%     576831        perf-stat.i.node-load-misses
>       0.32           +5.0%       0.34        perf-stat.overall.cpi
>     165074 ±  2%     +8.2%     178683        perf-stat.overall.cycles-between-cache-misses
>       3.12           -4.8%       2.97        perf-stat.overall.ipc
>  2.217e+11           -3.7%  2.135e+11        perf-stat.ps.branch-instructions
>  5.554e+08           -3.2%  5.375e+08        perf-stat.ps.branch-misses
>    2333651 ±  2%     -6.6%    2179985        perf-stat.ps.cache-misses
>   13948192           -3.9%   13410551        perf-stat.ps.cache-references
>  3.849e+11           +1.2%  3.894e+11        perf-stat.ps.cpu-cycles
>  2.767e+11           -3.7%  2.665e+11        perf-stat.ps.dTLB-loads
>  1.683e+11           -3.7%  1.621e+11        perf-stat.ps.dTLB-stores
>   50558427           -2.8%   49131845        perf-stat.ps.iTLB-load-misses
>    2664632          -14.5%    2276961 ±  2%  perf-stat.ps.iTLB-loads
>  1.201e+12           -3.7%  1.157e+12        perf-stat.ps.instructions
>     592459           -2.9%     575320        perf-stat.ps.node-load-misses
>  3.621e+14           -3.6%  3.492e+14        perf-stat.total.instructions
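
For anyone wanting to sanity-check the headline figures in the report, here is a minimal Python sketch that recomputes the relative change in will-it-scale.per_process_ops and derives IPC/CPI from perf-stat.i.instructions and perf-stat.i.cpu-cycles. The input numbers are copied from the table; the helper names (pct_change, ipc) are made up for illustration and are not part of the 0Day tooling.

```python
# Sketch only: values are copied from the 0Day report, helper names are hypothetical.

def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def ipc(instructions: float, cycles: float) -> float:
    """Instructions per cycle; CPI is simply the reciprocal."""
    return instructions / cycles


# (parent commit 0fe2735d5e, patched commit 5886fc82b6)
OPS = (878361, 845654)              # will-it-scale.per_process_ops
INSTRUCTIONS = (1.206e12, 1.161e12)  # perf-stat.i.instructions
CYCLES = (3.863e11, 3.908e11)        # perf-stat.i.cpu-cycles

if __name__ == "__main__":
    print(f"per_process_ops: {pct_change(*OPS):+.1f}%")
    for label, ins, cyc in zip(("parent", "patched"), INSTRUCTIONS, CYCLES):
        print(f"{label}: ipc={ipc(ins, cyc):.2f} cpi={cyc / ins:.2f}")
```

The recomputed values match the perf-stat.i.ipc and perf-stat.i.cpi rows. Instructions fall by about the same 3.7% as throughput while cycles stay nearly flat, which would be consistent with the same work running less efficiently rather than extra work being done.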