Hello,

kernel test robot noticed a -19.0% regression of aim9.disk_src.ops_per_sec on:


commit: a2e459555c5f9da3e619b7e47a63f98574dc75f1 ("shmem: stable directory offsets")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: aim9
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
parameters:

	testtime: 300s
	test: disk_src
	cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+-------------------------------------------------------------------------------------------------+
| testcase: change | aim9: aim9.disk_src.ops_per_sec -14.6% regression                                               |
| test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory |
| test parameters  | cpufreq_governor=performance                                                                    |
|                  | test=all                                                                                        |
|                  | testtime=5s                                                                                     |
+------------------+-------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202309081306.3ecb3734-oliver.sang@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230908/202309081306.3ecb3734-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/disk_src/aim9/300s

commit:
  23a31d8764 ("shmem: Refactor shmem_symlink()")
  a2e459555c ("shmem: stable directory offsets")

23a31d87645c6527 a2e459555c5f9da3e619b7e47a6
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      0.26 ±  9%      +0.1        0.36 ±  2%  mpstat.cpu.all.soft%
      0.61            -0.1        0.52        mpstat.cpu.all.usr%
      0.16 ± 10%    -18.9%        0.13 ± 12%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.04 ±  7%  +1802.4%        0.78 ±115%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    202424          -19.0%      163868        aim9.disk_src.ops_per_sec
     94.83           -4.2%       90.83        aim9.time.percent_of_cpu_this_job_got
     73.62          -17.6%       60.69        aim9.time.user_time
     23541           +6.5%       25074        proc-vmstat.nr_slab_reclaimable
   1437319 ± 24%   +377.6%     6864201        proc-vmstat.numa_hit
   1387016 ± 25%   +391.4%     6815486        proc-vmstat.numa_local
   4864362 ± 34%   +453.6%    26931180        proc-vmstat.pgalloc_normal
   4835960 ± 34%   +455.4%    26856610        proc-vmstat.pgfree
    538959 ± 24%    -23.2%      414090        sched_debug.cfs_rq:/.load.max
    130191 ± 14%    -13.3%      112846 ±  6%  sched_debug.cfs_rq:/.load.stddev
    116849 ± 27%    -51.2%       56995 ± 20%  sched_debug.cfs_rq:/.min_vruntime.max
      1223 ±191%   -897.4%       -9754        sched_debug.cfs_rq:/.spread0.avg
    107969 ± 29%    -65.3%       37448 ± 39%  sched_debug.cfs_rq:/.spread0.max
     55209 ± 14%    -21.8%       43154 ± 14%  sched_debug.cpu.nr_switches.max
     11.21          +23.7%       13.87        perf-stat.i.MPKI
 7.223e+08           -4.4%   6.907e+08        perf-stat.i.branch-instructions
      2.67            +0.2        2.88        perf-stat.i.branch-miss-rate%
  19988363           +2.8%    20539702        perf-stat.i.branch-misses
     17.36            -2.8       14.59        perf-stat.i.cache-miss-rate%
  40733859          +19.5%    48659982        perf-stat.i.cache-references
      1.76           +3.5%        1.82        perf-stat.i.cpi
     55.21           +5.4%       58.21 ±  2%  perf-stat.i.cpu-migrations
  1.01e+09           -3.8%   9.719e+08        perf-stat.i.dTLB-loads
      0.26 ±  4%      -0.0        0.23 ±  3%  perf-stat.i.dTLB-store-miss-rate%
   2166022 ±  4%     -6.9%     2015917 ±  3%  perf-stat.i.dTLB-store-misses
 8.503e+08           +5.5%   8.968e+08        perf-stat.i.dTLB-stores
     69.22 ±  4%      +6.4       75.60        perf-stat.i.iTLB-load-miss-rate%
    316455 ± 12%    -31.6%      216531 ±  3%  perf-stat.i.iTLB-loads
 3.722e+09           -3.1%   3.608e+09        perf-stat.i.instructions
      0.57           -3.3%        0.55        perf-stat.i.ipc
    865.04          -10.4%      775.02 ±  3%  perf-stat.i.metric.K/sec
     47.51            -2.1       45.37        perf-stat.i.node-load-miss-rate%
    106705 ±  3%    +14.8%      122490 ±  5%  perf-stat.i.node-loads
    107169 ±  4%    +29.0%      138208 ±  7%  perf-stat.i.node-stores
     10.94          +23.3%       13.49        perf-stat.overall.MPKI
      2.77            +0.2        2.97        perf-stat.overall.branch-miss-rate%
     17.28            -2.7       14.56        perf-stat.overall.cache-miss-rate%
      1.73           +3.4%        1.79        perf-stat.overall.cpi
      0.25 ±  4%      -0.0        0.22 ±  3%  perf-stat.overall.dTLB-store-miss-rate%
     69.20 ±  4%      +6.4       75.60        perf-stat.overall.iTLB-load-miss-rate%
      0.58           -3.2%        0.56        perf-stat.overall.ipc
     45.25            -2.2       43.10        perf-stat.overall.node-load-miss-rate%
 7.199e+08           -4.4%   6.883e+08        perf-stat.ps.branch-instructions
  19919808           +2.8%    20469001        perf-stat.ps.branch-misses
  40597326          +19.5%    48497201        perf-stat.ps.cache-references
     55.06           +5.4%       58.03 ±  2%  perf-stat.ps.cpu-migrations
 1.007e+09           -3.8%   9.686e+08        perf-stat.ps.dTLB-loads
   2158768 ±  4%     -6.9%     2009174 ±  3%  perf-stat.ps.dTLB-store-misses
 8.475e+08           +5.5%   8.937e+08        perf-stat.ps.dTLB-stores
    315394 ± 12%    -31.6%      215816 ±  3%  perf-stat.ps.iTLB-loads
  3.71e+09           -3.1%   3.595e+09        perf-stat.ps.instructions
    106351 ±  3%    +14.8%      122083 ±  5%  perf-stat.ps.node-loads
    106728 ±  4%    +29.1%      137740 ±  7%  perf-stat.ps.node-stores
 1.117e+12           -3.0%   1.084e+12        perf-stat.total.instructions
      0.00            +0.8        0.75 ± 12%  perf-profile.calltrace.cycles-pp.__call_rcu_common.xas_store.__xa_erase.xa_erase.simple_offset_remove
      0.00            +0.8        0.78 ± 34%  perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_create.xas_store
      0.00            +0.8        0.83 ± 29%  perf-profile.calltrace.cycles-pp.allocate_slab.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_expand
      0.00            +0.9        0.92 ± 26%  perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_expand.xas_create
      0.00            +1.0        0.99 ± 27%  perf-profile.calltrace.cycles-pp.shuffle_freelist.allocate_slab.___slab_alloc.kmem_cache_alloc_lru.xas_alloc
      0.00            +1.0        1.04 ± 28%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.xas_alloc.xas_create.xas_store.__xa_alloc
      0.00            +1.1        1.11 ± 26%  perf-profile.calltrace.cycles-pp.xas_alloc.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic
      1.51 ± 24%      +1.2        2.73 ± 10%  perf-profile.calltrace.cycles-pp.vfs_unlink.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.2        1.24 ± 20%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.xas_alloc.xas_expand.xas_create.xas_store
      0.00            +1.3        1.27 ± 10%  perf-profile.calltrace.cycles-pp.xas_store.__xa_erase.xa_erase.simple_offset_remove.shmem_unlink
      0.00            +1.3        1.30 ± 10%  perf-profile.calltrace.cycles-pp.__xa_erase.xa_erase.simple_offset_remove.shmem_unlink.vfs_unlink
      0.00            +1.3        1.33 ± 19%  perf-profile.calltrace.cycles-pp.xas_alloc.xas_expand.xas_create.xas_store.__xa_alloc
      0.00            +1.4        1.36 ± 10%  perf-profile.calltrace.cycles-pp.xa_erase.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat
      0.00            +1.4        1.37 ± 10%  perf-profile.calltrace.cycles-pp.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat.__x64_sys_unlink
      0.00            +1.5        1.51 ± 17%  perf-profile.calltrace.cycles-pp.xas_expand.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic
      0.00            +1.6        1.62 ± 12%  perf-profile.calltrace.cycles-pp.shmem_unlink.vfs_unlink.do_unlinkat.__x64_sys_unlink.do_syscall_64
      0.00            +2.8        2.80 ± 13%  perf-profile.calltrace.cycles-pp.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add
      0.00            +2.9        2.94 ± 13%  perf-profile.calltrace.cycles-pp.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod
      5.38 ± 24%      +3.1        8.51 ± 11%  perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
      6.08 ± 24%      +3.2        9.24 ± 12%  perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat
      0.00            +3.2        3.20 ± 13%  perf-profile.calltrace.cycles-pp.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open
      0.00            +3.2        3.24 ± 13%  perf-profile.calltrace.cycles-pp.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups
      0.00            +3.4        3.36 ± 14%  perf-profile.calltrace.cycles-pp.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups.path_openat
      2.78 ± 25%      +3.4        6.17 ± 12%  perf-profile.calltrace.cycles-pp.shmem_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
      0.16 ± 30%      -0.1        0.08 ± 20%  perf-profile.children.cycles-pp.map_id_up
      0.02 ±146%      +0.1        0.08 ± 13%  perf-profile.children.cycles-pp.shmem_is_huge
      0.02 ±141%      +0.1        0.09 ± 16%  perf-profile.children.cycles-pp.__list_del_entry_valid
      0.00            +0.1        0.08 ± 11%  perf-profile.children.cycles-pp.free_unref_page
      0.00            +0.1        0.08 ± 13%  perf-profile.children.cycles-pp.shmem_destroy_inode
      0.04 ±101%      +0.1        0.14 ± 25%  perf-profile.children.cycles-pp.rcu_nocb_try_bypass
      0.00            +0.1        0.12 ± 27%  perf-profile.children.cycles-pp.xas_find_marked
      0.02 ±144%      +0.1        0.16 ± 14%  perf-profile.children.cycles-pp.__unfreeze_partials
      0.03 ±106%      +0.2        0.19 ± 26%  perf-profile.children.cycles-pp.xas_descend
      0.01 ±223%      +0.2        0.17 ± 15%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.11 ± 22%      +0.2        0.29 ± 16%  perf-profile.children.cycles-pp.rcu_segcblist_enqueue
      0.02 ±146%      +0.2        0.24 ± 13%  perf-profile.children.cycles-pp.__alloc_pages
      0.36 ± 79%      +0.6        0.98 ± 15%  perf-profile.children.cycles-pp.__slab_free
      0.50 ± 26%      +0.7        1.23 ± 14%  perf-profile.children.cycles-pp.__call_rcu_common
      0.00            +0.8        0.82 ± 13%  perf-profile.children.cycles-pp.radix_tree_node_rcu_free
      0.00            +1.1        1.14 ± 17%  perf-profile.children.cycles-pp.radix_tree_node_ctor
      0.16 ± 86%      +1.2        1.38 ± 16%  perf-profile.children.cycles-pp.setup_object
      1.52 ± 25%      +1.2        2.75 ± 10%  perf-profile.children.cycles-pp.vfs_unlink
      0.36 ± 22%      +1.3        1.63 ± 12%  perf-profile.children.cycles-pp.shmem_unlink
      0.00            +1.3        1.30 ± 10%  perf-profile.children.cycles-pp.__xa_erase
      0.20 ± 79%      +1.3        1.53 ± 15%  perf-profile.children.cycles-pp.shuffle_freelist
      0.00            +1.4        1.36 ± 10%  perf-profile.children.cycles-pp.xa_erase
      0.00            +1.4        1.38 ± 10%  perf-profile.children.cycles-pp.simple_offset_remove
      0.00            +1.5        1.51 ± 17%  perf-profile.children.cycles-pp.xas_expand
      0.26 ± 78%      +1.6        1.87 ± 13%  perf-profile.children.cycles-pp.allocate_slab
      0.40 ± 49%      +1.7        2.10 ± 13%  perf-profile.children.cycles-pp.___slab_alloc
      1.30 ± 85%      +2.1        3.42 ± 12%  perf-profile.children.cycles-pp.rcu_do_batch
      1.56 ± 27%      +2.4        3.93 ± 11%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      0.00            +2.4        2.44 ± 12%  perf-profile.children.cycles-pp.xas_alloc
      2.66 ± 13%      +2.5        5.14 ±  5%  perf-profile.children.cycles-pp.__irq_exit_rcu
     11.16 ± 10%      +2.7       13.88 ±  8%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
     11.77 ± 10%      +2.7       14.49 ±  8%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.00            +2.8        2.82 ± 13%  perf-profile.children.cycles-pp.xas_create
      5.40 ± 24%      +3.1        8.52 ± 11%  perf-profile.children.cycles-pp.lookup_open
      6.12 ± 24%      +3.1        9.27 ± 12%  perf-profile.children.cycles-pp.open_last_lookups
      0.00            +3.2        3.22 ± 13%  perf-profile.children.cycles-pp.__xa_alloc
      0.00            +3.2        3.24 ± 13%  perf-profile.children.cycles-pp.__xa_alloc_cyclic
      0.00            +3.4        3.36 ± 14%  perf-profile.children.cycles-pp.simple_offset_add
      2.78 ± 25%      +3.4        6.18 ± 12%  perf-profile.children.cycles-pp.shmem_mknod
      0.00            +4.2        4.24 ± 12%  perf-profile.children.cycles-pp.xas_store
      0.14 ± 27%      -0.1        0.08 ± 21%  perf-profile.self.cycles-pp.map_id_up
      0.00            +0.1        0.06 ± 24%  perf-profile.self.cycles-pp.shmem_destroy_inode
      0.00            +0.1        0.07 ±  8%  perf-profile.self.cycles-pp.__xa_alloc
      0.02 ±146%      +0.1        0.11 ± 28%  perf-profile.self.cycles-pp.rcu_nocb_try_bypass
      0.01 ±223%      +0.1        0.10 ± 28%  perf-profile.self.cycles-pp.shuffle_freelist
      0.00            +0.1        0.11 ± 40%  perf-profile.self.cycles-pp.xas_create
      0.00            +0.1        0.12 ± 27%  perf-profile.self.cycles-pp.xas_find_marked
      0.00            +0.1        0.14 ± 18%  perf-profile.self.cycles-pp.xas_alloc
      0.03 ±103%      +0.1        0.17 ± 29%  perf-profile.self.cycles-pp.xas_descend
      0.00            +0.2        0.16 ± 23%  perf-profile.self.cycles-pp.xas_expand
      0.10 ± 22%      +0.2        0.27 ± 16%  perf-profile.self.cycles-pp.rcu_segcblist_enqueue
      0.00            +0.4        0.36 ± 16%  perf-profile.self.cycles-pp.xas_store
      0.32 ± 30%      +0.4        0.71 ± 12%  perf-profile.self.cycles-pp.__call_rcu_common
      0.18 ± 27%      +0.5        0.65 ±  8%  perf-profile.self.cycles-pp.kmem_cache_alloc_lru
      0.36 ± 79%      +0.6        0.96 ± 15%  perf-profile.self.cycles-pp.__slab_free
      0.00            +0.8        0.80 ± 14%  perf-profile.self.cycles-pp.radix_tree_node_rcu_free
      0.00            +1.0        1.01 ± 16%  perf-profile.self.cycles-pp.radix_tree_node_ctor



***************************************************************************************************
lkp-ivb-2ep1: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/all/aim9/5s

commit:
  23a31d8764 ("shmem: Refactor shmem_symlink()")
  a2e459555c ("shmem: stable directory offsets")

23a31d87645c6527 a2e459555c5f9da3e619b7e47a6
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   9781285           +2.0%     9975309        proc-vmstat.pgalloc_normal
   4481052           -1.6%     4408359        proc-vmstat.pgfault
   9749965           +2.0%     9942285        proc-vmstat.pgfree
     14556           -1.6%       14324        perf-stat.i.minor-faults
     14556           -1.6%       14324        perf-stat.i.page-faults
     14505           -1.6%       14272        perf-stat.ps.minor-faults
     14505           -1.6%       14272        perf-stat.ps.page-faults
    849714           -3.6%      819341        aim9.brk_test.ops_per_sec
    478138           +3.1%      492806        aim9.dgram_pipe.ops_per_sec
    199087          -14.6%      170071        aim9.disk_src.ops_per_sec
    286595           -9.7%      258794        aim9.link_test.ops_per_sec
    303603           -2.8%      295009        aim9.page_test.ops_per_sec
   3692190           -1.7%     3629732        aim9.time.minor_page_faults
      0.00            +1.0        0.95 ± 25%  perf-profile.calltrace.cycles-pp.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add
      0.00            +1.0        1.01 ± 23%  perf-profile.calltrace.cycles-pp.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod
      1.54 ± 22%      +1.1        2.61 ± 22%  perf-profile.calltrace.cycles-pp.shmem_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
      0.00            +1.2        1.15 ± 21%  perf-profile.calltrace.cycles-pp.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open
      0.00            +1.2        1.18 ± 21%  perf-profile.calltrace.cycles-pp.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups
      0.00            +1.2        1.22 ± 21%  perf-profile.calltrace.cycles-pp.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups.path_openat
      0.28 ± 21%      +0.2        0.45 ± 24%  perf-profile.children.cycles-pp.__call_rcu_common
      0.00            +0.3        0.26 ± 43%  perf-profile.children.cycles-pp.radix_tree_node_rcu_free
      0.14 ± 46%      +0.3        0.45 ± 20%  perf-profile.children.cycles-pp.setup_object
      0.00            +0.3        0.33 ± 24%  perf-profile.children.cycles-pp.radix_tree_node_ctor
      0.16 ± 49%      +0.4        0.52 ± 24%  perf-profile.children.cycles-pp.shuffle_freelist
      0.23 ± 43%      +0.4        0.63 ± 23%  perf-profile.children.cycles-pp.allocate_slab
      0.30 ± 35%      +0.4        0.74 ± 24%  perf-profile.children.cycles-pp.___slab_alloc
      0.17 ± 25%      +0.5        0.66 ± 23%  perf-profile.children.cycles-pp.shmem_unlink
      0.00            +0.5        0.49 ± 24%  perf-profile.children.cycles-pp.__xa_erase
      0.00            +0.5        0.52 ± 24%  perf-profile.children.cycles-pp.xa_erase
      0.00            +0.5        0.52 ± 64%  perf-profile.children.cycles-pp.xas_expand
      0.00            +0.5        0.53 ± 24%  perf-profile.children.cycles-pp.simple_offset_remove
      0.87 ± 26%      +0.7        1.56 ± 23%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      2.44 ± 12%      +0.8        3.25 ± 13%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.00            +0.8        0.82 ± 24%  perf-profile.children.cycles-pp.xas_alloc
      0.01 ±230%      +1.0        0.99 ± 23%  perf-profile.children.cycles-pp.xas_create
      1.55 ± 22%      +1.1        2.63 ± 22%  perf-profile.children.cycles-pp.shmem_mknod
      0.00            +1.2        1.16 ± 21%  perf-profile.children.cycles-pp.__xa_alloc
      0.00            +1.2        1.18 ± 21%  perf-profile.children.cycles-pp.__xa_alloc_cyclic
      0.00            +1.2        1.22 ± 21%  perf-profile.children.cycles-pp.simple_offset_add
      0.18 ± 28%      +1.5        1.65 ± 21%  perf-profile.children.cycles-pp.xas_store
      0.11 ± 31%      +0.1        0.25 ± 27%  perf-profile.self.cycles-pp.xas_store
      0.11 ± 31%      +0.2        0.28 ± 24%  perf-profile.self.cycles-pp.kmem_cache_alloc_lru
      0.00            +0.3        0.26 ± 44%  perf-profile.self.cycles-pp.radix_tree_node_rcu_free
      0.00            +0.3        0.29 ± 23%  perf-profile.self.cycles-pp.radix_tree_node_ctor




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
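Reading the %change column: the rows in both tables appear to follow two conventions. For most metrics it is the relative change of the mean, e.g. (163868 - 202424) / 202424 = -19.0% for aim9.disk_src.ops_per_sec. For metrics that are already percentages (names ending in "%") and for the perf-profile.* rows (shares of cycles), it appears to be an absolute difference, e.g. mpstat.cpu.all.soft% going from 0.26 to 0.36 is shown as +0.1, not +38%. The Python sketch below encodes that reading; the rule is inferred from the rows in this report rather than taken from the lkp-tests source, and the helper name lkp_change is hypothetical. Because the printed means are rounded, recomputing from them can be off in the last digit (sched_debug.cfs_rq:/.spread0.avg recomputes to -897.5% rather than -897.4%).

```python
def lkp_change(old: float, new: float, metric: str) -> str:
    """Recompute the %change column from the two printed means.

    Assumption inferred from this report's rows, not from the lkp-tests
    source: metrics that are already percentages (name ends in '%') and
    perf-profile.* rows (shares of cycles) show an absolute difference;
    every other metric shows a relative change in percent.
    """
    if metric.endswith("%") or metric.startswith("perf-profile."):
        # Absolute difference in percentage points, e.g. 0.26 -> 0.36 is "+0.1".
        return f"{new - old:+.1f}"
    # Relative change of the mean; old must be non-zero for this branch.
    return f"{(new - old) / old * 100:+.1f}%"


# Usage with values copied from the first table:
print(lkp_change(202424, 163868, "aim9.disk_src.ops_per_sec"))  # -19.0%
print(lkp_change(0.26, 0.36, "mpstat.cpu.all.soft%"))           # +0.1
```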
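The profiles attribute the new cost to two paths: file creation (lookup_open -> shmem_mknod -> simple_offset_add -> __xa_alloc_cyclic) and unlink (shmem_unlink -> simple_offset_remove -> xa_erase). A loop that creates and removes files in a directory on tmpfs should exercise both paths outside of aim9. The Python sketch below is a hypothetical micro-reproducer, not the aim9 disk_src workload: it assumes the target directory is on tmpfs (for example /dev/shm on many distributions), and the function name creat_unlink_loop is illustrative. os.open() issues openat(2) rather than creat(2), but with O_WRONLY | O_CREAT | O_TRUNC it is equivalent and is expected to reach the same lookup_open() path.

```python
import os
import time


def creat_unlink_loop(directory: str, iterations: int = 10000) -> float:
    """Hypothetical micro-reproducer: create and unlink one file per iteration.

    Assumes `directory` already exists and lives on tmpfs, so that the create
    goes through shmem_mknod() and the unlink through shmem_unlink().
    Returns create+unlink pairs per second.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC  # the flags creat(2) implies
    start = time.perf_counter()
    for i in range(iterations):
        path = os.path.join(directory, f"f{i}")
        fd = os.open(path, flags, 0o644)  # create the directory entry
        os.close(fd)
        os.unlink(path)  # remove it again, leaving the directory empty
    elapsed = time.perf_counter() - start
    return iterations / elapsed
```

For a rough comparison, run it on the same tmpfs directory (e.g. a hypothetical /dev/shm/offsets-test) under kernels built at 23a31d8764 and at a2e459555c. Interpreter overhead likely dilutes the per-operation kernel cost, so the relative difference may be smaller than the -19.0% reported above.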