> On Sep 8, 2023, at 1:26 AM, kernel test robot <oliver.sang@xxxxxxxxx> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a -19.0% regression of aim9.disk_src.ops_per_sec on:
>
>
> commit: a2e459555c5f9da3e619b7e47a63f98574dc75f1 ("shmem: stable directory offsets")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> testcase: aim9
> test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
> parameters:
>
>     testtime: 300s
>     test: disk_src
>     cpufreq_governor: performance
>
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+--------------------------------------------------------------------------------------------------+
> | testcase: change | aim9: aim9.disk_src.ops_per_sec -14.6% regression                                                |
> | test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory |
> | test parameters  | cpufreq_governor=performance                                                                     |
> |                  | test=all                                                                                         |
> |                  | testtime=5s                                                                                      |
> +------------------+--------------------------------------------------------------------------------------------------+
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202309081306.3ecb3734-oliver.sang@xxxxxxxxx

Hi,

Several weeks ago we asked that these tests be run again by the robot, because they cannot be run in any environment available to me (the tests do not run on Fedora, and I don't have any big iron). We wanted the tests rerun before the patch was committed. There was a deafening silence, so I assumed the work I did at the time to address the regression had been successful, and the patches are now in upstream Linux.

This new report is disappointing, but I'm still in a position where I can't run this test, and the results don't really indicate where the problem is, so I can't address the issue on my own. Any suggestions, advice, or help would be appreciated.
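For anyone else reading the profile data quoted below: the entries that appear only on the patched kernel (simple_offset_add, __xa_alloc_cyclic, xas_create, xas_expand, simple_offset_remove, xa_erase, radix_tree_node_ctor, radix_tree_node_rcu_free) all sit under shmem_mknod and shmem_unlink, which suggests the extra cycles are going into growing and shrinking a per-directory xarray on every create and unlink. Below is a minimal sketch of that pattern, not the code from commit a2e459555c: everything named *_sketch is invented for illustration, while xa_alloc_cyclic(), xa_erase(), and XA_LIMIT() are the real xarray API.

/*
 * Illustration only: hypothetical names, not the actual libfs/shmem
 * code.  Each create stores the new dentry in a per-directory xarray
 * at a cyclically allocated offset, and each unlink erases it again.
 * Both paths may allocate or free xarray nodes, which would account
 * for the kmem_cache_alloc_lru, xas_create/xas_expand, and
 * radix_tree_node_rcu_free samples in the profile.
 */
#include <linux/dcache.h>
#include <linux/xarray.h>

struct offset_ctx_sketch {
        struct xarray xa;       /* directory offset -> dentry */
        u32 next_offset;        /* cursor for cyclic allocation */
};

static int offset_add_sketch(struct offset_ctx_sketch *octx,
                             struct dentry *dentry, u32 *offset)
{
        /* Inserting may grow the tree, allocating nodes with GFP_KERNEL. */
        return xa_alloc_cyclic(&octx->xa, offset, dentry,
                               XA_LIMIT(2, U32_MAX), &octx->next_offset,
                               GFP_KERNEL);
}

static void offset_remove_sketch(struct offset_ctx_sketch *octx,
                                 unsigned long offset)
{
        /* Erasing may shrink the tree; unused nodes are freed via RCU. */
        xa_erase(&octx->xa, offset);
}

If that reading is right, the creat()/unlink() loops visible in the calltrace data (do_sys_openat2/__x64_sys_creat and do_unlinkat/__x64_sys_unlink) hit both of these paths on every operation, which is at least consistent with which aim9 sub-tests moved.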
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20230908/202309081306.3ecb3734-oliver.sang@xxxxxxxxx
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/disk_src/aim9/300s
>
> commit:
>   23a31d8764 ("shmem: Refactor shmem_symlink()")
>   a2e459555c ("shmem: stable directory offsets")
>
> 23a31d87645c6527 a2e459555c5f9da3e619b7e47a6
> ---------------- ---------------------------
>        %stddev      %change        %stddev
>            \            |               \
> 0.26 ± 9% +0.1 0.36 ± 2% mpstat.cpu.all.soft%
> 0.61 -0.1 0.52 mpstat.cpu.all.usr%
> 0.16 ± 10% -18.9% 0.13 ± 12% perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
> 0.04 ± 7% +1802.4% 0.78 ±115% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 202424 -19.0% 163868 aim9.disk_src.ops_per_sec
> 94.83 -4.2% 90.83 aim9.time.percent_of_cpu_this_job_got
> 73.62 -17.6% 60.69 aim9.time.user_time
> 23541 +6.5% 25074 proc-vmstat.nr_slab_reclaimable
> 1437319 ± 24% +377.6% 6864201 proc-vmstat.numa_hit
> 1387016 ± 25% +391.4% 6815486 proc-vmstat.numa_local
> 4864362 ± 34% +453.6% 26931180 proc-vmstat.pgalloc_normal
> 4835960 ± 34% +455.4% 26856610 proc-vmstat.pgfree
> 538959 ± 24% -23.2% 414090 sched_debug.cfs_rq:/.load.max
> 130191 ± 14% -13.3% 112846 ± 6% sched_debug.cfs_rq:/.load.stddev
> 116849 ± 27% -51.2% 56995 ± 20% sched_debug.cfs_rq:/.min_vruntime.max
> 1223 ±191% -897.4% -9754 sched_debug.cfs_rq:/.spread0.avg
> 107969 ± 29% -65.3% 37448 ± 39% sched_debug.cfs_rq:/.spread0.max
> 55209 ± 14% -21.8% 43154 ± 14% sched_debug.cpu.nr_switches.max
> 11.21 +23.7% 13.87 perf-stat.i.MPKI
> 7.223e+08 -4.4% 6.907e+08 perf-stat.i.branch-instructions
> 2.67 +0.2 2.88 perf-stat.i.branch-miss-rate%
> 19988363 +2.8% 20539702 perf-stat.i.branch-misses
> 17.36 -2.8 14.59 perf-stat.i.cache-miss-rate%
> 40733859 +19.5% 48659982 perf-stat.i.cache-references
> 1.76 +3.5% 1.82 perf-stat.i.cpi
> 55.21 +5.4% 58.21 ± 2% perf-stat.i.cpu-migrations
> 1.01e+09 -3.8% 9.719e+08 perf-stat.i.dTLB-loads
> 0.26 ± 4% -0.0 0.23 ± 3% perf-stat.i.dTLB-store-miss-rate%
> 2166022 ± 4% -6.9% 2015917 ± 3% perf-stat.i.dTLB-store-misses
> 8.503e+08 +5.5% 8.968e+08 perf-stat.i.dTLB-stores
> 69.22 ± 4% +6.4 75.60 perf-stat.i.iTLB-load-miss-rate%
> 316455 ± 12% -31.6% 216531 ± 3% perf-stat.i.iTLB-loads
> 3.722e+09 -3.1% 3.608e+09 perf-stat.i.instructions
> 0.57 -3.3% 0.55 perf-stat.i.ipc
> 865.04 -10.4% 775.02 ± 3% perf-stat.i.metric.K/sec
> 47.51 -2.1 45.37 perf-stat.i.node-load-miss-rate%
> 106705 ± 3% +14.8% 122490 ± 5% perf-stat.i.node-loads
> 107169 ± 4% +29.0% 138208 ± 7% perf-stat.i.node-stores
> 10.94 +23.3% 13.49 perf-stat.overall.MPKI
> 2.77 +0.2 2.97 perf-stat.overall.branch-miss-rate%
> 17.28 -2.7 14.56 perf-stat.overall.cache-miss-rate%
> 1.73 +3.4% 1.79 perf-stat.overall.cpi
> 0.25 ± 4% -0.0 0.22 ± 3% perf-stat.overall.dTLB-store-miss-rate%
> 69.20 ± 4% +6.4 75.60 perf-stat.overall.iTLB-load-miss-rate%
> 0.58 -3.2% 0.56 perf-stat.overall.ipc
> 45.25 -2.2 43.10 perf-stat.overall.node-load-miss-rate%
> 7.199e+08 -4.4% 6.883e+08 perf-stat.ps.branch-instructions
> 19919808 +2.8% 20469001 perf-stat.ps.branch-misses
> 40597326 +19.5% 48497201 perf-stat.ps.cache-references
> 55.06 +5.4% 58.03 ± 2% perf-stat.ps.cpu-migrations
> 1.007e+09 -3.8% 9.686e+08 perf-stat.ps.dTLB-loads
> 2158768 ± 4% -6.9% 2009174 ± 3% perf-stat.ps.dTLB-store-misses
> 8.475e+08 +5.5% 8.937e+08 perf-stat.ps.dTLB-stores
> 315394 ± 12% -31.6% 215816 ± 3% perf-stat.ps.iTLB-loads
> 3.71e+09 -3.1% 3.595e+09 perf-stat.ps.instructions
> 106351 ± 3% +14.8% 122083 ± 5% perf-stat.ps.node-loads
> 106728 ± 4% +29.1% 137740 ± 7% perf-stat.ps.node-stores
> 1.117e+12 -3.0% 1.084e+12 perf-stat.total.instructions
> 0.00 +0.8 0.75 ± 12% perf-profile.calltrace.cycles-pp.__call_rcu_common.xas_store.__xa_erase.xa_erase.simple_offset_remove
> 0.00 +0.8 0.78 ± 34% perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_create.xas_store
> 0.00 +0.8 0.83 ± 29% perf-profile.calltrace.cycles-pp.allocate_slab.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_expand
> 0.00 +0.9 0.92 ± 26% perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_expand.xas_create
> 0.00 +1.0 0.99 ± 27% perf-profile.calltrace.cycles-pp.shuffle_freelist.allocate_slab.___slab_alloc.kmem_cache_alloc_lru.xas_alloc
> 0.00 +1.0 1.04 ± 28% perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.xas_alloc.xas_create.xas_store.__xa_alloc
> 0.00 +1.1 1.11 ± 26% perf-profile.calltrace.cycles-pp.xas_alloc.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic
> 1.51 ± 24% +1.2 2.73 ± 10% perf-profile.calltrace.cycles-pp.vfs_unlink.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.00 +1.2 1.24 ± 20% perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.xas_alloc.xas_expand.xas_create.xas_store
> 0.00 +1.3 1.27 ± 10% perf-profile.calltrace.cycles-pp.xas_store.__xa_erase.xa_erase.simple_offset_remove.shmem_unlink
> 0.00 +1.3 1.30 ± 10% perf-profile.calltrace.cycles-pp.__xa_erase.xa_erase.simple_offset_remove.shmem_unlink.vfs_unlink
> 0.00 +1.3 1.33 ± 19% perf-profile.calltrace.cycles-pp.xas_alloc.xas_expand.xas_create.xas_store.__xa_alloc
> 0.00 +1.4 1.36 ± 10% perf-profile.calltrace.cycles-pp.xa_erase.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat
> 0.00 +1.4 1.37 ± 10% perf-profile.calltrace.cycles-pp.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat.__x64_sys_unlink
> 0.00 +1.5 1.51 ± 17% perf-profile.calltrace.cycles-pp.xas_expand.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic
> 0.00 +1.6 1.62 ± 12% perf-profile.calltrace.cycles-pp.shmem_unlink.vfs_unlink.do_unlinkat.__x64_sys_unlink.do_syscall_64
> 0.00 +2.8 2.80 ± 13% perf-profile.calltrace.cycles-pp.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add
> 0.00 +2.9 2.94 ± 13% perf-profile.calltrace.cycles-pp.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod
> 5.38 ± 24% +3.1 8.51 ± 11% perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
> 6.08 ± 24% +3.2 9.24 ± 12% perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat
> 0.00 +3.2 3.20 ± 13% perf-profile.calltrace.cycles-pp.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open
> 0.00 +3.2 3.24 ± 13% perf-profile.calltrace.cycles-pp.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups
> 0.00 +3.4 3.36 ± 14% perf-profile.calltrace.cycles-pp.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups.path_openat
> 2.78 ± 25% +3.4 6.17 ± 12% perf-profile.calltrace.cycles-pp.shmem_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
> 0.16 ± 30% -0.1 0.08 ± 20% perf-profile.children.cycles-pp.map_id_up
> 0.02 ±146% +0.1 0.08 ± 13% perf-profile.children.cycles-pp.shmem_is_huge
> 0.02 ±141% +0.1 0.09 ± 16% perf-profile.children.cycles-pp.__list_del_entry_valid
> 0.00 +0.1 0.08 ± 11% perf-profile.children.cycles-pp.free_unref_page
> 0.00 +0.1 0.08 ± 13% perf-profile.children.cycles-pp.shmem_destroy_inode
> 0.04 ±101% +0.1 0.14 ± 25% perf-profile.children.cycles-pp.rcu_nocb_try_bypass
> 0.00 +0.1 0.12 ± 27% perf-profile.children.cycles-pp.xas_find_marked
> 0.02 ±144% +0.1 0.16 ± 14% perf-profile.children.cycles-pp.__unfreeze_partials
> 0.03 ±106% +0.2 0.19 ± 26% perf-profile.children.cycles-pp.xas_descend
> 0.01 ±223% +0.2 0.17 ± 15% perf-profile.children.cycles-pp.get_page_from_freelist
> 0.11 ± 22% +0.2 0.29 ± 16% perf-profile.children.cycles-pp.rcu_segcblist_enqueue
> 0.02 ±146% +0.2 0.24 ± 13% perf-profile.children.cycles-pp.__alloc_pages
> 0.36 ± 79% +0.6 0.98 ± 15% perf-profile.children.cycles-pp.__slab_free
> 0.50 ± 26% +0.7 1.23 ± 14% perf-profile.children.cycles-pp.__call_rcu_common
> 0.00 +0.8 0.82 ± 13% perf-profile.children.cycles-pp.radix_tree_node_rcu_free
> 0.00 +1.1 1.14 ± 17% perf-profile.children.cycles-pp.radix_tree_node_ctor
> 0.16 ± 86% +1.2 1.38 ± 16% perf-profile.children.cycles-pp.setup_object
> 1.52 ± 25% +1.2 2.75 ± 10% perf-profile.children.cycles-pp.vfs_unlink
> 0.36 ± 22% +1.3 1.63 ± 12% perf-profile.children.cycles-pp.shmem_unlink
> 0.00 +1.3 1.30 ± 10% perf-profile.children.cycles-pp.__xa_erase
> 0.20 ± 79% +1.3 1.53 ± 15% perf-profile.children.cycles-pp.shuffle_freelist
> 0.00 +1.4 1.36 ± 10% perf-profile.children.cycles-pp.xa_erase
> 0.00 +1.4 1.38 ± 10% perf-profile.children.cycles-pp.simple_offset_remove
> 0.00 +1.5 1.51 ± 17% perf-profile.children.cycles-pp.xas_expand
> 0.26 ± 78% +1.6 1.87 ± 13% perf-profile.children.cycles-pp.allocate_slab
> 0.40 ± 49% +1.7 2.10 ± 13% perf-profile.children.cycles-pp.___slab_alloc
> 1.30 ± 85% +2.1 3.42 ± 12% perf-profile.children.cycles-pp.rcu_do_batch
> 1.56 ± 27% +2.4 3.93 ± 11% perf-profile.children.cycles-pp.kmem_cache_alloc_lru
> 0.00 +2.4 2.44 ± 12% perf-profile.children.cycles-pp.xas_alloc
> 2.66 ± 13% +2.5 5.14 ± 5% perf-profile.children.cycles-pp.__irq_exit_rcu
> 11.16 ± 10% +2.7 13.88 ± 8% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> 11.77 ± 10% +2.7 14.49 ± 8% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> 0.00 +2.8 2.82 ± 13% perf-profile.children.cycles-pp.xas_create
> 5.40 ± 24% +3.1 8.52 ± 11% perf-profile.children.cycles-pp.lookup_open
> 6.12 ± 24% +3.1 9.27 ± 12% perf-profile.children.cycles-pp.open_last_lookups
> 0.00 +3.2 3.22 ± 13% perf-profile.children.cycles-pp.__xa_alloc
> 0.00 +3.2 3.24 ± 13% perf-profile.children.cycles-pp.__xa_alloc_cyclic
> 0.00 +3.4 3.36 ± 14% perf-profile.children.cycles-pp.simple_offset_add
> 2.78 ± 25% +3.4 6.18 ± 12% perf-profile.children.cycles-pp.shmem_mknod
> 0.00 +4.2 4.24 ± 12% perf-profile.children.cycles-pp.xas_store
> 0.14 ± 27% -0.1 0.08 ± 21% perf-profile.self.cycles-pp.map_id_up
> 0.00 +0.1 0.06 ± 24% perf-profile.self.cycles-pp.shmem_destroy_inode
> 0.00 +0.1 0.07 ± 8% perf-profile.self.cycles-pp.__xa_alloc
> 0.02 ±146% +0.1 0.11 ± 28% perf-profile.self.cycles-pp.rcu_nocb_try_bypass
> 0.01 ±223% +0.1 0.10 ± 28% perf-profile.self.cycles-pp.shuffle_freelist
> 0.00 +0.1 0.11 ± 40% perf-profile.self.cycles-pp.xas_create
> 0.00 +0.1 0.12 ± 27% perf-profile.self.cycles-pp.xas_find_marked
> 0.00 +0.1 0.14 ± 18% perf-profile.self.cycles-pp.xas_alloc
> 0.03 ±103% +0.1 0.17 ± 29% perf-profile.self.cycles-pp.xas_descend
> 0.00 +0.2 0.16 ± 23% perf-profile.self.cycles-pp.xas_expand
> 0.10 ± 22% +0.2 0.27 ± 16% perf-profile.self.cycles-pp.rcu_segcblist_enqueue
> 0.00 +0.4 0.36 ± 16% perf-profile.self.cycles-pp.xas_store
> 0.32 ± 30% +0.4 0.71 ± 12% perf-profile.self.cycles-pp.__call_rcu_common
> 0.18 ± 27% +0.5 0.65 ± 8% perf-profile.self.cycles-pp.kmem_cache_alloc_lru
> 0.36 ± 79% +0.6 0.96 ± 15% perf-profile.self.cycles-pp.__slab_free
> 0.00 +0.8 0.80 ± 14% perf-profile.self.cycles-pp.radix_tree_node_rcu_free
> 0.00 +1.0 1.01 ± 16% perf-profile.self.cycles-pp.radix_tree_node_ctor
>
>
> ***************************************************************************************************
> lkp-ivb-2ep1: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/all/aim9/5s
>
> commit:
>   23a31d8764 ("shmem: Refactor shmem_symlink()")
>   a2e459555c ("shmem: stable directory offsets")
>
> 23a31d87645c6527 a2e459555c5f9da3e619b7e47a6
> ---------------- ---------------------------
>        %stddev      %change        %stddev
>            \            |               \
> 9781285 +2.0% 9975309 proc-vmstat.pgalloc_normal
> 4481052 -1.6% 4408359 proc-vmstat.pgfault
> 9749965 +2.0% 9942285 proc-vmstat.pgfree
> 14556 -1.6% 14324 perf-stat.i.minor-faults
> 14556 -1.6% 14324 perf-stat.i.page-faults
> 14505 -1.6% 14272 perf-stat.ps.minor-faults
> 14505 -1.6% 14272 perf-stat.ps.page-faults
> 849714 -3.6% 819341 aim9.brk_test.ops_per_sec
> 478138 +3.1% 492806 aim9.dgram_pipe.ops_per_sec
> 199087 -14.6% 170071 aim9.disk_src.ops_per_sec
> 286595 -9.7% 258794 aim9.link_test.ops_per_sec
> 303603 -2.8% 295009 aim9.page_test.ops_per_sec
> 3692190 -1.7% 3629732 aim9.time.minor_page_faults
> 0.00 +1.0 0.95 ± 25% perf-profile.calltrace.cycles-pp.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add
> 0.00 +1.0 1.01 ± 23% perf-profile.calltrace.cycles-pp.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod
> 1.54 ± 22% +1.1 2.61 ± 22% perf-profile.calltrace.cycles-pp.shmem_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
> 0.00 +1.2 1.15 ± 21% perf-profile.calltrace.cycles-pp.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open
> 0.00 +1.2 1.18 ± 21% perf-profile.calltrace.cycles-pp.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups
> 0.00 +1.2 1.22 ± 21% perf-profile.calltrace.cycles-pp.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups.path_openat
> 0.28 ± 21% +0.2 0.45 ± 24% perf-profile.children.cycles-pp.__call_rcu_common
> 0.00 +0.3 0.26 ± 43% perf-profile.children.cycles-pp.radix_tree_node_rcu_free
> 0.14 ± 46% +0.3 0.45 ± 20% perf-profile.children.cycles-pp.setup_object
> 0.00 +0.3 0.33 ± 24% perf-profile.children.cycles-pp.radix_tree_node_ctor
> 0.16 ± 49% +0.4 0.52 ± 24% perf-profile.children.cycles-pp.shuffle_freelist
> 0.23 ± 43% +0.4 0.63 ± 23% perf-profile.children.cycles-pp.allocate_slab
> 0.30 ± 35% +0.4 0.74 ± 24% perf-profile.children.cycles-pp.___slab_alloc
> 0.17 ± 25% +0.5 0.66 ± 23% perf-profile.children.cycles-pp.shmem_unlink
> 0.00 +0.5 0.49 ± 24% perf-profile.children.cycles-pp.__xa_erase
> 0.00 +0.5 0.52 ± 24% perf-profile.children.cycles-pp.xa_erase
> 0.00 +0.5 0.52 ± 64% perf-profile.children.cycles-pp.xas_expand
> 0.00 +0.5 0.53 ± 24% perf-profile.children.cycles-pp.simple_offset_remove
> 0.87 ± 26% +0.7 1.56 ± 23% perf-profile.children.cycles-pp.kmem_cache_alloc_lru
> 2.44 ± 12% +0.8 3.25 ± 13% perf-profile.children.cycles-pp.__irq_exit_rcu
> 0.00 +0.8 0.82 ± 24% perf-profile.children.cycles-pp.xas_alloc
> 0.01 ±230% +1.0 0.99 ± 23% perf-profile.children.cycles-pp.xas_create
> 1.55 ± 22% +1.1 2.63 ± 22% perf-profile.children.cycles-pp.shmem_mknod
> 0.00 +1.2 1.16 ± 21% perf-profile.children.cycles-pp.__xa_alloc
> 0.00 +1.2 1.18 ± 21% perf-profile.children.cycles-pp.__xa_alloc_cyclic
> 0.00 +1.2 1.22 ± 21% perf-profile.children.cycles-pp.simple_offset_add
> 0.18 ± 28% +1.5 1.65 ± 21% perf-profile.children.cycles-pp.xas_store
> 0.11 ± 31% +0.1 0.25 ± 27% perf-profile.self.cycles-pp.xas_store
> 0.11 ± 31% +0.2 0.28 ± 24% perf-profile.self.cycles-pp.kmem_cache_alloc_lru
> 0.00 +0.3 0.26 ± 44% perf-profile.self.cycles-pp.radix_tree_node_rcu_free
> 0.00 +0.3 0.29 ± 23% perf-profile.self.cycles-pp.radix_tree_node_ctor
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>

--
Chuck Lever