Hello,

kernel test robot noticed an 11.0% improvement of stress-ng.sockfd.ops_per_sec on:

commit: 996f4dcbd231ec022f38a3c27e7fc45727e4e875 ("crypto: x86/aes-xts - wire up AESNI + AVX implementation")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sockfd
	cpufreq_governor: performance

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240527/202405271558.f424aa27-oliver.sang@xxxxxxxxx
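For a rough local approximation of this job outside the LKP harness, the parameters above can be mapped onto a plain stress-ng invocation. The sketch below is an assumption about that mapping (worker count, timeout, governor handling), not the 0-day job file itself; the authoritative job definition and kernel config are in the reproduce archive linked above.

  #!/usr/bin/env python3
  # Hypothetical helper, not part of the LKP tooling: approximate the job's
  # parameters (nr_threads=100%, testtime=60s, test=sockfd,
  # cpufreq_governor=performance) with a direct stress-ng run.
  import os
  import shutil
  import subprocess

  def set_performance_governor():
      # Assumption: switch governors via cpupower when it is installed
      # (normally requires root); otherwise leave the current governor alone.
      if shutil.which("cpupower"):
          subprocess.run(["cpupower", "frequency-set", "-g", "performance"],
                         check=False)

  def run_sockfd(testtime="60s"):
      # nr_threads: 100% -> one sockfd worker per online CPU
      workers = os.cpu_count() or 1
      subprocess.run(
          [
              "stress-ng",
              "--sockfd", str(workers),  # sockfd stressor: fd passing over AF_UNIX sockets
              "--timeout", testtime,     # testtime: 60s
              "--metrics-brief",         # prints ops and ops/sec
          ],
          check=True,
      )

  if __name__ == "__main__":
      set_performance_governor()
      run_sockfd()

Run as root so the governor switch takes effect; the ops/sec reported by --metrics-brief should roughly correspond to stress-ng.sockfd.ops_per_sec in the comparison below, though absolute numbers will differ on other hardware.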
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockfd/stress-ng/60s

commit:
  d637168810 ("crypto: x86/aes-xts - add AES-XTS assembly macro for modern CPUs")
  996f4dcbd2 ("crypto: x86/aes-xts - wire up AESNI + AVX implementation")

d6371688101223a3 996f4dcbd231ec022f38a3c27e7
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    730290 ± 11%     +26.2%     921636 ± 13%  meminfo.Mapped
     24673 ±  2%      +8.1%      26682 ±  3%  perf-c2c.HITM.total
     61893            +3.6%      64151        vmstat.system.cs
      0.28 ±  8%     -11.6%       0.25 ±  9%  sched_debug.cfs_rq:/.h_nr_running.stddev
    196.71 ±  6%     -11.8%     173.53 ± 11%  sched_debug.cfs_rq:/.util_est.stddev
  46304617           +11.0%   51404735        stress-ng.sockfd.ops
    771591           +11.0%     856468        stress-ng.sockfd.ops_per_sec
   2336146            -3.1%    2263883        stress-ng.time.involuntary_context_switches
   1365039 ±  2%     +15.8%    1580362        stress-ng.time.voluntary_context_switches
    183309 ± 11%     +26.1%     231069 ± 13%  proc-vmstat.nr_mapped
   1843540            +2.4%    1888288        proc-vmstat.numa_hit
   1611479            +2.8%    1656095        proc-vmstat.numa_local
   2952001 ±  3%      +5.1%    3103307        proc-vmstat.pgalloc_normal
   2282989 ±  4%      +7.5%    2454018 ±  2%  proc-vmstat.pgfree
      0.42 ±  2%      +6.2%       0.44        perf-stat.i.MPKI
 1.487e+10            +1.8%  1.513e+10        perf-stat.i.branch-instructions
  25452853            +9.2%   27794083        perf-stat.i.cache-misses
  85628078            +8.2%   92619680        perf-stat.i.cache-references
     63603 ±  2%      +4.2%      66264        perf-stat.i.context-switches
     10.03            -1.7%       9.86        perf-stat.i.cpi
     26278            -9.1%      23887        perf-stat.i.cycles-between-cache-misses
  6.35e+10            +2.2%  6.488e+10        perf-stat.i.instructions
      0.10            +1.5%       0.10        perf-stat.i.ipc
      0.06 ± 46%    +140.8%       0.14 ± 40%  perf-stat.i.major-faults
      0.40            +7.8%       0.43        perf-stat.overall.MPKI
     10.18            -2.0%       9.97        perf-stat.overall.cpi
     25755            -9.1%      23420        perf-stat.overall.cycles-between-cache-misses
      0.10            +2.0%       0.10        perf-stat.overall.ipc
 1.423e+10            +1.3%  1.442e+10        perf-stat.ps.branch-instructions
  49502972        +8.1e+05%  3.998e+11 ±223%  perf-stat.ps.branch-misses
  23994964            +9.7%   26326983        perf-stat.ps.cache-misses
  82930036            +8.2%   89756764        perf-stat.ps.cache-references
     61357            +3.3%      63381        perf-stat.ps.context-switches
 6.072e+10            +1.8%   6.18e+10        perf-stat.ps.instructions
      0.04 ± 46%    +137.6%       0.11 ± 37%  perf-stat.ps.major-faults
 3.653e+12            +1.6%  3.712e+12        perf-stat.total.instructions
     48.81            -0.2       48.59        perf-profile.calltrace.cycles-pp.unix_inflight.unix_scm_to_skb.unix_stream_sendmsg.____sys_sendmsg.___sys_sendmsg
     48.45            -0.2       48.29        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.unix_notinflight.unix_stream_read_generic.unix_stream_recvmsg
     48.57            -0.2       48.42        perf-profile.calltrace.cycles-pp.unix_notinflight.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.____sys_recvmsg
     48.53            -0.2       48.38        perf-profile.calltrace.cycles-pp._raw_spin_lock.unix_notinflight.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
     49.02            -0.1       48.88        perf-profile.calltrace.cycles-pp.unix_stream_recvmsg.sock_recvmsg.____sys_recvmsg.___sys_recvmsg.__sys_recvmsg
     49.01            -0.1       48.87        perf-profile.calltrace.cycles-pp.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.____sys_recvmsg.___sys_recvmsg
     49.02            -0.1       48.88        perf-profile.calltrace.cycles-pp.sock_recvmsg.____sys_recvmsg.___sys_recvmsg.__sys_recvmsg.do_syscall_64
     49.04            -0.1       48.90        perf-profile.calltrace.cycles-pp.____sys_recvmsg.___sys_recvmsg.__sys_recvmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.11            -0.1       48.97        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvmsg.stress_sockfd
     49.14            -0.1       49.00        perf-profile.calltrace.cycles-pp.recvmsg.stress_sockfd
     49.08            -0.1       48.95        perf-profile.calltrace.cycles-pp.__sys_recvmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvmsg.stress_sockfd
     49.07            -0.1       48.94        perf-profile.calltrace.cycles-pp.___sys_recvmsg.__sys_recvmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvmsg
     49.11            -0.1       48.98        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.recvmsg.stress_sockfd
      0.56 ±  3%      +0.1        0.65 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
      0.56 ±  3%      +0.1        0.65 ±  3%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      0.58 ±  2%      +0.1        0.67 ±  3%  perf-profile.calltrace.cycles-pp.open64
      0.55 ±  3%      +0.1        0.64 ±  3%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      0.56 ±  3%      +0.1        0.65 ±  3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      0.17 ±141%      +0.4        0.57 ±  3%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
      0.17 ±141%      +0.4        0.58 ±  4%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
     97.19            -0.4       96.83        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     97.49            -0.4       97.13        perf-profile.children.cycles-pp._raw_spin_lock
     48.81            -0.2       48.59        perf-profile.children.cycles-pp.unix_inflight
     48.57            -0.2       48.42        perf-profile.children.cycles-pp.unix_notinflight
     49.02            -0.1       48.88        perf-profile.children.cycles-pp.unix_stream_read_generic
     49.02            -0.1       48.88        perf-profile.children.cycles-pp.unix_stream_recvmsg
     49.03            -0.1       48.89        perf-profile.children.cycles-pp.sock_recvmsg
     49.04            -0.1       48.90        perf-profile.children.cycles-pp.____sys_recvmsg
     49.09            -0.1       48.95        perf-profile.children.cycles-pp.__sys_recvmsg
     49.07            -0.1       48.94        perf-profile.children.cycles-pp.___sys_recvmsg
     49.15            -0.1       49.02        perf-profile.children.cycles-pp.recvmsg
      0.09            +0.0        0.10        perf-profile.children.cycles-pp.__memcg_slab_free_hook
      0.06            +0.0        0.07        perf-profile.children.cycles-pp.sock_alloc_send_pskb
      0.12 ±  3%      +0.0        0.14 ±  2%  perf-profile.children.cycles-pp.alloc_empty_file
      0.07 ±  8%      +0.0        0.09 ± 11%  perf-profile.children.cycles-pp.dput
      0.07 ± 11%      +0.0        0.09 ± 12%  perf-profile.children.cycles-pp.lockref_put_return
      0.12 ±  3%      +0.0        0.15 ±  7%  perf-profile.children.cycles-pp.__fput
      0.22 ±  2%      +0.0        0.24 ±  5%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.17 ±  2%      +0.0        0.20 ±  6%  perf-profile.children.cycles-pp.task_work_run
      0.18 ±  3%      +0.0        0.21 ±  6%  perf-profile.children.cycles-pp.syscall
      0.17 ±  6%      +0.0        0.21 ±  7%  perf-profile.children.cycles-pp.do_dentry_open
      0.02 ±141%      +0.0        0.06 ± 28%  perf-profile.children.cycles-pp.generic_perform_write
      0.09 ±  5%      +0.0        0.14 ± 37%  perf-profile.children.cycles-pp.cmd_record
      0.09 ±  5%      +0.0        0.14 ± 37%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.26 ±  5%      +0.0        0.31 ±  6%  perf-profile.children.cycles-pp.do_open
      0.08 ±  8%      +0.0        0.13 ± 34%  perf-profile.children.cycles-pp.perf_mmap__push
      0.09 ±  5%      +0.1        0.14 ± 35%  perf-profile.children.cycles-pp.main
      0.09 ±  5%      +0.1        0.14 ± 35%  perf-profile.children.cycles-pp.run_builtin
      0.50 ±  3%      +0.1        0.58 ±  3%  perf-profile.children.cycles-pp.do_filp_open
      0.49 ±  3%      +0.1        0.57 ±  4%  perf-profile.children.cycles-pp.path_openat
      0.56 ±  3%      +0.1        0.64 ±  3%  perf-profile.children.cycles-pp.do_sys_openat2
      0.56 ±  3%      +0.1        0.65 ±  3%  perf-profile.children.cycles-pp.__x64_sys_openat
      0.59 ±  2%      +0.1        0.69 ±  3%  perf-profile.children.cycles-pp.open64
     96.75            -0.3       96.40        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.07 ± 11%      +0.0        0.09 ± 12%  perf-profile.self.cycles-pp.lockref_put_return


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki