Hello,

kernel test robot noticed a 5.7% improvement of stress-ng.sigtrap.ops_per_sec on:


commit: e787060bdfa35f8b40ef4d277a345ee35b41039f ("crypto: x86/aes-xts - wire up VAES + AVX2 implementation")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sigtrap
	cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240531/202405311430.e1f484a4-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/sigtrap/stress-ng/60s

commit:
  996f4dcbd2 ("crypto: x86/aes-xts - wire up AESNI + AVX implementation")
  e787060bdf ("crypto: x86/aes-xts - wire up VAES + AVX2 implementation")

996f4dcbd231ec02 e787060bdfa35f8b40ef4d277a3
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     11005 ±  5%     -18.0%       9022 ±  5%  perf-c2c.DRAM.remote
      4834 ±  7%     -23.8%       3684 ±  5%  perf-c2c.HITM.remote
      6143 ± 48%    +101.7%      12390 ± 27%  proc-vmstat.numa_hint_faults
      4427 ± 35%     +56.7%       6939 ±  6%  proc-vmstat.numa_hint_faults_local
    301865            +2.3%     308839        proc-vmstat.pgfault
      5597            -6.4%       5240        stress-ng.sigtrap.nanosecs_to_handle_SIGTRAP
 6.075e+08            +5.7%  6.418e+08        stress-ng.sigtrap.ops
  10124240            +5.7%   10696368        stress-ng.sigtrap.ops_per_sec
    177.72            +6.9%     190.03        stress-ng.time.user_time
      0.53           -17.3%       0.43        perf-stat.i.MPKI
 7.911e+09            +5.1%  8.314e+09        perf-stat.i.branch-instructions
     32.57            -5.5       27.10        perf-stat.i.cache-miss-rate%
  22467086           -13.2%   19505617        perf-stat.i.cache-misses
  69342308            +4.1%   72189549        perf-stat.i.cache-references
      5.26            -5.0%       5.00        perf-stat.i.cpi
     10083           +15.5%      11642        perf-stat.i.cycles-between-cache-misses
 4.275e+10            +5.1%  4.495e+10        perf-stat.i.instructions
      0.20            +5.1%       0.21        perf-stat.i.ipc
      3976            +3.7%       4122        perf-stat.i.minor-faults
      3976            +3.7%       4122        perf-stat.i.page-faults
      0.53           -17.5%       0.43        perf-stat.overall.MPKI
      0.78 ±  3%      -0.0        0.73        perf-stat.overall.branch-miss-rate%
     32.29            -5.3       26.95        perf-stat.overall.cache-miss-rate%
      5.29            -4.9%       5.03        perf-stat.overall.cpi
     10068           +15.2%      11596        perf-stat.overall.cycles-between-cache-misses
      0.19            +5.2%       0.20        perf-stat.overall.ipc
 7.772e+09            +5.1%   8.17e+09        perf-stat.ps.branch-instructions
  22071930           -13.2%   19162218        perf-stat.ps.cache-misses
  68355866            +4.0%   71101498        perf-stat.ps.cache-references
 4.201e+10            +5.2%  4.418e+10        perf-stat.ps.instructions
      3892            +3.7%       4035        perf-stat.ps.minor-faults
      3892            +3.7%       4035        perf-stat.ps.page-faults
 2.571e+12            +5.0%  2.698e+12        perf-stat.total.instructions
     34.18            -0.6       33.55        perf-profile.calltrace.cycles-pp.asm_exc_int3.stress_sigtrap
     15.07            -0.5       14.58        perf-profile.calltrace.cycles-pp.force_sig.exc_int3.asm_exc_int3.stress_sigtrap
     15.47            -0.5       14.99        perf-profile.calltrace.cycles-pp.exc_int3.asm_exc_int3.stress_sigtrap
     14.94            -0.5       14.46        perf-profile.calltrace.cycles-pp.force_sig_info_to_task.force_sig.exc_int3.asm_exc_int3.stress_sigtrap
     37.71            -0.5       37.26        perf-profile.calltrace.cycles-pp.stress_sigtrap
     14.04            -0.4       13.65        perf-profile.calltrace.cycles-pp.__send_signal_locked.force_sig_info_to_task.force_sig.exc_int3.asm_exc_int3
     14.94            -0.4       14.58        perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
     12.48            -0.4       12.11        perf-profile.calltrace.cycles-pp.do_dec_rlimit_put_ucounts.collect_signal.dequeue_signal.get_signal.arch_do_signal_or_restart
     15.19            -0.4       14.83        perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap
     12.54            -0.4       12.18        perf-profile.calltrace.cycles-pp.collect_signal.dequeue_signal.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode
     13.34            -0.3       13.03        perf-profile.calltrace.cycles-pp.dequeue_signal.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
     12.30            -0.3       12.00        perf-profile.calltrace.cycles-pp.do_dec_rlimit_put_ucounts.get_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3
      0.73            -0.3        0.47 ± 33%  perf-profile.calltrace.cycles-pp.complete_signal.__send_signal_locked.force_sig_info_to_task.force_sig.exc_int3
     12.43            -0.2       12.18        perf-profile.calltrace.cycles-pp.inc_rlimit_get_ucounts.__sigqueue_alloc.__send_signal_locked.do_send_sig_info.do_send_specific
     17.48            -0.2       17.25        perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap
     17.44            -0.2       17.21        perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap
     12.85            -0.2       12.63        perf-profile.calltrace.cycles-pp.__sigqueue_alloc.__send_signal_locked.do_send_sig_info.do_send_specific.__x64_sys_tgkill
     12.64            -0.2       12.44        perf-profile.calltrace.cycles-pp.inc_rlimit_get_ucounts.__sigqueue_alloc.__send_signal_locked.force_sig_info_to_task.force_sig
     13.07            -0.2       12.88        perf-profile.calltrace.cycles-pp.__sigqueue_alloc.__send_signal_locked.force_sig_info_to_task.force_sig.exc_int3
      0.73 ±  2%      -0.1        0.61 ±  5%  perf-profile.calltrace.cycles-pp.fpregs_mark_activate.fpu__clear_user_states.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode
      2.50            -0.1        2.40        perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap_handler
      2.55            -0.1        2.45        perf-profile.calltrace.cycles-pp.asm_exc_int3.stress_sigtrap_handler
      2.54            -0.1        2.44        perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap_handler
      3.07            -0.1        2.97        perf-profile.calltrace.cycles-pp.set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.37 ±  2%      -0.1        1.28        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.16            -0.1        1.08 ±  3%  perf-profile.calltrace.cycles-pp.fpu__clear_user_states.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3
      1.24            -0.1        1.16 ±  2%  perf-profile.calltrace.cycles-pp.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap_handler
      0.79            -0.1        0.71        perf-profile.calltrace.cycles-pp.fpu__clear_user_states.handle_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
      0.64            -0.1        0.57        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.force_sig_info_to_task.force_sig.exc_int3.asm_exc_int3
      0.82            -0.1        0.75        perf-profile.calltrace.cycles-pp.get_task_cred.apparmor_task_kill.security_task_kill.do_send_specific.__x64_sys_tgkill
      0.73            +0.0        0.77        perf-profile.calltrace.cycles-pp.__set_current_blocked.signal_setup_done.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
      0.78            +0.0        0.82        perf-profile.calltrace.cycles-pp.signal_setup_done.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.52            +0.0        0.57        perf-profile.calltrace.cycles-pp.recalc_sigpending.__set_current_blocked.signal_setup_done.arch_do_signal_or_restart.syscall_exit_to_user_mode
      0.75            +0.1        0.80        perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_int3.stress_sigtrap
      1.34            +0.1        1.39        perf-profile.calltrace.cycles-pp.complete_signal.__send_signal_locked.do_send_sig_info.do_send_specific.__x64_sys_tgkill
      3.06            +0.1        3.12        perf-profile.calltrace.cycles-pp.handle_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.34            +0.1        1.41        perf-profile.calltrace.cycles-pp.get_sigframe.x64_setup_rt_frame.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode
      2.30            +0.1        2.37        perf-profile.calltrace.cycles-pp.restore_fpregs_from_user.__fpu_restore_sig.fpu__restore_sig.restore_sigcontext.__x64_sys_rt_sigreturn
      1.38            +0.1        1.46        perf-profile.calltrace.cycles-pp.get_sigframe.x64_setup_rt_frame.handle_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode
      1.82            +0.1        1.90        perf-profile.calltrace.cycles-pp.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.stress_sigtrap
      1.58            +0.1        1.67        perf-profile.calltrace.cycles-pp.x64_setup_rt_frame.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3
      1.47            +0.1        1.56        perf-profile.calltrace.cycles-pp.copy_fpstate_to_sigframe.get_sigframe.x64_setup_rt_frame.handle_signal.arch_do_signal_or_restart
      1.64            +0.1        1.73        perf-profile.calltrace.cycles-pp.x64_setup_rt_frame.handle_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
      2.13            +0.1        2.23        perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.stress_sigtrap
      2.20            +0.1        2.31        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.stress_sigtrap
      2.19            +0.1        2.30        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.stress_sigtrap
      2.12            +0.1        2.24        perf-profile.calltrace.cycles-pp.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap
      2.51            +0.1        2.64        perf-profile.calltrace.cycles-pp.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.42            +0.1        3.54        perf-profile.calltrace.cycles-pp.__fpu_restore_sig.fpu__restore_sig.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64
      3.50            +0.1        3.63        perf-profile.calltrace.cycles-pp.fpu__restore_sig.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.59            +0.2        7.78        perf-profile.calltrace.cycles-pp.stress_sigtrap_handler
      0.15 ±152%      +0.4        0.53        perf-profile.calltrace.cycles-pp.__rseq_handle_notify_resume.handle_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
      0.00            +0.5        0.52        perf-profile.calltrace.cycles-pp.__rseq_handle_notify_resume.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3
      0.00            +0.5        0.52        perf-profile.calltrace.cycles-pp.__get_user_nocheck_8.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.5        0.54        perf-profile.calltrace.cycles-pp._copy_from_user.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.6        0.57 ± 26%  perf-profile.calltrace.cycles-pp.save_xstate_epilog.get_sigframe.x64_setup_rt_frame.handle_signal.arch_do_signal_or_restart
     30.20            -0.7       29.47        perf-profile.children.cycles-pp.get_signal
     37.21            -0.7       36.51        perf-profile.children.cycles-pp.asm_exc_int3
     24.78            -0.7       24.12        perf-profile.children.cycles-pp.do_dec_rlimit_put_ucounts
     38.90            -0.6       38.31        perf-profile.children.cycles-pp.arch_do_signal_or_restart
     28.53            -0.5       28.03        perf-profile.children.cycles-pp.__send_signal_locked
     15.08            -0.5       14.59        perf-profile.children.cycles-pp.force_sig
     14.96            -0.5       14.47        perf-profile.children.cycles-pp.force_sig_info_to_task
     15.50            -0.5       15.01        perf-profile.children.cycles-pp.exc_int3
     25.08            -0.5       24.62        perf-profile.children.cycles-pp.inc_rlimit_get_ucounts
     37.92            -0.4       37.48        perf-profile.children.cycles-pp.stress_sigtrap
     25.95            -0.4       25.54        perf-profile.children.cycles-pp.__sigqueue_alloc
     12.55            -0.4       12.19        perf-profile.children.cycles-pp.collect_signal
     20.04            -0.3       19.71        perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
     13.35            -0.3       13.04        perf-profile.children.cycles-pp.dequeue_signal
      1.79            -0.2        1.56        perf-profile.children.cycles-pp.fpregs_mark_activate
      2.02            -0.2        1.85        perf-profile.children.cycles-pp.fpu__clear_user_states
      2.09            -0.2        1.93        perf-profile.children.cycles-pp.complete_signal
      3.09            -0.1        3.00        perf-profile.children.cycles-pp.set_current_blocked
      0.82            -0.1        0.76        perf-profile.children.cycles-pp.get_task_cred
      0.05            +0.0        0.06        perf-profile.children.cycles-pp.generic_perform_write
      0.24            +0.0        0.26        perf-profile.children.cycles-pp.__put_user_8
      0.05            +0.0        0.07 ±  7%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.23 ±  2%      +0.0        0.25 ±  3%  perf-profile.children.cycles-pp.__get_user_8
      0.40            +0.0        0.42        perf-profile.children.cycles-pp.__put_user_nocheck_4
      0.07 ±  5%      +0.0        0.09 ±  4%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.06            +0.0        0.08        perf-profile.children.cycles-pp.record__pushfn
      0.34            +0.0        0.36        perf-profile.children.cycles-pp.rseq_update_cpu_node_id
      0.06 ±  7%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.perf_mmap__push
      0.08 ±  5%      +0.0        0.10 ±  3%  perf-profile.children.cycles-pp.main
      0.08 ±  5%      +0.0        0.10 ±  3%  perf-profile.children.cycles-pp.run_builtin
      0.36            +0.0        0.38        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.06            +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.writen
      0.29 ±  2%      +0.0        0.32 ±  2%  perf-profile.children.cycles-pp.rseq_get_rseq_cs
      0.07 ±  6%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.__cmd_record
      0.07 ±  6%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.cmd_record
      0.53            +0.0        0.56        perf-profile.children.cycles-pp.__get_user_nocheck_8
      0.55            +0.0        0.58        perf-profile.children.cycles-pp.__getpid
      0.58            +0.0        0.62        perf-profile.children.cycles-pp.restore_altstack
      0.53            +0.0        0.57        perf-profile.children.cycles-pp.__get_user_nocheck_4
      0.68            +0.0        0.72        perf-profile.children.cycles-pp.kmem_cache_free
      0.67            +0.0        0.71        perf-profile.children.cycles-pp.check_xstate_in_sigframe
      0.79            +0.0        0.83 ±  2%  perf-profile.children.cycles-pp.kmem_cache_alloc
      0.63            +0.0        0.67        perf-profile.children.cycles-pp.rseq_ip_fixup
      0.77            +0.0        0.82        perf-profile.children.cycles-pp.sync_regs
      0.23 ±  2%      +0.1        0.28        perf-profile.children.cycles-pp.prepare_signal
      1.00            +0.1        1.05        perf-profile.children.cycles-pp.save_xstate_epilog
      1.00            +0.1        1.07        perf-profile.children.cycles-pp.__rseq_handle_notify_resume
      2.32            +0.1        2.39        perf-profile.children.cycles-pp.restore_fpregs_from_user
      0.93            +0.1        1.02        perf-profile.children.cycles-pp._copy_from_user
      1.52            +0.1        1.61        perf-profile.children.cycles-pp.copy_fpstate_to_sigframe
      6.45            +0.1        6.54        perf-profile.children.cycles-pp.handle_signal
      3.45            +0.1        3.57        perf-profile.children.cycles-pp.__fpu_restore_sig
      3.51            +0.1        3.64        perf-profile.children.cycles-pp.fpu__restore_sig
      2.75            +0.2        2.91        perf-profile.children.cycles-pp.get_sigframe
      3.27            +0.2        3.45        perf-profile.children.cycles-pp.x64_setup_rt_frame
      8.65            +0.2        8.84        perf-profile.children.cycles-pp.__x64_sys_rt_sigreturn
      2.17            +0.2        2.36        perf-profile.children.cycles-pp.native_irq_return_iret
      4.35            +0.2        4.56        perf-profile.children.cycles-pp.restore_sigcontext
     58.30            +0.4       58.70        perf-profile.children.cycles-pp.do_syscall_64
     58.48            +0.4       58.89        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     24.78            -0.7       24.12        perf-profile.self.cycles-pp.do_dec_rlimit_put_ucounts
     25.07            -0.5       24.62        perf-profile.self.cycles-pp.inc_rlimit_get_ucounts
      1.72            -0.2        1.49        perf-profile.self.cycles-pp.fpregs_mark_activate
      1.97            -0.2        1.80        perf-profile.self.cycles-pp.complete_signal
      0.81            -0.1        0.75        perf-profile.self.cycles-pp.get_task_cred
      0.08 ±  5%      -0.0        0.06 ±  4%  perf-profile.self.cycles-pp.force_sig_info_to_task
      0.22            +0.0        0.23        perf-profile.self.cycles-pp.__send_signal_locked
      0.16 ±  3%      +0.0        0.18 ±  2%  perf-profile.self.cycles-pp.get_sigframe
      0.18 ±  2%      +0.0        0.20 ±  2%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.27 ±  2%      +0.0        0.29        perf-profile.self.cycles-pp.mod_objcg_state
      0.33            +0.0        0.35        perf-profile.self.cycles-pp.restore_sigcontext
      0.23            +0.0        0.25        perf-profile.self.cycles-pp.__put_user_8
      0.28            +0.0        0.30        perf-profile.self.cycles-pp.kmem_cache_alloc
      0.22 ±  2%      +0.0        0.24 ±  3%  perf-profile.self.cycles-pp.__get_user_8
      0.47            +0.0        0.49        perf-profile.self.cycles-pp.__fpu_restore_sig
      0.34            +0.0        0.36        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.33            +0.0        0.35        perf-profile.self.cycles-pp.rseq_update_cpu_node_id
      0.37            +0.0        0.39        perf-profile.self.cycles-pp.__put_user_nocheck_4
      0.36            +0.0        0.38        perf-profile.self.cycles-pp.save_xstate_epilog
      0.39            +0.0        0.41        perf-profile.self.cycles-pp.check_xstate_in_sigframe
      0.51            +0.0        0.54        perf-profile.self.cycles-pp.__get_user_nocheck_8
      0.52            +0.0        0.55        perf-profile.self.cycles-pp.x64_setup_rt_frame
      0.52            +0.0        0.55        perf-profile.self.cycles-pp.__get_user_nocheck_4
      0.76            +0.0        0.81        perf-profile.self.cycles-pp.sync_regs
      0.21            +0.1        0.26        perf-profile.self.cycles-pp.prepare_signal
      0.96            +0.1        1.01        perf-profile.self.cycles-pp.fpu__clear_user_states
      1.12            +0.1        1.19        perf-profile.self.cycles-pp.stress_sigtrap
      1.36            +0.1        1.44        perf-profile.self.cycles-pp.copy_fpstate_to_sigframe
      1.60            +0.1        1.68        perf-profile.self.cycles-pp.restore_fpregs_from_user
      0.91            +0.1        1.00        perf-profile.self.cycles-pp._copy_from_user
      2.17            +0.2        2.36        perf-profile.self.cycles-pp.native_irq_return_iret


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
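
Note on reading the %change column in the comparison table above: entries
with a trailing "%" appear to be relative changes against the base commit,
while entries without it (the rate and cycles-pp rows) appear to be plain
differences in percentage points. This reading is an assumption inferred from
the numbers in the table, not something the report states. A minimal Python
sketch that checks it against two rows follows; the helper names
percent_change and point_delta are hypothetical and are not part of lkp-tests.

```python
def percent_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference, used here for metrics that are already percentages."""
    return new - base


# stress-ng.sigtrap.ops_per_sec: 10124240 -> 10696368 (reported as +5.7%)
print(f"{percent_change(10124240, 10696368):+.1f}%")

# perf-stat.i.cache-miss-rate%: 32.57 -> 27.10 (reported as -5.5)
print(f"{point_delta(32.57, 27.10):+.1f}")
```

Both values round to the figures shown in the table, which is consistent with
the assumed interpretation.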