[linus:master] [crypto] e787060bdf: stress-ng.sigtrap.ops_per_sec 5.7% improvement

Hello,

kernel test robot noticed a 5.7% improvement of stress-ng.sigtrap.ops_per_sec on:


commit: e787060bdfa35f8b40ef4d277a345ee35b41039f ("crypto: x86/aes-xts - wire up VAES + AVX2 implementation")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sigtrap
	cpufreq_governor: performance
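
For readers who want to approximate this run by hand, the parameters above can be mapped onto a stress-ng command line. The following is a minimal sketch in Python, assuming that "nr_threads: 100%" means one sigtrap worker per online CPU (64 on this test machine). The helper name `sigtrap_argv` and the mapping itself are illustrative and are not taken from the lkp-tests job file. The sketch only builds the argument list; it does not run anything, and the cpufreq governor has to be set separately.

```python
import os
import shlex


def sigtrap_argv(nr_threads_pct=100, testtime="60s", online_cpus=None):
    """Build a stress-ng argv for the sigtrap stressor (hypothetical helper).

    Assumption: nr_threads is a percentage of online CPUs, so 100% on a
    64-thread machine gives 64 workers.
    """
    cpus = online_cpus if online_cpus is not None else (os.cpu_count() or 1)
    workers = max(1, cpus * nr_threads_pct // 100)
    return [
        "stress-ng",
        "--sigtrap", str(workers),   # number of SIGTRAP stressor instances
        "--timeout", testtime,       # run time, matches "testtime: 60s"
        "--metrics-brief",           # print bogo-ops summary at the end
    ]


if __name__ == "__main__":
    # Show the command for the 64-thread machine described in this report.
    print(shlex.join(sigtrap_argv(online_cpus=64)))
```

The actual job in the reproduce materials linked below may pass additional options; treat this only as a starting point.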






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240531/202405311430.e1f484a4-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/sigtrap/stress-ng/60s

commit: 
  996f4dcbd2 ("crypto: x86/aes-xts - wire up AESNI + AVX implementation")
  e787060bdf ("crypto: x86/aes-xts - wire up VAES + AVX2 implementation")

996f4dcbd231ec02 e787060bdfa35f8b40ef4d277a3 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     11005 ±  5%     -18.0%       9022 ±  5%  perf-c2c.DRAM.remote
      4834 ±  7%     -23.8%       3684 ±  5%  perf-c2c.HITM.remote
      6143 ± 48%    +101.7%      12390 ± 27%  proc-vmstat.numa_hint_faults
      4427 ± 35%     +56.7%       6939 ±  6%  proc-vmstat.numa_hint_faults_local
    301865            +2.3%     308839        proc-vmstat.pgfault
      5597            -6.4%       5240        stress-ng.sigtrap.nanosecs_to_handle_SIGTRAP
 6.075e+08            +5.7%  6.418e+08        stress-ng.sigtrap.ops
  10124240            +5.7%   10696368        stress-ng.sigtrap.ops_per_sec
    177.72            +6.9%     190.03        stress-ng.time.user_time
      0.53           -17.3%       0.43        perf-stat.i.MPKI
 7.911e+09            +5.1%  8.314e+09        perf-stat.i.branch-instructions
     32.57            -5.5       27.10        perf-stat.i.cache-miss-rate%
  22467086           -13.2%   19505617        perf-stat.i.cache-misses
  69342308            +4.1%   72189549        perf-stat.i.cache-references
      5.26            -5.0%       5.00        perf-stat.i.cpi
     10083           +15.5%      11642        perf-stat.i.cycles-between-cache-misses
 4.275e+10            +5.1%  4.495e+10        perf-stat.i.instructions
      0.20            +5.1%       0.21        perf-stat.i.ipc
      3976            +3.7%       4122        perf-stat.i.minor-faults
      3976            +3.7%       4122        perf-stat.i.page-faults
      0.53           -17.5%       0.43        perf-stat.overall.MPKI
      0.78 ±  3%      -0.0        0.73        perf-stat.overall.branch-miss-rate%
     32.29            -5.3       26.95        perf-stat.overall.cache-miss-rate%
      5.29            -4.9%       5.03        perf-stat.overall.cpi
     10068           +15.2%      11596        perf-stat.overall.cycles-between-cache-misses
      0.19            +5.2%       0.20        perf-stat.overall.ipc
 7.772e+09            +5.1%   8.17e+09        perf-stat.ps.branch-instructions
  22071930           -13.2%   19162218        perf-stat.ps.cache-misses
  68355866            +4.0%   71101498        perf-stat.ps.cache-references
 4.201e+10            +5.2%  4.418e+10        perf-stat.ps.instructions
      3892            +3.7%       4035        perf-stat.ps.minor-faults
      3892            +3.7%       4035        perf-stat.ps.page-faults
 2.571e+12            +5.0%  2.698e+12        perf-stat.total.instructions
     34.18            -0.6       33.55        perf-profile.calltrace.cycles-pp.asm_exc_int3.stress_sigtrap
     15.07            -0.5       14.58        perf-profile.calltrace.cycles-pp.force_sig.exc_int3.asm_exc_int3.stress_sigtrap
     15.47            -0.5       14.99        perf-profile.calltrace.cycles-pp.exc_int3.asm_exc_int3.stress_sigtrap
     14.94            -0.5       14.46        perf-profile.calltrace.cycles-pp.force_sig_info_to_task.force_sig.exc_int3.asm_exc_int3.stress_sigtrap
     37.71            -0.5       37.26        perf-profile.calltrace.cycles-pp.stress_sigtrap
     14.04            -0.4       13.65        perf-profile.calltrace.cycles-pp.__send_signal_locked.force_sig_info_to_task.force_sig.exc_int3.asm_exc_int3
     14.94            -0.4       14.58        perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
     12.48            -0.4       12.11        perf-profile.calltrace.cycles-pp.do_dec_rlimit_put_ucounts.collect_signal.dequeue_signal.get_signal.arch_do_signal_or_restart
     15.19            -0.4       14.83        perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap
     12.54            -0.4       12.18        perf-profile.calltrace.cycles-pp.collect_signal.dequeue_signal.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode
     13.34            -0.3       13.03        perf-profile.calltrace.cycles-pp.dequeue_signal.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
     12.30            -0.3       12.00        perf-profile.calltrace.cycles-pp.do_dec_rlimit_put_ucounts.get_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3
      0.73            -0.3        0.47 ± 33%  perf-profile.calltrace.cycles-pp.complete_signal.__send_signal_locked.force_sig_info_to_task.force_sig.exc_int3
     12.43            -0.2       12.18        perf-profile.calltrace.cycles-pp.inc_rlimit_get_ucounts.__sigqueue_alloc.__send_signal_locked.do_send_sig_info.do_send_specific
     17.48            -0.2       17.25        perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap
     17.44            -0.2       17.21        perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap
     12.85            -0.2       12.63        perf-profile.calltrace.cycles-pp.__sigqueue_alloc.__send_signal_locked.do_send_sig_info.do_send_specific.__x64_sys_tgkill
     12.64            -0.2       12.44        perf-profile.calltrace.cycles-pp.inc_rlimit_get_ucounts.__sigqueue_alloc.__send_signal_locked.force_sig_info_to_task.force_sig
     13.07            -0.2       12.88        perf-profile.calltrace.cycles-pp.__sigqueue_alloc.__send_signal_locked.force_sig_info_to_task.force_sig.exc_int3
      0.73 ±  2%      -0.1        0.61 ±  5%  perf-profile.calltrace.cycles-pp.fpregs_mark_activate.fpu__clear_user_states.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode
      2.50            -0.1        2.40        perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap_handler
      2.55            -0.1        2.45        perf-profile.calltrace.cycles-pp.asm_exc_int3.stress_sigtrap_handler
      2.54            -0.1        2.44        perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap_handler
      3.07            -0.1        2.97        perf-profile.calltrace.cycles-pp.set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.37 ±  2%      -0.1        1.28        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.16            -0.1        1.08 ±  3%  perf-profile.calltrace.cycles-pp.fpu__clear_user_states.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3
      1.24            -0.1        1.16 ±  2%  perf-profile.calltrace.cycles-pp.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap_handler
      0.79            -0.1        0.71        perf-profile.calltrace.cycles-pp.fpu__clear_user_states.handle_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
      0.64            -0.1        0.57        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.force_sig_info_to_task.force_sig.exc_int3.asm_exc_int3
      0.82            -0.1        0.75        perf-profile.calltrace.cycles-pp.get_task_cred.apparmor_task_kill.security_task_kill.do_send_specific.__x64_sys_tgkill
      0.73            +0.0        0.77        perf-profile.calltrace.cycles-pp.__set_current_blocked.signal_setup_done.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
      0.78            +0.0        0.82        perf-profile.calltrace.cycles-pp.signal_setup_done.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.52            +0.0        0.57        perf-profile.calltrace.cycles-pp.recalc_sigpending.__set_current_blocked.signal_setup_done.arch_do_signal_or_restart.syscall_exit_to_user_mode
      0.75            +0.1        0.80        perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_int3.stress_sigtrap
      1.34            +0.1        1.39        perf-profile.calltrace.cycles-pp.complete_signal.__send_signal_locked.do_send_sig_info.do_send_specific.__x64_sys_tgkill
      3.06            +0.1        3.12        perf-profile.calltrace.cycles-pp.handle_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.34            +0.1        1.41        perf-profile.calltrace.cycles-pp.get_sigframe.x64_setup_rt_frame.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode
      2.30            +0.1        2.37        perf-profile.calltrace.cycles-pp.restore_fpregs_from_user.__fpu_restore_sig.fpu__restore_sig.restore_sigcontext.__x64_sys_rt_sigreturn
      1.38            +0.1        1.46        perf-profile.calltrace.cycles-pp.get_sigframe.x64_setup_rt_frame.handle_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode
      1.82            +0.1        1.90        perf-profile.calltrace.cycles-pp.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.stress_sigtrap
      1.58            +0.1        1.67        perf-profile.calltrace.cycles-pp.x64_setup_rt_frame.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3
      1.47            +0.1        1.56        perf-profile.calltrace.cycles-pp.copy_fpstate_to_sigframe.get_sigframe.x64_setup_rt_frame.handle_signal.arch_do_signal_or_restart
      1.64            +0.1        1.73        perf-profile.calltrace.cycles-pp.x64_setup_rt_frame.handle_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
      2.13            +0.1        2.23        perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.stress_sigtrap
      2.20            +0.1        2.31        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.stress_sigtrap
      2.19            +0.1        2.30        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.stress_sigtrap
      2.12            +0.1        2.24        perf-profile.calltrace.cycles-pp.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3.stress_sigtrap
      2.51            +0.1        2.64        perf-profile.calltrace.cycles-pp.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.42            +0.1        3.54        perf-profile.calltrace.cycles-pp.__fpu_restore_sig.fpu__restore_sig.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64
      3.50            +0.1        3.63        perf-profile.calltrace.cycles-pp.fpu__restore_sig.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.59            +0.2        7.78        perf-profile.calltrace.cycles-pp.stress_sigtrap_handler
      0.15 ±152%      +0.4        0.53        perf-profile.calltrace.cycles-pp.__rseq_handle_notify_resume.handle_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
      0.00            +0.5        0.52        perf-profile.calltrace.cycles-pp.__rseq_handle_notify_resume.handle_signal.arch_do_signal_or_restart.irqentry_exit_to_user_mode.asm_exc_int3
      0.00            +0.5        0.52        perf-profile.calltrace.cycles-pp.__get_user_nocheck_8.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.5        0.54        perf-profile.calltrace.cycles-pp._copy_from_user.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.6        0.57 ± 26%  perf-profile.calltrace.cycles-pp.save_xstate_epilog.get_sigframe.x64_setup_rt_frame.handle_signal.arch_do_signal_or_restart
     30.20            -0.7       29.47        perf-profile.children.cycles-pp.get_signal
     37.21            -0.7       36.51        perf-profile.children.cycles-pp.asm_exc_int3
     24.78            -0.7       24.12        perf-profile.children.cycles-pp.do_dec_rlimit_put_ucounts
     38.90            -0.6       38.31        perf-profile.children.cycles-pp.arch_do_signal_or_restart
     28.53            -0.5       28.03        perf-profile.children.cycles-pp.__send_signal_locked
     15.08            -0.5       14.59        perf-profile.children.cycles-pp.force_sig
     14.96            -0.5       14.47        perf-profile.children.cycles-pp.force_sig_info_to_task
     15.50            -0.5       15.01        perf-profile.children.cycles-pp.exc_int3
     25.08            -0.5       24.62        perf-profile.children.cycles-pp.inc_rlimit_get_ucounts
     37.92            -0.4       37.48        perf-profile.children.cycles-pp.stress_sigtrap
     25.95            -0.4       25.54        perf-profile.children.cycles-pp.__sigqueue_alloc
     12.55            -0.4       12.19        perf-profile.children.cycles-pp.collect_signal
     20.04            -0.3       19.71        perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
     13.35            -0.3       13.04        perf-profile.children.cycles-pp.dequeue_signal
      1.79            -0.2        1.56        perf-profile.children.cycles-pp.fpregs_mark_activate
      2.02            -0.2        1.85        perf-profile.children.cycles-pp.fpu__clear_user_states
      2.09            -0.2        1.93        perf-profile.children.cycles-pp.complete_signal
      3.09            -0.1        3.00        perf-profile.children.cycles-pp.set_current_blocked
      0.82            -0.1        0.76        perf-profile.children.cycles-pp.get_task_cred
      0.05            +0.0        0.06        perf-profile.children.cycles-pp.generic_perform_write
      0.24            +0.0        0.26        perf-profile.children.cycles-pp.__put_user_8
      0.05            +0.0        0.07 ±  7%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.23 ±  2%      +0.0        0.25 ±  3%  perf-profile.children.cycles-pp.__get_user_8
      0.40            +0.0        0.42        perf-profile.children.cycles-pp.__put_user_nocheck_4
      0.07 ±  5%      +0.0        0.09 ±  4%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.06            +0.0        0.08        perf-profile.children.cycles-pp.record__pushfn
      0.34            +0.0        0.36        perf-profile.children.cycles-pp.rseq_update_cpu_node_id
      0.06 ±  7%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.perf_mmap__push
      0.08 ±  5%      +0.0        0.10 ±  3%  perf-profile.children.cycles-pp.main
      0.08 ±  5%      +0.0        0.10 ±  3%  perf-profile.children.cycles-pp.run_builtin
      0.36            +0.0        0.38        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.06            +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.writen
      0.29 ±  2%      +0.0        0.32 ±  2%  perf-profile.children.cycles-pp.rseq_get_rseq_cs
      0.07 ±  6%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.__cmd_record
      0.07 ±  6%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.cmd_record
      0.53            +0.0        0.56        perf-profile.children.cycles-pp.__get_user_nocheck_8
      0.55            +0.0        0.58        perf-profile.children.cycles-pp.__getpid
      0.58            +0.0        0.62        perf-profile.children.cycles-pp.restore_altstack
      0.53            +0.0        0.57        perf-profile.children.cycles-pp.__get_user_nocheck_4
      0.68            +0.0        0.72        perf-profile.children.cycles-pp.kmem_cache_free
      0.67            +0.0        0.71        perf-profile.children.cycles-pp.check_xstate_in_sigframe
      0.79            +0.0        0.83 ±  2%  perf-profile.children.cycles-pp.kmem_cache_alloc
      0.63            +0.0        0.67        perf-profile.children.cycles-pp.rseq_ip_fixup
      0.77            +0.0        0.82        perf-profile.children.cycles-pp.sync_regs
      0.23 ±  2%      +0.1        0.28        perf-profile.children.cycles-pp.prepare_signal
      1.00            +0.1        1.05        perf-profile.children.cycles-pp.save_xstate_epilog
      1.00            +0.1        1.07        perf-profile.children.cycles-pp.__rseq_handle_notify_resume
      2.32            +0.1        2.39        perf-profile.children.cycles-pp.restore_fpregs_from_user
      0.93            +0.1        1.02        perf-profile.children.cycles-pp._copy_from_user
      1.52            +0.1        1.61        perf-profile.children.cycles-pp.copy_fpstate_to_sigframe
      6.45            +0.1        6.54        perf-profile.children.cycles-pp.handle_signal
      3.45            +0.1        3.57        perf-profile.children.cycles-pp.__fpu_restore_sig
      3.51            +0.1        3.64        perf-profile.children.cycles-pp.fpu__restore_sig
      2.75            +0.2        2.91        perf-profile.children.cycles-pp.get_sigframe
      3.27            +0.2        3.45        perf-profile.children.cycles-pp.x64_setup_rt_frame
      8.65            +0.2        8.84        perf-profile.children.cycles-pp.__x64_sys_rt_sigreturn
      2.17            +0.2        2.36        perf-profile.children.cycles-pp.native_irq_return_iret
      4.35            +0.2        4.56        perf-profile.children.cycles-pp.restore_sigcontext
     58.30            +0.4       58.70        perf-profile.children.cycles-pp.do_syscall_64
     58.48            +0.4       58.89        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     24.78            -0.7       24.12        perf-profile.self.cycles-pp.do_dec_rlimit_put_ucounts
     25.07            -0.5       24.62        perf-profile.self.cycles-pp.inc_rlimit_get_ucounts
      1.72            -0.2        1.49        perf-profile.self.cycles-pp.fpregs_mark_activate
      1.97            -0.2        1.80        perf-profile.self.cycles-pp.complete_signal
      0.81            -0.1        0.75        perf-profile.self.cycles-pp.get_task_cred
      0.08 ±  5%      -0.0        0.06 ±  4%  perf-profile.self.cycles-pp.force_sig_info_to_task
      0.22            +0.0        0.23        perf-profile.self.cycles-pp.__send_signal_locked
      0.16 ±  3%      +0.0        0.18 ±  2%  perf-profile.self.cycles-pp.get_sigframe
      0.18 ±  2%      +0.0        0.20 ±  2%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.27 ±  2%      +0.0        0.29        perf-profile.self.cycles-pp.mod_objcg_state
      0.33            +0.0        0.35        perf-profile.self.cycles-pp.restore_sigcontext
      0.23            +0.0        0.25        perf-profile.self.cycles-pp.__put_user_8
      0.28            +0.0        0.30        perf-profile.self.cycles-pp.kmem_cache_alloc
      0.22 ±  2%      +0.0        0.24 ±  3%  perf-profile.self.cycles-pp.__get_user_8
      0.47            +0.0        0.49        perf-profile.self.cycles-pp.__fpu_restore_sig
      0.34            +0.0        0.36        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.33            +0.0        0.35        perf-profile.self.cycles-pp.rseq_update_cpu_node_id
      0.37            +0.0        0.39        perf-profile.self.cycles-pp.__put_user_nocheck_4
      0.36            +0.0        0.38        perf-profile.self.cycles-pp.save_xstate_epilog
      0.39            +0.0        0.41        perf-profile.self.cycles-pp.check_xstate_in_sigframe
      0.51            +0.0        0.54        perf-profile.self.cycles-pp.__get_user_nocheck_8
      0.52            +0.0        0.55        perf-profile.self.cycles-pp.x64_setup_rt_frame
      0.52            +0.0        0.55        perf-profile.self.cycles-pp.__get_user_nocheck_4
      0.76            +0.0        0.81        perf-profile.self.cycles-pp.sync_regs
      0.21            +0.1        0.26        perf-profile.self.cycles-pp.prepare_signal
      0.96            +0.1        1.01        perf-profile.self.cycles-pp.fpu__clear_user_states
      1.12            +0.1        1.19        perf-profile.self.cycles-pp.stress_sigtrap
      1.36            +0.1        1.44        perf-profile.self.cycles-pp.copy_fpstate_to_sigframe
      1.60            +0.1        1.68        perf-profile.self.cycles-pp.restore_fpregs_from_user
      0.91            +0.1        1.00        perf-profile.self.cycles-pp._copy_from_user
      2.17            +0.2        2.36        perf-profile.self.cycles-pp.native_irq_return_iret
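
As a reading aid for the table above: rows whose %change column carries a "%" sign appear to be relative changes against the base commit, while rows without it (the rate and perf-profile rows) appear to be absolute differences in percentage points. That interpretation is an assumption inferred from the numbers shown, not something the report states. A minimal Python sketch that recomputes three rows under that assumption (the function names are illustrative):

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference, used for values that are already percentages."""
    return new - base


if __name__ == "__main__":
    # stress-ng.sigtrap.ops_per_sec: 10124240 -> 10696368
    print(f"ops_per_sec: {pct_change(10124240, 10696368):+.1f}%")
    # stress-ng.sigtrap.nanosecs_to_handle_SIGTRAP: 5597 -> 5240
    print(f"ns per SIGTRAP: {pct_change(5597, 5240):+.1f}%")
    # perf-stat.i.cache-miss-rate%: 32.57 -> 27.10 (percentage points)
    print(f"cache-miss-rate: {point_delta(32.57, 27.10):+.1f} points")
```

Under this reading the recomputed values agree with the +5.7%, -6.4% and -5.5 entries in the table.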




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




