[linus:master] [crypto] 996f4dcbd2: stress-ng.sockfd.ops_per_sec 11.0% improvement

Hello,

kernel test robot noticed an 11.0% improvement of stress-ng.sockfd.ops_per_sec on:


commit: 996f4dcbd231ec022f38a3c27e7fc45727e4e875 ("crypto: x86/aes-xts - wire up AESNI + AVX implementation")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sockfd
	cpufreq_governor: performance
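
For orientation, the parameters above can be mapped onto a plain stress-ng command line. This is a rough sketch only: the helper name stress_ng_argv is hypothetical, the exact invocation used by the lkp-tests job is not shown in this report, and the sketch assumes that nr_threads: 100% means one worker per hardware thread (224 on this machine). The cpufreq_governor setting is not handled here.

```python
import shlex


def stress_ng_argv(test: str, nr_threads_pct: int, testtime: str,
                   online_cpus: int) -> list[str]:
    """Build a stress-ng argument vector from LKP-style job parameters.

    Assumption: nr_threads is a percentage of the online hardware threads.
    """
    workers = max(1, online_cpus * nr_threads_pct // 100)
    return [
        "stress-ng",
        f"--{test}", str(workers),   # e.g. --sockfd 224
        "--timeout", testtime,       # e.g. 60s
        "--metrics-brief",           # print ops and ops/sec at the end
    ]


# Parameters taken from the report; 224 is the thread count of the test machine.
argv = stress_ng_argv("sockfd", 100, "60s", online_cpus=224)
print(shlex.join(argv))
```

The sketch only builds and prints the command; it does not run the benchmark or reproduce the kernel config, which are in the linked materials.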






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240527/202405271558.f424aa27-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockfd/stress-ng/60s

commit: 
  d637168810 ("crypto: x86/aes-xts - add AES-XTS assembly macro for modern CPUs")
  996f4dcbd2 ("crypto: x86/aes-xts - wire up AESNI + AVX implementation")

d6371688101223a3 996f4dcbd231ec022f38a3c27e7 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    730290 ± 11%     +26.2%     921636 ± 13%  meminfo.Mapped
     24673 ±  2%      +8.1%      26682 ±  3%  perf-c2c.HITM.total
     61893            +3.6%      64151        vmstat.system.cs
      0.28 ±  8%     -11.6%       0.25 ±  9%  sched_debug.cfs_rq:/.h_nr_running.stddev
    196.71 ±  6%     -11.8%     173.53 ± 11%  sched_debug.cfs_rq:/.util_est.stddev
  46304617           +11.0%   51404735        stress-ng.sockfd.ops
    771591           +11.0%     856468        stress-ng.sockfd.ops_per_sec
   2336146            -3.1%    2263883        stress-ng.time.involuntary_context_switches
   1365039 ±  2%     +15.8%    1580362        stress-ng.time.voluntary_context_switches
    183309 ± 11%     +26.1%     231069 ± 13%  proc-vmstat.nr_mapped
   1843540            +2.4%    1888288        proc-vmstat.numa_hit
   1611479            +2.8%    1656095        proc-vmstat.numa_local
   2952001 ±  3%      +5.1%    3103307        proc-vmstat.pgalloc_normal
   2282989 ±  4%      +7.5%    2454018 ±  2%  proc-vmstat.pgfree
      0.42 ±  2%      +6.2%       0.44        perf-stat.i.MPKI
 1.487e+10            +1.8%  1.513e+10        perf-stat.i.branch-instructions
  25452853            +9.2%   27794083        perf-stat.i.cache-misses
  85628078            +8.2%   92619680        perf-stat.i.cache-references
     63603 ±  2%      +4.2%      66264        perf-stat.i.context-switches
     10.03            -1.7%       9.86        perf-stat.i.cpi
     26278            -9.1%      23887        perf-stat.i.cycles-between-cache-misses
  6.35e+10            +2.2%  6.488e+10        perf-stat.i.instructions
      0.10            +1.5%       0.10        perf-stat.i.ipc
      0.06 ± 46%    +140.8%       0.14 ± 40%  perf-stat.i.major-faults
      0.40            +7.8%       0.43        perf-stat.overall.MPKI
     10.18            -2.0%       9.97        perf-stat.overall.cpi
     25755            -9.1%      23420        perf-stat.overall.cycles-between-cache-misses
      0.10            +2.0%       0.10        perf-stat.overall.ipc
 1.423e+10            +1.3%  1.442e+10        perf-stat.ps.branch-instructions
  49502972        +8.1e+05%  3.998e+11 ±223%  perf-stat.ps.branch-misses
  23994964            +9.7%   26326983        perf-stat.ps.cache-misses
  82930036            +8.2%   89756764        perf-stat.ps.cache-references
     61357            +3.3%      63381        perf-stat.ps.context-switches
 6.072e+10            +1.8%   6.18e+10        perf-stat.ps.instructions
      0.04 ± 46%    +137.6%       0.11 ± 37%  perf-stat.ps.major-faults
 3.653e+12            +1.6%  3.712e+12        perf-stat.total.instructions
     48.81            -0.2       48.59        perf-profile.calltrace.cycles-pp.unix_inflight.unix_scm_to_skb.unix_stream_sendmsg.____sys_sendmsg.___sys_sendmsg
     48.45            -0.2       48.29        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.unix_notinflight.unix_stream_read_generic.unix_stream_recvmsg
     48.57            -0.2       48.42        perf-profile.calltrace.cycles-pp.unix_notinflight.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.____sys_recvmsg
     48.53            -0.2       48.38        perf-profile.calltrace.cycles-pp._raw_spin_lock.unix_notinflight.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
     49.02            -0.1       48.88        perf-profile.calltrace.cycles-pp.unix_stream_recvmsg.sock_recvmsg.____sys_recvmsg.___sys_recvmsg.__sys_recvmsg
     49.01            -0.1       48.87        perf-profile.calltrace.cycles-pp.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.____sys_recvmsg.___sys_recvmsg
     49.02            -0.1       48.88        perf-profile.calltrace.cycles-pp.sock_recvmsg.____sys_recvmsg.___sys_recvmsg.__sys_recvmsg.do_syscall_64
     49.04            -0.1       48.90        perf-profile.calltrace.cycles-pp.____sys_recvmsg.___sys_recvmsg.__sys_recvmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.11            -0.1       48.97        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvmsg.stress_sockfd
     49.14            -0.1       49.00        perf-profile.calltrace.cycles-pp.recvmsg.stress_sockfd
     49.08            -0.1       48.95        perf-profile.calltrace.cycles-pp.__sys_recvmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvmsg.stress_sockfd
     49.07            -0.1       48.94        perf-profile.calltrace.cycles-pp.___sys_recvmsg.__sys_recvmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvmsg
     49.11            -0.1       48.98        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.recvmsg.stress_sockfd
      0.56 ±  3%      +0.1        0.65 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
      0.56 ±  3%      +0.1        0.65 ±  3%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      0.58 ±  2%      +0.1        0.67 ±  3%  perf-profile.calltrace.cycles-pp.open64
      0.55 ±  3%      +0.1        0.64 ±  3%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      0.56 ±  3%      +0.1        0.65 ±  3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      0.17 ±141%      +0.4        0.57 ±  3%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
      0.17 ±141%      +0.4        0.58 ±  4%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
     97.19            -0.4       96.83        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     97.49            -0.4       97.13        perf-profile.children.cycles-pp._raw_spin_lock
     48.81            -0.2       48.59        perf-profile.children.cycles-pp.unix_inflight
     48.57            -0.2       48.42        perf-profile.children.cycles-pp.unix_notinflight
     49.02            -0.1       48.88        perf-profile.children.cycles-pp.unix_stream_read_generic
     49.02            -0.1       48.88        perf-profile.children.cycles-pp.unix_stream_recvmsg
     49.03            -0.1       48.89        perf-profile.children.cycles-pp.sock_recvmsg
     49.04            -0.1       48.90        perf-profile.children.cycles-pp.____sys_recvmsg
     49.09            -0.1       48.95        perf-profile.children.cycles-pp.__sys_recvmsg
     49.07            -0.1       48.94        perf-profile.children.cycles-pp.___sys_recvmsg
     49.15            -0.1       49.02        perf-profile.children.cycles-pp.recvmsg
      0.09            +0.0        0.10        perf-profile.children.cycles-pp.__memcg_slab_free_hook
      0.06            +0.0        0.07        perf-profile.children.cycles-pp.sock_alloc_send_pskb
      0.12 ±  3%      +0.0        0.14 ±  2%  perf-profile.children.cycles-pp.alloc_empty_file
      0.07 ±  8%      +0.0        0.09 ± 11%  perf-profile.children.cycles-pp.dput
      0.07 ± 11%      +0.0        0.09 ± 12%  perf-profile.children.cycles-pp.lockref_put_return
      0.12 ±  3%      +0.0        0.15 ±  7%  perf-profile.children.cycles-pp.__fput
      0.22 ±  2%      +0.0        0.24 ±  5%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.17 ±  2%      +0.0        0.20 ±  6%  perf-profile.children.cycles-pp.task_work_run
      0.18 ±  3%      +0.0        0.21 ±  6%  perf-profile.children.cycles-pp.syscall
      0.17 ±  6%      +0.0        0.21 ±  7%  perf-profile.children.cycles-pp.do_dentry_open
      0.02 ±141%      +0.0        0.06 ± 28%  perf-profile.children.cycles-pp.generic_perform_write
      0.09 ±  5%      +0.0        0.14 ± 37%  perf-profile.children.cycles-pp.cmd_record
      0.09 ±  5%      +0.0        0.14 ± 37%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.26 ±  5%      +0.0        0.31 ±  6%  perf-profile.children.cycles-pp.do_open
      0.08 ±  8%      +0.0        0.13 ± 34%  perf-profile.children.cycles-pp.perf_mmap__push
      0.09 ±  5%      +0.1        0.14 ± 35%  perf-profile.children.cycles-pp.main
      0.09 ±  5%      +0.1        0.14 ± 35%  perf-profile.children.cycles-pp.run_builtin
      0.50 ±  3%      +0.1        0.58 ±  3%  perf-profile.children.cycles-pp.do_filp_open
      0.49 ±  3%      +0.1        0.57 ±  4%  perf-profile.children.cycles-pp.path_openat
      0.56 ±  3%      +0.1        0.64 ±  3%  perf-profile.children.cycles-pp.do_sys_openat2
      0.56 ±  3%      +0.1        0.65 ±  3%  perf-profile.children.cycles-pp.__x64_sys_openat
      0.59 ±  2%      +0.1        0.69 ±  3%  perf-profile.children.cycles-pp.open64
     96.75            -0.3       96.40        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.07 ± 11%      +0.0        0.09 ± 12%  perf-profile.self.cycles-pp.lockref_put_return
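
As a reading aid for the table above: judging from the figures, the %change column is a relative change for most metrics, while the perf-profile rows appear to report an absolute difference in percentage points of cycles. A minimal sketch that checks both readings against rows of the table (the function names are illustrative and not part of lkp-tests):

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference, as used for perf-profile cycle shares."""
    return new - base


# stress-ng.sockfd.ops_per_sec: 771591 -> 856468
ops = pct_change(771591, 856468)
print(f"ops_per_sec: {ops:+.1f}%")            # +11.0%

# stress-ng.time.voluntary_context_switches: 1365039 -> 1580362
vcs = pct_change(1365039, 1580362)
print(f"voluntary ctx switches: {vcs:+.1f}%")  # +15.8%

# perf-profile.children.cycles-pp.unix_inflight: 48.81 -> 48.59
inflight = point_delta(48.81, 48.59)
print(f"unix_inflight: {inflight:+.1f}")       # -0.2
```
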




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




