Hello,

kernel test robot noticed an 80.5% improvement of stress-ng.getrandom.ops_per_sec on:


commit: 470a8ed1624a45a74176a786e28fac3234c71424 ("[PATCH] random: add chacha8_block and swtich the rng to it")
url: https://github.com/intel-lab-lkp/linux/commits/Aaron-Toponce/random-add-chacha8_block-and-swtich-the-rng-to-it/20240430-130757
base: https://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git master
patch link: https://lore.kernel.org/all/20240429134942.2873253-1-aaron.toponce@xxxxxxxxx/
patch subject: [PATCH] random: add chacha8_block and swtich the rng to it

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: getrandom
	cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240508/202405081501.e1c083b0-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/getrandom/stress-ng/60s

commit:
  ed265f7fd9 ("crypto: x86/aes-gcm - simplify GCM hash subkey derivation")
  470a8ed162 ("random: add chacha8_block and swtich the rng to it")

ed265f7fd9a635d7 470a8ed1624a45a74176a786e28
---------------- ---------------------------
          %stddev  %change            %stddev
             \        |                  \
 1.793e+09          +80.7%   3.239e+09         stress-ng.getrandom.getrandom_bits_per_sec
 1.054e+08          +80.5%   1.901e+08         stress-ng.getrandom.ops
   1755950          +80.5%     3168792         stress-ng.getrandom.ops_per_sec
     13.18          +74.9%       23.05         stress-ng.time.user_time
 1.088e+10          +52.5%    1.66e+10         perf-stat.i.branch-instructions
      0.29 ± 8%       -0.1        0.20 ± 7%    perf-stat.i.branch-miss-rate%
      0.57           +7.2%        0.61         perf-stat.i.cpi
 3.411e+11           -6.7%   3.182e+11         perf-stat.i.instructions
      1.75           -6.7%        1.63         perf-stat.i.ipc
      0.29 ± 8%       -0.1        0.20 ± 7%    perf-stat.overall.branch-miss-rate%
      0.57           +7.2%        0.61         perf-stat.overall.cpi
      1.75           -6.7%        1.64         perf-stat.overall.ipc
  1.07e+10          +52.6%   1.633e+10         perf-stat.ps.branch-instructions
 3.355e+11           -6.7%    3.13e+11         perf-stat.ps.instructions
 2.049e+13           -6.3%   1.919e+13         perf-stat.total.instructions
     74.33           -18.9       55.41         perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64
     83.70           -10.8       72.88         perf-profile.calltrace.cycles-pp.chacha_block_generic.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe
     97.41            -1.3       96.15         perf-profile.calltrace.cycles-pp.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
     98.10            -0.7       97.41         perf-profile.calltrace.cycles-pp.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
     98.19            -0.6       97.55         perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
     98.23            -0.6       97.61         perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getrandom
     98.43            -0.4       97.99         perf-profile.calltrace.cycles-pp.getrandom
      1.30            -0.2        1.14         perf-profile.calltrace.cycles-pp.chacha_block_generic.crng_fast_key_erasure.crng_make_state.get_random_bytes_user.__x64_sys_getrandom
      1.56            +0.0        1.58         perf-profile.calltrace.cycles-pp.crng_fast_key_erasure.crng_make_state.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64
      1.62            +0.1        1.69         perf-profile.calltrace.cycles-pp.crng_make_state.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.05            +0.2        1.26         perf-profile.calltrace.cycles-pp.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe.getentropy
      1.07            +0.2        1.30         perf-profile.calltrace.cycles-pp.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe.getentropy
      1.13            +0.3        1.40         perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getentropy
      1.16            +0.3        1.45         perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getentropy
      1.31            +0.4        1.73         perf-profile.calltrace.cycles-pp.getentropy
     11.88            +9.5       21.40         perf-profile.calltrace.cycles-pp._copy_to_iter.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe
     75.73           -19.0       56.70         perf-profile.children.cycles-pp.chacha_permute
     85.45           -11.4       74.03         perf-profile.children.cycles-pp.chacha_block_generic
     99.14            -0.5       98.63         perf-profile.children.cycles-pp.get_random_bytes_user
     99.20            -0.5       98.73         perf-profile.children.cycles-pp.__x64_sys_getrandom
     98.52            -0.4       98.13         perf-profile.children.cycles-pp.getrandom
     99.45            -0.4       99.07         perf-profile.children.cycles-pp.do_syscall_64
     99.49            -0.3       99.14         perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.44 ± 4%       -0.0        0.40 ± 6%    perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.46 ± 4%       -0.0        0.42 ± 6%    perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.42 ± 4%       -0.0        0.38 ± 6%    perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.42 ± 4%       -0.0        0.38 ± 6%    perf-profile.children.cycles-pp.hrtimer_interrupt
      0.24 ± 7%       -0.0        0.20 ± 5%    perf-profile.children.cycles-pp.tick_nohz_handler
      0.25 ± 7%       -0.0        0.21 ± 6%    perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.12 ± 4%       -0.0        0.11 ± 7%    perf-profile.children.cycles-pp.scheduler_tick
      1.56            +0.0        1.60         perf-profile.children.cycles-pp.crng_fast_key_erasure
      0.03 ± 70%      +0.0        0.08         perf-profile.children.cycles-pp.stress_getrandom
      0.09            +0.1        0.14         perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.00            +0.1        0.07         perf-profile.children.cycles-pp.__memcpy
      1.62            +0.1        1.70         perf-profile.children.cycles-pp.crng_make_state
      0.00            +0.1        0.08 ± 6%    perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.13 ± 3%       +0.1        0.25         perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.17            +0.1        0.31 ± 2%    perf-profile.children.cycles-pp.entry_SYSCALL_64
      1.37            +0.5        1.83         perf-profile.children.cycles-pp.getentropy
     12.20            +9.8       21.97         perf-profile.children.cycles-pp._copy_to_iter
     75.17           -19.1       56.06         perf-profile.self.cycles-pp.chacha_permute
      0.05            +0.0        0.08 ± 5%    perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.05 ± 7%       +0.0        0.09         perf-profile.self.cycles-pp.crng_make_state
      0.07            +0.1        0.13 ± 3%    perf-profile.self.cycles-pp.do_syscall_64
      0.06 ± 6%       +0.1        0.12         perf-profile.self.cycles-pp.getentropy
      0.00            +0.1        0.06 ± 6%    perf-profile.self.cycles-pp.__memcpy
      0.00            +0.1        0.06 ± 7%    perf-profile.self.cycles-pp.stress_getrandom
      0.00            +0.1        0.06 ± 7%    perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.00            +0.1        0.07 ± 7%    perf-profile.self.cycles-pp.__x64_sys_getrandom
      0.09 ± 5%       +0.1        0.17 ± 2%    perf-profile.self.cycles-pp.getrandom
      0.00            +0.1        0.08 ± 6%    perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.00            +0.1        0.08 ± 6%    perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.13            +0.1        0.24         perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.23 ± 2%       +0.2        0.39         perf-profile.self.cycles-pp.crng_fast_key_erasure
      1.81            +1.4        3.23         perf-profile.self.cycles-pp.get_random_bytes_user
      9.46            +7.4       16.86         perf-profile.self.cycles-pp.chacha_block_generic
     11.93            +9.6       21.49         perf-profile.self.cycles-pp._copy_to_iter


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki