Hello,

kernel test robot noticed a -7.9% regression of stress-ng.mutex.ops_per_sec on:


commit: 6613d82e617dd7eb8b0c40b2fe3acea655b1d611 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: mutex
	cpufreq_governor: performance


In addition to that, the commit also has impact on the following tests:

+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.ptrace.ops_per_sec -3.9% regression                                  |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | nr_threads=100%                                                                           |
|                  | test=ptrace                                                                               |
|                  | testtime=60s                                                                              |
+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.getdent.ops_per_sec 5.8% improvement                                 |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | disk=1HDD                                                                                 |
|                  | fs=btrfs                                                                                  |
|                  | nr_threads=100%                                                                           |
|                  | test=getdent                                                                              |
|                  | testtime=60s                                                                              |
+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops 4.0% improvement                              |
| test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                          |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | mode=thread                                                                               |
|                  | nr_task=100%                                                                              |
|                  | test=futex4                                                                               |
+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -2.1% regression                             |
| test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                          |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | mode=process                                                                              |
|                  | nr_task=100%                                                                              |
|                  | test=futex2                                                                               |
+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops 3.7% improvement                             |
| test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                          |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | mode=process                                                                              |
|                  | nr_task=100%                                                                              |
|                  | test=futex3                                                                               |
+------------------+-------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202403041300.a7fb1462-yujie.liu@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240304/202403041300.a7fb1462-yujie.liu@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/mutex/stress-ng/60s

commit:
  a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
  6613d82e61 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     11556 ± 15%     -24.2%       8755 ± 10%  numa-meminfo.node0.Active
     11529 ± 15%     -24.5%       8702 ± 10%  numa-meminfo.node0.Active(anon)
    417861            -8.0%     384591        vmstat.system.cs
    287897            -5.2%     273070        vmstat.system.in
    182670            +9.0%     199032        stress-ng.mutex.nanosecs_per_mutex
18139421 -7.9% 16702171 stress-ng.mutex.ops 302318 -7.9% 278364 stress-ng.mutex.ops_per_sec 12040142 -7.3% 11161921 stress-ng.time.involuntary_context_switches 9424624 -7.6% 8707796 stress-ng.time.voluntary_context_switches 1.36 -5.7% 1.28 perf-stat.i.MPKI 0.31 -0.0 0.30 perf-stat.i.branch-miss-rate% 11445088 -4.4% 10944702 perf-stat.i.branch-misses 21081580 -6.7% 19679133 perf-stat.i.cache-misses 57754062 -6.7% 53909365 perf-stat.i.cache-references 429726 -7.6% 397018 perf-stat.i.context-switches 120047 -7.3% 111272 perf-stat.i.cpu-migrations 9063 +7.3% 9727 perf-stat.i.cycles-between-cache-misses 8.62 -7.5% 7.97 perf-stat.i.metric.K/sec 1.35 -5.9% 1.27 perf-stat.overall.MPKI 0.31 -0.0 0.30 perf-stat.overall.branch-miss-rate% 8893 +7.0% 9514 perf-stat.overall.cycles-between-cache-misses 11240262 -4.4% 10751121 perf-stat.ps.branch-misses 20680093 -6.7% 19302166 perf-stat.ps.cache-misses 56715466 -6.7% 52937829 perf-stat.ps.cache-references 422630 -7.6% 390583 perf-stat.ps.context-switches 118070 -7.3% 109477 perf-stat.ps.cpu-migrations 10.01 -0.5 9.54 perf-profile.calltrace.cycles-pp.find_lock_lowest_rq.push_rt_task.push_rt_tasks.finish_task_switch.__schedule 20.36 -0.3 20.04 perf-profile.calltrace.cycles-pp.push_rt_task.push_rt_tasks.finish_task_switch.__schedule.schedule 21.10 -0.3 20.84 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe 21.08 -0.3 20.83 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe 21.86 -0.3 21.60 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe 21.85 -0.2 21.60 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe 17.30 -0.2 17.07 perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64 17.32 -0.2 17.09 perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe 17.15 -0.2 16.93 
perf-profile.calltrace.cycles-pp.futex_wait_queue.__futex_wait.futex_wait.do_futex.__x64_sys_futex 17.11 -0.2 16.88 perf-profile.calltrace.cycles-pp.schedule.futex_wait_queue.__futex_wait.futex_wait.do_futex 17.10 -0.2 16.88 perf-profile.calltrace.cycles-pp.__schedule.schedule.futex_wait_queue.__futex_wait.futex_wait 4.16 -0.2 3.98 perf-profile.calltrace.cycles-pp.__sched_yield 3.72 -0.2 3.54 perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe 3.73 -0.2 3.55 perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield 4.10 -0.2 3.92 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield 4.09 -0.2 3.91 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield 3.99 -0.2 3.81 perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield 15.61 -0.1 15.46 perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.futex_wait_queue.__futex_wait 2.64 -0.1 2.50 perf-profile.calltrace.cycles-pp.push_rt_tasks.finish_task_switch.__schedule.schedule.__x64_sys_sched_yield 14.86 -0.1 14.72 perf-profile.calltrace.cycles-pp.push_rt_tasks.finish_task_switch.__schedule.schedule.futex_wait_queue 2.95 -0.1 2.81 perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64 0.92 ± 3% -0.1 0.82 ± 2% perf-profile.calltrace.cycles-pp.cpupri_set.enqueue_task_rt.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler 0.72 ± 4% -0.1 0.65 ± 2% perf-profile.calltrace.cycles-pp.cpupri_set.dequeue_rt_stack.dequeue_task_rt.__sched_setscheduler._sched_setscheduler 0.74 ± 4% -0.1 0.66 ± 2% perf-profile.calltrace.cycles-pp.dequeue_rt_stack.dequeue_task_rt.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler 1.07 -0.0 1.04 
perf-profile.calltrace.cycles-pp.task_rq_lock.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler 3.76 -0.0 3.73 perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe 1.01 -0.0 0.98 perf-profile.calltrace.cycles-pp._raw_spin_lock.task_rq_lock.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler 0.59 +0.0 0.62 perf-profile.calltrace.cycles-pp.rt_mutex_adjust_pi.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler 0.68 +0.0 0.71 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__schedule.schedule_idle.do_idle 0.69 +0.0 0.72 perf-profile.calltrace.cycles-pp._raw_spin_lock.__schedule.schedule_idle.do_idle.cpu_startup_entry 5.78 +0.1 5.84 perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler 5.86 +0.1 5.92 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler 5.78 +0.1 5.84 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.__sched_setscheduler 5.78 +0.1 5.84 perf-profile.calltrace.cycles-pp.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.__sched_setscheduler._sched_setscheduler 6.87 +0.1 6.94 perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single 2.26 +0.1 2.34 perf-profile.calltrace.cycles-pp._raw_spin_lock.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler 2.26 +0.1 2.33 
perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler 8.53 +0.1 8.61 perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify 5.44 +0.1 5.52 perf-profile.calltrace.cycles-pp.activate_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single 8.45 +0.1 8.58 perf-profile.calltrace.cycles-pp.enqueue_task_rt.activate_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue 8.26 +0.1 8.39 perf-profile.calltrace.cycles-pp._raw_spin_lock.enqueue_task_rt.activate_task.ttwu_do_activate.sched_ttwu_pending 10.00 +0.2 10.16 perf-profile.calltrace.cycles-pp.activate_task.push_rt_task.push_rt_tasks.finish_task_switch.__schedule 10.11 +0.2 10.27 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.enqueue_task_rt.activate_task.ttwu_do_activate 9.69 +0.2 9.86 perf-profile.calltrace.cycles-pp.enqueue_task_rt.activate_task.push_rt_task.push_rt_tasks.finish_task_switch 9.30 +0.2 9.48 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.enqueue_task_rt.activate_task.push_rt_task 9.38 +0.2 9.56 perf-profile.calltrace.cycles-pp._raw_spin_lock.enqueue_task_rt.activate_task.push_rt_task.push_rt_tasks 47.73 +0.4 48.08 perf-profile.calltrace.cycles-pp.enqueue_task_rt.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler 64.42 +0.4 64.78 perf-profile.calltrace.cycles-pp.__sched_setscheduler 64.32 +0.4 64.68 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_setscheduler 64.31 +0.4 64.68 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_setscheduler 59.94 +0.4 60.35 perf-profile.calltrace.cycles-pp.__x64_sys_sched_setscheduler.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_setscheduler 59.94 +0.4 60.35 
perf-profile.calltrace.cycles-pp.do_sched_setscheduler.__x64_sys_sched_setscheduler.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_setscheduler 59.64 +0.4 60.05 perf-profile.calltrace.cycles-pp.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler.do_syscall_64 59.82 +0.4 60.24 perf-profile.calltrace.cycles-pp._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler.do_syscall_64.entry_SYSCALL_64_after_hwframe 46.02 +0.4 46.45 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.enqueue_task_rt.__sched_setscheduler._sched_setscheduler 46.37 +0.4 46.82 perf-profile.calltrace.cycles-pp._raw_spin_lock.enqueue_task_rt.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler 11.13 -0.5 10.60 perf-profile.children.cycles-pp.find_lock_lowest_rq 25.80 -0.4 25.37 perf-profile.children.cycles-pp.schedule 27.01 -0.4 26.60 perf-profile.children.cycles-pp.__schedule 22.66 -0.4 22.28 perf-profile.children.cycles-pp.push_rt_task 22.81 -0.3 22.48 perf-profile.children.cycles-pp.finish_task_switch 21.45 -0.3 21.13 perf-profile.children.cycles-pp.push_rt_tasks 21.10 -0.3 20.84 perf-profile.children.cycles-pp.__x64_sys_futex 21.08 -0.3 20.83 perf-profile.children.cycles-pp.do_futex 17.30 -0.2 17.07 perf-profile.children.cycles-pp.__futex_wait 2.20 ± 3% -0.2 1.97 ± 2% perf-profile.children.cycles-pp.cpupri_set 17.32 -0.2 17.09 perf-profile.children.cycles-pp.futex_wait 17.15 -0.2 16.93 perf-profile.children.cycles-pp.futex_wait_queue 4.18 -0.2 3.99 perf-profile.children.cycles-pp.__sched_yield 3.99 -0.2 3.81 perf-profile.children.cycles-pp.__x64_sys_sched_yield 0.94 ± 4% -0.1 0.85 ± 2% perf-profile.children.cycles-pp.dequeue_rt_stack 0.50 ± 5% -0.1 0.43 ± 3% perf-profile.children.cycles-pp.find_lowest_rq 0.46 ± 5% -0.1 0.40 ± 4% perf-profile.children.cycles-pp.cpupri_find_fitness 0.82 -0.0 0.78 perf-profile.children.cycles-pp.task_woken_rt 0.32 ± 2% -0.0 0.28 ± 3% 
perf-profile.children.cycles-pp.pull_rt_task 0.31 ± 2% -0.0 0.28 ± 2% perf-profile.children.cycles-pp.pick_next_task_rt 0.58 -0.0 0.55 perf-profile.children.cycles-pp.enqueue_pushable_task 3.76 -0.0 3.73 perf-profile.children.cycles-pp.futex_wake 0.11 ± 4% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.balance_rt 0.43 -0.0 0.41 perf-profile.children.cycles-pp.rto_push_irq_work_func 0.14 ± 2% -0.0 0.13 perf-profile.children.cycles-pp.select_task_rq 0.13 ± 2% -0.0 0.12 perf-profile.children.cycles-pp.select_task_rq_rt 0.07 -0.0 0.06 perf-profile.children.cycles-pp.update_rt_rq_load_avg 0.26 +0.0 0.27 perf-profile.children.cycles-pp.irq_exit_rcu 0.59 +0.0 0.62 perf-profile.children.cycles-pp.rt_mutex_adjust_pi 0.49 ± 2% +0.0 0.53 perf-profile.children.cycles-pp.scheduler_tick 1.14 +0.0 1.18 perf-profile.children.cycles-pp.update_curr_rt 0.58 +0.0 0.63 perf-profile.children.cycles-pp.tick_nohz_highres_handler 0.58 +0.0 0.62 perf-profile.children.cycles-pp.update_process_times 0.62 +0.0 0.68 perf-profile.children.cycles-pp.hrtimer_interrupt 0.60 +0.1 0.65 perf-profile.children.cycles-pp.__hrtimer_run_queues 0.58 +0.1 0.63 perf-profile.children.cycles-pp.tick_sched_handle 0.62 +0.1 0.68 perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.87 +0.1 0.93 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 0.94 +0.1 1.00 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 8.53 +0.1 8.61 perf-profile.children.cycles-pp.cpu_startup_entry 8.53 +0.1 8.61 perf-profile.children.cycles-pp.secondary_startup_64_no_verify 8.53 +0.1 8.61 perf-profile.children.cycles-pp.do_idle 14.31 +0.1 14.44 perf-profile.children.cycles-pp.sched_ttwu_pending 13.62 +0.1 13.76 perf-profile.children.cycles-pp.ttwu_do_activate 23.84 +0.3 24.19 perf-profile.children.cycles-pp.activate_task 59.94 +0.4 60.36 perf-profile.children.cycles-pp.__x64_sys_sched_setscheduler 59.94 +0.4 60.35 perf-profile.children.cycles-pp.do_sched_setscheduler 59.82 +0.4 60.24 
perf-profile.children.cycles-pp._sched_setscheduler
     88.70            +0.5       89.18        perf-profile.children.cycles-pp._raw_spin_lock
     87.93            +0.5       88.41        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     71.06            +0.7       71.77        perf-profile.children.cycles-pp.enqueue_task_rt
    124.24            +0.8      125.02        perf-profile.children.cycles-pp.__sched_setscheduler
      2.19 ±  3%      -0.2        1.96 ±  2%  perf-profile.self.cycles-pp.cpupri_set
      0.31 ±  6%      -0.0        0.27 ±  3%  perf-profile.self.cycles-pp.cpupri_find_fitness
      0.30 ±  3%      -0.0        0.27 ±  4%  perf-profile.self.cycles-pp.pull_rt_task
      0.26 ±  3%      -0.0        0.23 ±  3%  perf-profile.self.cycles-pp.pick_next_task_rt
      0.54            -0.0        0.52        perf-profile.self.cycles-pp.enqueue_pushable_task
      0.15            -0.0        0.14        perf-profile.self.cycles-pp.switch_mm_irqs_off
      0.65            +0.0        0.67        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      1.00            +0.0        1.04        perf-profile.self.cycles-pp.update_curr_rt
     87.92            +0.5       88.40        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath


***************************************************************************************************
lkp-icl-2sp7: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/ptrace/stress-ng/60s

commit:
  a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
  6613d82e61 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   1476651            -3.9%    1418563        vmstat.system.cs
  46602054            -3.9%   44765510        stress-ng.ptrace.ops
    776688            -3.9%     746080        stress-ng.ptrace.ops_per_sec
  93178718            -3.9%   89501356        stress-ng.time.voluntary_context_switches
     41454 ± 26%     -69.0%      12835 ± 93%  proc-vmstat.numa_pages_migrated
    363994 ±  3%      -5.7%     343290 ±  3%  proc-vmstat.pgfree
     41454 ± 26%     -69.0%      12835 ± 93%  proc-vmstat.pgmigrate_success
     36755 ± 34%     -41.9%      21353 ± 30%  proc-vmstat.pgreuse
      0.70            +0.1        0.75 ±  2%  perf-stat.i.branch-miss-rate%
  44257013 ±  2%      +7.7%   47672895 ±  3%  perf-stat.i.branch-misses
   1534064            -4.0%    1472825        perf-stat.i.context-switches
     24.03            -4.0%      23.08        perf-stat.i.metric.K/sec
      0.68 ±  2%      +0.1        0.73 ±  2%  perf-stat.overall.branch-miss-rate%
  43429354 ±  2%      +7.7%   46789221 ±  2%  perf-stat.ps.branch-misses
   1506894            -3.9%    1447769        perf-stat.ps.context-switches
     45.76            -0.5       45.22        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.cgroup_enter_frozen.ptrace_stop.ptrace_do_notify
     45.99            -0.5       45.46        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.cgroup_enter_frozen.ptrace_stop.ptrace_do_notify.ptrace_notify
     23.04            -0.3       22.74        perf-profile.calltrace.cycles-pp.cgroup_enter_frozen.ptrace_stop.ptrace_do_notify.ptrace_notify.syscall_trace_enter
     23.04            -0.3       22.78        perf-profile.calltrace.cycles-pp.cgroup_enter_frozen.ptrace_stop.ptrace_do_notify.ptrace_notify.syscall_exit_to_user_mode_prepare
      7.91            -0.0        7.88        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.getgid
      0.61            +0.0        0.64        perf-profile.calltrace.cycles-pp.__schedule.schedule.do_wait.kernel_wait4.__do_sys_wait4
      0.63            +0.0        0.66        perf-profile.calltrace.cycles-pp.schedule.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      1.80            +0.1        1.85        perf-profile.calltrace.cycles-pp.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.89            +0.1        1.94        perf-profile.calltrace.cycles-pp.kernel_wait4.__do_sys_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe.wait4
      1.92            +0.1        1.98        perf-profile.calltrace.cycles-pp.__do_sys_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe.wait4
      2.12            +0.1        2.19        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.wait4
      2.13            +0.1        2.20        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.wait4
      2.34            +0.1        2.42        perf-profile.calltrace.cycles-pp.wait4
      1.24            +0.1        1.34 ±  3%  perf-profile.calltrace.cycles-pp.__x64_sys_ptrace.do_syscall_64.entry_SYSCALL_64_after_hwframe.ptrace
      1.27            +0.1        1.36 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ptrace
      1.26            +0.1        1.36 ±  3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ptrace
      1.34            +0.1        1.44 ±  2%  perf-profile.calltrace.cycles-pp.ptrace
     22.52            +0.3       22.84        perf-profile.calltrace.cycles-pp.cgroup_leave_frozen.ptrace_stop.ptrace_do_notify.ptrace_notify.syscall_trace_enter
     44.96            +0.4       45.33        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.cgroup_leave_frozen.ptrace_stop.ptrace_do_notify
     45.31            +0.4       45.72        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.cgroup_leave_frozen.ptrace_stop.ptrace_do_notify.ptrace_notify
     46.10            -0.5       45.55        perf-profile.children.cycles-pp.cgroup_enter_frozen
     90.76            -0.2       90.57        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      0.25            +0.0        0.27        perf-profile.children.cycles-pp.switch_mm_irqs_off
      0.16 ±  2%      +0.0        0.17 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.60            +0.0        0.63 ±  2%  perf-profile.children.cycles-pp.ttwu_do_activate
      1.80            +0.1        1.85        perf-profile.children.cycles-pp.do_wait
      1.89            +0.1        1.95        perf-profile.children.cycles-pp.kernel_wait4
      1.92            +0.1        1.98        perf-profile.children.cycles-pp.__do_sys_wait4
      2.36            +0.1        2.44        perf-profile.children.cycles-pp.wait4
      1.47            +0.1        1.56        perf-profile.children.cycles-pp.__schedule
      1.50            +0.1        1.58        perf-profile.children.cycles-pp.schedule
      1.24            +0.1        1.34 ±  3%  perf-profile.children.cycles-pp.__x64_sys_ptrace
      1.36            +0.1        1.46 ±  2%  perf-profile.children.cycles-pp.ptrace
     45.42            +0.4       45.82        perf-profile.children.cycles-pp.cgroup_leave_frozen
     90.76            -0.2       90.57        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.38            -0.0        0.37        perf-profile.self.cycles-pp.ptrace_stop
      0.93            +0.1        1.02        perf-profile.self.cycles-pp._raw_spin_lock_irq
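
Note for readers cross-checking the tables: the %change column appears to be the relative difference between the two per-commit means. Rows whose values are already percentages (for example perf-stat.*.branch-miss-rate% and perf-profile.*) instead show an absolute difference. The Python sketch below recomputes two of the headline figures under that assumption. The helper name pct_change is hypothetical and is not part of lkp-tests; the input numbers are copied from the mutex and ptrace tables in this report.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change from the parent commit to the patched commit, in percent.

    Assumption: this mirrors how the %change column is derived; the report
    itself does not state the formula.
    """
    return (patched - base) / base * 100.0


# (metric, mean on a0e2dab44d, mean on 6613d82e61), copied from this report
rows = [
    ("stress-ng.mutex.ops_per_sec", 302318, 278364),
    ("stress-ng.ptrace.ops_per_sec", 776688, 746080),
]

for metric, base, patched in rows:
    # Prints -7.9% for mutex and -3.9% for ptrace, matching the tables.
    print(f"{pct_change(base, patched):+.1f}%  {metric}")
```

The same check applied to any other non-percentage row should reproduce its %change value to one decimal place, up to rounding of the displayed means.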
***************************************************************************************************
lkp-icl-2sp8: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/1HDD/btrfs/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/getdent/stress-ng/60s

commit:
  a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
  6613d82e61 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    317900 ±  4%     +51.9%     482822 ±  4%  cpuidle..usage
      3.05            +3.6%       3.15        iostat.cpu.user
      2.49 ±  4%      -0.1        2.38 ±  4%  mpstat.cpu.all.idle%
     15342 ±  3%     +56.1%      23954 ±  3%  vmstat.system.cs
    178479            +2.4%     182761        vmstat.system.in
      9197 ±  2%     +47.8%      13598 ±  3%  sched_debug.cpu.nr_switches.avg
     23944 ±  6%     +58.8%      38014 ±  6%  sched_debug.cpu.nr_switches.max
      5660 ± 12%     +72.0%       9736 ±  4%  sched_debug.cpu.nr_switches.stddev
  58917432 ±  4%     -40.0%   35337032        numa-numastat.node0.local_node
  58973202 ±  4%     -40.0%   35382357        numa-numastat.node0.numa_hit
  37066235           +76.6%   65448120        numa-numastat.node1.local_node
  37099580           +76.5%   65478720        numa-numastat.node1.numa_hit
    269394 ±  4%     -71.3%      77312 ± 28%  numa-meminfo.node0.KReclaimable
    269394 ±  4%     -71.3%      77312 ± 28%  numa-meminfo.node0.SReclaimable
    387028 ±  2%     -51.5%     187589 ± 12%  numa-meminfo.node0.Slab
     93129 ± 12%    +194.7%     274479 ±  8%  numa-meminfo.node1.KReclaimable
     93129 ± 12%    +194.7%     274479 ±  8%  numa-meminfo.node1.SReclaimable
    155845 ±  5%    +120.5%     343568 ±  7%  numa-meminfo.node1.Slab
     67916 ±  3%     -71.2%      19547 ± 28%  numa-vmstat.node0.nr_slab_reclaimable
  59072793 ±  4%     -40.0%   35463515        numa-vmstat.node0.numa_hit
  59017023 ±  4%     -40.0%   35418189        numa-vmstat.node0.numa_local
     23698 ± 13%    +192.1%      69229 ±  9%
numa-vmstat.node1.nr_slab_reclaimable 37209604 +76.6% 65720661 numa-vmstat.node1.numa_hit 37176256 +76.7% 65690060 numa-vmstat.node1.numa_local 9705 -9.2% 8816 stress-ng.getdent.nanosecs_per_getdents_call 1.17e+08 +5.8% 1.238e+08 stress-ng.getdent.ops 1949907 +5.8% 2063349 stress-ng.getdent.ops_per_sec 97203 ± 6% +12.9% 109764 stress-ng.time.involuntary_context_switches 85913623 +5.8% 90920658 stress-ng.time.minor_page_faults 82.78 ± 2% +6.7% 88.32 stress-ng.time.user_time 372113 ± 7% +74.4% 649143 ± 3% stress-ng.time.voluntary_context_switches 90376 -1.7% 88797 proc-vmstat.nr_slab_reclaimable 19745 ± 31% -26.3% 14551 ± 2% proc-vmstat.numa_hint_faults 11950 ± 41% -36.7% 7560 ± 7% proc-vmstat.numa_hint_faults_local 96087443 ± 3% +5.2% 1.011e+08 proc-vmstat.numa_hit 95998301 ± 3% +5.2% 1.01e+08 proc-vmstat.numa_local 1.012e+08 ± 3% +4.7% 1.059e+08 proc-vmstat.pgalloc_normal 86033810 +5.9% 91111926 proc-vmstat.pgfault 1.009e+08 ± 3% +4.7% 1.057e+08 proc-vmstat.pgfree 14992 ± 6% -8.3% 13744 proc-vmstat.pgreuse 3.29 -4.1% 3.15 perf-stat.i.MPKI 1.031e+10 +5.0% 1.082e+10 perf-stat.i.branch-instructions 77903770 +5.3% 82008784 perf-stat.i.branch-misses 45.24 -2.3 42.98 perf-stat.i.cache-miss-rate% 3.596e+08 ± 2% +6.5% 3.83e+08 perf-stat.i.cache-references 15896 ± 3% +56.8% 24926 ± 3% perf-stat.i.context-switches 4.51 -5.2% 4.27 perf-stat.i.cpi 339.16 ± 8% +30.7% 443.20 ± 4% perf-stat.i.cpu-migrations 4.991e+10 +5.0% 5.243e+10 perf-stat.i.instructions 0.24 +5.0% 0.25 perf-stat.i.ipc 44.19 +5.9% 46.82 perf-stat.i.metric.K/sec 1411214 +5.9% 1494386 perf-stat.i.minor-faults 1411214 +5.9% 1494386 perf-stat.i.page-faults 3.30 -3.7% 3.17 perf-stat.overall.MPKI 45.68 -2.3 43.40 perf-stat.overall.cache-miss-rate% 4.49 -4.6% 4.28 perf-stat.overall.cpi 0.22 +4.8% 0.23 perf-stat.overall.ipc 1.014e+10 +4.9% 1.063e+10 perf-stat.ps.branch-instructions 76113957 +5.3% 80174083 perf-stat.ps.branch-misses 3.541e+08 ± 2% +6.3% 3.765e+08 perf-stat.ps.cache-references 15523 ± 3% +56.4% 24284 ± 
3% perf-stat.ps.context-switches 331.55 ± 9% +30.6% 433.03 ± 4% perf-stat.ps.cpu-migrations 4.907e+10 +4.9% 5.149e+10 perf-stat.ps.instructions 1388739 +5.8% 1468698 perf-stat.ps.minor-faults 1388739 +5.8% 1468698 perf-stat.ps.page-faults 3.005e+12 +4.2% 3.133e+12 perf-stat.total.instructions 59.17 -2.9 56.25 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 59.24 -2.9 56.31 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall 59.68 -2.9 56.79 perf-profile.calltrace.cycles-pp.syscall 29.18 -1.5 27.70 perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 28.64 -1.5 27.16 perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 29.82 -1.5 28.37 perf-profile.calltrace.cycles-pp.__x64_sys_getdents.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 29.03 -1.4 27.58 perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 9.19 ± 3% -1.1 8.13 ± 2% perf-profile.calltrace.cycles-pp.proc_readdir_de.proc_tgid_net_readdir.iterate_dir.__x64_sys_getdents64.do_syscall_64 9.24 ± 3% -1.1 8.19 ± 2% perf-profile.calltrace.cycles-pp.proc_tgid_net_readdir.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe 9.17 ± 3% -1.0 8.12 ± 2% perf-profile.calltrace.cycles-pp.proc_readdir_de.proc_tgid_net_readdir.iterate_dir.__x64_sys_getdents.do_syscall_64 9.25 ± 3% -1.0 8.21 ± 2% perf-profile.calltrace.cycles-pp.proc_tgid_net_readdir.iterate_dir.__x64_sys_getdents.do_syscall_64.entry_SYSCALL_64_after_hwframe 5.51 ± 4% -0.6 4.89 ± 2% perf-profile.calltrace.cycles-pp.proc_readdir_de.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe 5.49 ± 3% -0.6 4.88 ± 2% perf-profile.calltrace.cycles-pp.proc_readdir_de.iterate_dir.__x64_sys_getdents.do_syscall_64.entry_SYSCALL_64_after_hwframe 4.22 ± 4% -0.5 3.72 ± 2% 
perf-profile.calltrace.cycles-pp._raw_read_lock.proc_readdir_de.proc_tgid_net_readdir.iterate_dir.__x64_sys_getdents64 4.20 ± 4% -0.5 3.71 ± 2% perf-profile.calltrace.cycles-pp._raw_read_lock.proc_readdir_de.proc_tgid_net_readdir.iterate_dir.__x64_sys_getdents 2.78 ± 4% -0.3 2.47 ± 2% perf-profile.calltrace.cycles-pp._raw_read_lock.proc_readdir_de.iterate_dir.__x64_sys_getdents64.do_syscall_64 2.77 ± 3% -0.3 2.47 ± 2% perf-profile.calltrace.cycles-pp._raw_read_lock.proc_readdir_de.iterate_dir.__x64_sys_getdents.do_syscall_64 0.90 ± 4% -0.1 0.80 ± 2% perf-profile.calltrace.cycles-pp._raw_read_lock.proc_lookup_de.proc_tgid_net_lookup.lookup_open.open_last_lookups 0.56 ± 2% +0.0 0.58 perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault 0.62 ± 2% +0.0 0.64 perf-profile.calltrace.cycles-pp.vma_alloc_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 0.56 +0.0 0.59 ± 2% perf-profile.calltrace.cycles-pp.d_alloc_parallel.lookup_open.open_last_lookups.path_openat.do_filp_open 0.76 ± 3% +0.1 0.81 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.new_inode.proc_get_inode.proc_lookup_de 0.65 ± 3% +0.1 0.71 ± 5% perf-profile.calltrace.cycles-pp.apparmor_file_free_security.security_file_free.__fput.__x64_sys_close.do_syscall_64 0.66 ± 3% +0.1 0.72 ± 5% perf-profile.calltrace.cycles-pp.security_file_free.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe 1.01 ± 2% +0.1 1.08 perf-profile.calltrace.cycles-pp._raw_spin_lock.new_inode.proc_get_inode.proc_lookup_de.proc_tgid_net_lookup 1.35 +0.1 1.43 perf-profile.calltrace.cycles-pp.new_inode.proc_get_inode.proc_lookup_de.proc_tgid_net_lookup.lookup_open 1.40 +0.1 1.49 perf-profile.calltrace.cycles-pp.proc_get_inode.proc_lookup_de.proc_tgid_net_lookup.lookup_open.open_last_lookups 0.73 +0.1 0.82 perf-profile.calltrace.cycles-pp.may_open.do_open.path_openat.do_filp_open.do_sys_openat2 0.67 +0.1 0.75 
perf-profile.calltrace.cycles-pp.inode_permission.may_open.do_open.path_openat.do_filp_open 1.91 ± 3% +0.1 2.03 ± 2% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dput.__fput.__x64_sys_close 0.81 ± 3% +0.1 0.94 ± 4% perf-profile.calltrace.cycles-pp.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file.path_openat 1.10 ± 3% +0.1 1.23 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.dput.__fput 1.13 ± 3% +0.1 1.26 ± 3% perf-profile.calltrace.cycles-pp.init_file.alloc_empty_file.path_openat.do_filp_open.do_sys_openat2 0.85 ± 3% +0.1 0.98 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.dput 0.89 ± 3% +0.1 1.04 ± 3% perf-profile.calltrace.cycles-pp.security_file_alloc.init_file.alloc_empty_file.path_openat.do_filp_open 1.47 ± 3% +0.1 1.61 ± 3% perf-profile.calltrace.cycles-pp.alloc_empty_file.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat 0.66 +0.2 0.86 ± 22% perf-profile.calltrace.cycles-pp.__legitimize_path.try_to_unlazy.link_path_walk.path_openat.do_filp_open 0.67 +0.2 0.87 ± 22% perf-profile.calltrace.cycles-pp.try_to_unlazy.link_path_walk.path_openat.do_filp_open.do_sys_openat2 0.88 ± 8% +0.2 1.10 ± 5% perf-profile.calltrace.cycles-pp.up_read.kernfs_dop_revalidate.lookup_fast.walk_component.link_path_walk 1.49 +0.2 1.73 ± 7% perf-profile.calltrace.cycles-pp.lookup_fast.open_last_lookups.path_openat.do_filp_open.do_sys_openat2 1.30 ± 5% +0.3 1.56 ± 5% perf-profile.calltrace.cycles-pp.apparmor_file_open.security_file_open.do_dentry_open.do_open.path_openat 1.31 ± 4% +0.3 1.57 ± 5% perf-profile.calltrace.cycles-pp.security_file_open.do_dentry_open.do_open.path_openat.do_filp_open 2.39 ± 3% +0.3 2.65 ± 4% perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2 1.49 ± 2% +0.3 1.76 ± 5% perf-profile.calltrace.cycles-pp.up_read.kernfs_iop_permission.inode_permission.link_path_walk.path_openat 1.53 ± 5% +0.3 1.81 ± 
2% perf-profile.calltrace.cycles-pp.down_read.kernfs_iop_permission.inode_permission.link_path_walk.path_openat 1.09 ± 10% +0.3 1.40 ± 2% perf-profile.calltrace.cycles-pp.up_read.kernfs_fop_readdir.iterate_dir.__x64_sys_getdents.do_syscall_64 1.08 ± 11% +0.3 1.40 ± 2% perf-profile.calltrace.cycles-pp.up_read.kernfs_fop_readdir.iterate_dir.__x64_sys_getdents64.do_syscall_64 1.22 ± 9% +0.3 1.56 ± 4% perf-profile.calltrace.cycles-pp.down_read.kernfs_dop_revalidate.lookup_fast.walk_component.link_path_walk 1.09 ± 2% +0.3 1.44 ± 24% perf-profile.calltrace.cycles-pp.dput.terminate_walk.path_openat.do_filp_open.do_sys_openat2 1.13 ± 2% +0.4 1.48 ± 23% perf-profile.calltrace.cycles-pp.terminate_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat 1.43 ± 9% +0.4 1.81 perf-profile.calltrace.cycles-pp.down_read.kernfs_fop_readdir.iterate_dir.__x64_sys_getdents.do_syscall_64 1.41 ± 9% +0.4 1.81 ± 2% perf-profile.calltrace.cycles-pp.down_read.kernfs_fop_readdir.iterate_dir.__x64_sys_getdents64.do_syscall_64 0.17 ±141% +0.4 0.58 ± 2% perf-profile.calltrace.cycles-pp.kernfs_dop_revalidate.lookup_fast.open_last_lookups.path_openat.do_filp_open 3.51 ± 2% +0.4 3.93 ± 3% perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat 0.17 ±141% +0.5 0.70 ± 28% perf-profile.calltrace.cycles-pp.lockref_get_not_dead.__legitimize_path.try_to_unlazy.link_path_walk.path_openat 0.00 +0.6 0.56 perf-profile.calltrace.cycles-pp.kernfs_iop_permission.inode_permission.may_open.do_open.path_openat 2.14 ± 8% +0.6 2.71 ± 4% perf-profile.calltrace.cycles-pp.kernfs_dop_revalidate.lookup_fast.walk_component.link_path_walk.path_openat 3.14 ± 3% +0.6 3.71 ± 4% perf-profile.calltrace.cycles-pp.kernfs_iop_permission.inode_permission.link_path_walk.path_openat.do_filp_open 4.18 ± 2% +0.6 4.77 ± 3% perf-profile.calltrace.cycles-pp.inode_permission.link_path_walk.path_openat.do_filp_open.do_sys_openat2 4.89 ± 4% +0.6 5.50 ± 2% 
perf-profile.calltrace.cycles-pp.walk_component.link_path_walk.path_openat.do_filp_open.do_sys_openat2 3.29 ± 5% +0.6 3.93 ± 3% perf-profile.calltrace.cycles-pp.lookup_fast.walk_component.link_path_walk.path_openat.do_filp_open 3.23 ± 7% +0.7 3.96 perf-profile.calltrace.cycles-pp.kernfs_fop_readdir.iterate_dir.__x64_sys_getdents.do_syscall_64.entry_SYSCALL_64_after_hwframe 3.22 ± 8% +0.7 3.96 ± 2% perf-profile.calltrace.cycles-pp.kernfs_fop_readdir.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe 10.92 ± 2% +1.4 12.34 ± 2% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat 24.85 +2.5 27.32 perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64 24.92 +2.5 27.39 perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe 26.02 +2.5 28.52 perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64 26.05 +2.5 28.55 perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64 26.11 +2.5 28.61 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64 26.13 +2.5 28.63 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64 26.32 +2.5 28.83 perf-profile.calltrace.cycles-pp.open64 29.41 ± 3% -3.3 26.07 ± 2% perf-profile.children.cycles-pp.proc_readdir_de 57.69 -2.9 54.77 perf-profile.children.cycles-pp.iterate_dir 59.85 -2.9 56.97 perf-profile.children.cycles-pp.syscall 18.49 ± 3% -2.1 16.39 ± 2% perf-profile.children.cycles-pp.proc_tgid_net_readdir 15.47 ± 4% -1.8 13.70 ± 2% perf-profile.children.cycles-pp._raw_read_lock 29.19 -1.5 27.70 perf-profile.children.cycles-pp.__x64_sys_getdents64 29.83 -1.4 28.38 perf-profile.children.cycles-pp.__x64_sys_getdents 94.11 -0.3 93.85 perf-profile.children.cycles-pp.do_syscall_64 94.19 -0.3 93.94 
perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 0.89 ± 2% -0.0 0.86 ± 3% perf-profile.children.cycles-pp.proc_readfd_common 0.08 ± 6% -0.0 0.05 perf-profile.children.cycles-pp.main 0.08 ± 6% -0.0 0.05 perf-profile.children.cycles-pp.run_builtin 0.07 ± 11% -0.0 0.05 perf-profile.children.cycles-pp.__cmd_record 0.07 ± 11% -0.0 0.05 perf-profile.children.cycles-pp.cmd_record 0.12 ± 4% -0.0 0.10 ± 4% perf-profile.children.cycles-pp._raw_spin_lock_irq 0.10 +0.0 0.11 perf-profile.children.cycles-pp.atime_needs_update 0.05 +0.0 0.06 perf-profile.children.cycles-pp.nd_jump_root 0.09 +0.0 0.10 perf-profile.children.cycles-pp.__init_rwsem 0.17 +0.0 0.18 perf-profile.children.cycles-pp.generic_permission 0.06 +0.0 0.07 perf-profile.children.cycles-pp.proc_pid_readdir 0.06 +0.0 0.07 perf-profile.children.cycles-pp.process_measurement 0.12 +0.0 0.13 perf-profile.children.cycles-pp.uncharge_batch 0.18 +0.0 0.19 perf-profile.children.cycles-pp.vsnprintf 0.22 ± 2% +0.0 0.24 perf-profile.children.cycles-pp.native_irq_return_iret 0.19 ± 2% +0.0 0.21 ± 2% perf-profile.children.cycles-pp.stress_getdents_dir 0.17 +0.0 0.18 ± 2% perf-profile.children.cycles-pp.memchr 0.08 +0.0 0.09 ± 5% perf-profile.children.cycles-pp.path_init 0.08 ± 5% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.locks_remove_posix 0.24 +0.0 0.25 perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook 0.38 +0.0 0.40 perf-profile.children.cycles-pp.getname_flags 0.10 +0.0 0.12 ± 4% perf-profile.children.cycles-pp.page_counter_uncharge 0.14 ± 3% +0.0 0.16 perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list 0.10 +0.0 0.12 perf-profile.children.cycles-pp.percpu_counter_add_batch 0.19 ± 2% +0.0 0.21 ± 2% perf-profile.children.cycles-pp.inode_init_always 0.20 ± 2% +0.0 0.22 ± 2% perf-profile.children.cycles-pp.mod_objcg_state 0.33 +0.0 0.35 perf-profile.children.cycles-pp.__cond_resched 0.18 ± 4% +0.0 0.20 ± 2% perf-profile.children.cycles-pp.strlcat 0.56 +0.0 0.58 
perf-profile.children.cycles-pp.alloc_inode 0.52 +0.0 0.55 ± 2% perf-profile.children.cycles-pp.d_alloc 0.66 +0.0 0.69 perf-profile.children.cycles-pp.__slab_free 0.12 +0.0 0.15 ± 18% perf-profile.children.cycles-pp.try_to_unlazy_next 0.31 +0.0 0.34 perf-profile.children.cycles-pp.__memcg_slab_free_hook 0.70 +0.0 0.74 perf-profile.children.cycles-pp.d_alloc_parallel 0.77 +0.0 0.81 perf-profile.children.cycles-pp.filldir64 0.77 +0.0 0.82 ± 2% perf-profile.children.cycles-pp.filldir 0.11 ± 4% +0.1 0.17 ± 7% perf-profile.children.cycles-pp.security_current_getsecid_subj 0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.cpu_startup_entry 0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.do_idle 0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.secondary_startup_64_no_verify 0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.start_secondary 0.10 ± 4% +0.1 0.16 ± 7% perf-profile.children.cycles-pp.apparmor_current_getsecid_subj 1.04 +0.1 1.09 perf-profile.children.cycles-pp.rcu_do_batch 1.04 +0.1 1.10 perf-profile.children.cycles-pp.rcu_core 1.05 +0.1 1.11 perf-profile.children.cycles-pp.__do_softirq 0.65 ± 4% +0.1 0.71 ± 4% perf-profile.children.cycles-pp.apparmor_file_free_security 0.66 ± 3% +0.1 0.72 ± 4% perf-profile.children.cycles-pp.security_file_free 0.19 ± 4% +0.1 0.25 ± 6% perf-profile.children.cycles-pp.ima_file_check 1.19 +0.1 1.27 perf-profile.children.cycles-pp.kmem_cache_free 0.43 ± 13% +0.1 0.51 ± 5% perf-profile.children.cycles-pp.smpboot_thread_fn 0.44 ± 13% +0.1 0.52 ± 5% perf-profile.children.cycles-pp.kthread 0.44 ± 13% +0.1 0.52 ± 5% perf-profile.children.cycles-pp.ret_from_fork 0.44 ± 13% +0.1 0.52 ± 5% perf-profile.children.cycles-pp.ret_from_fork_asm 0.42 ± 15% +0.1 0.51 ± 5% perf-profile.children.cycles-pp.run_ksoftirqd 0.74 +0.1 0.83 perf-profile.children.cycles-pp.may_open 1.92 ± 3% +0.1 2.04 ± 2% perf-profile.children.cycles-pp.evict 1.14 ± 3% +0.1 1.27 ± 3% perf-profile.children.cycles-pp.init_file 0.81 ± 3% +0.1 0.95 ± 4% 
perf-profile.children.cycles-pp.apparmor_file_alloc_security 0.90 ± 2% +0.1 1.04 ± 3% perf-profile.children.cycles-pp.security_file_alloc 1.47 ± 3% +0.1 1.61 ± 3% perf-profile.children.cycles-pp.alloc_empty_file 2.19 ± 2% +0.2 2.34 perf-profile.children.cycles-pp.new_inode 2.26 ± 2% +0.2 2.42 perf-profile.children.cycles-pp.proc_get_inode 0.53 ± 4% +0.2 0.70 ± 7% perf-profile.children.cycles-pp.apparmor_file_permission 0.55 ± 5% +0.2 0.72 ± 6% perf-profile.children.cycles-pp.security_file_permission 1.30 ± 5% +0.3 1.56 ± 5% perf-profile.children.cycles-pp.apparmor_file_open 1.32 ± 4% +0.3 1.57 ± 5% perf-profile.children.cycles-pp.security_file_open 2.40 ± 4% +0.3 2.66 ± 4% perf-profile.children.cycles-pp.do_dentry_open 1.35 +0.3 1.69 ± 19% perf-profile.children.cycles-pp.try_to_unlazy 1.14 ± 2% +0.4 1.50 ± 23% perf-profile.children.cycles-pp.terminate_walk 1.40 +0.4 1.77 ± 20% perf-profile.children.cycles-pp.__legitimize_path 1.00 ± 2% +0.4 1.39 ± 26% perf-profile.children.cycles-pp.lockref_get_not_dead 7.02 ± 3% +0.4 7.42 ± 3% perf-profile.children.cycles-pp.dput 3.52 ± 2% +0.4 3.94 ± 3% perf-profile.children.cycles-pp.do_open 4.91 ± 4% +0.6 5.53 ± 2% perf-profile.children.cycles-pp.walk_component 3.62 ± 3% +0.7 4.29 ± 3% perf-profile.children.cycles-pp.kernfs_iop_permission 4.87 ± 2% +0.7 5.54 ± 2% perf-profile.children.cycles-pp.inode_permission 2.61 ± 8% +0.7 3.30 ± 4% perf-profile.children.cycles-pp.kernfs_dop_revalidate 4.80 ± 4% +0.9 5.69 ± 3% perf-profile.children.cycles-pp.lookup_fast 5.71 ± 6% +1.2 6.89 ± 3% perf-profile.children.cycles-pp.up_read 10.94 ± 2% +1.4 12.38 ± 2% perf-profile.children.cycles-pp.link_path_walk 6.48 ± 8% +1.5 7.95 ± 2% perf-profile.children.cycles-pp.kernfs_fop_readdir 6.24 ± 7% +1.5 7.75 ± 2% perf-profile.children.cycles-pp.down_read 24.88 +2.5 27.36 perf-profile.children.cycles-pp.path_openat 24.94 +2.5 27.42 perf-profile.children.cycles-pp.do_filp_open 26.06 +2.5 28.56 perf-profile.children.cycles-pp.do_sys_openat2 26.07 +2.5 
28.58 perf-profile.children.cycles-pp.__x64_sys_openat 26.37 +2.5 28.88 perf-profile.children.cycles-pp.open64 15.34 ± 4% -1.8 13.59 ± 2% perf-profile.self.cycles-pp._raw_read_lock 13.66 ± 4% -1.7 11.95 ± 2% perf-profile.self.cycles-pp.proc_readdir_de 1.61 ± 4% -0.2 1.46 ± 2% perf-profile.self.cycles-pp.proc_lookup_de 0.12 ± 4% -0.0 0.10 ± 4% perf-profile.self.cycles-pp._raw_spin_lock_irq 0.10 +0.0 0.11 perf-profile.self.cycles-pp.page_counter_uncharge 0.10 +0.0 0.11 perf-profile.self.cycles-pp.percpu_counter_add_batch 0.05 +0.0 0.06 perf-profile.self.cycles-pp.refill_obj_stock 0.08 +0.0 0.09 perf-profile.self.cycles-pp.number 0.09 +0.0 0.10 perf-profile.self.cycles-pp.pid_revalidate 0.19 ± 2% +0.0 0.21 ± 2% perf-profile.self.cycles-pp.__cond_resched 0.26 +0.0 0.28 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack 0.22 ± 2% +0.0 0.24 perf-profile.self.cycles-pp.native_irq_return_iret 0.12 +0.0 0.13 ± 3% perf-profile.self.cycles-pp.__memcg_slab_free_hook 0.13 +0.0 0.14 ± 3% perf-profile.self.cycles-pp.generic_permission 0.08 +0.0 0.09 ± 5% perf-profile.self.cycles-pp.locks_remove_posix 0.09 ± 5% +0.0 0.10 perf-profile.self.cycles-pp.__call_rcu_common 0.18 ± 2% +0.0 0.19 perf-profile.self.cycles-pp.proc_tgid_net_lookup 0.17 ± 2% +0.0 0.18 ± 2% perf-profile.self.cycles-pp.mod_objcg_state 0.13 ± 3% +0.0 0.15 ± 3% perf-profile.self.cycles-pp.inode_init_always 0.16 ± 3% +0.0 0.18 ± 2% perf-profile.self.cycles-pp.do_syscall_64 0.17 ± 2% +0.0 0.19 ± 2% perf-profile.self.cycles-pp.get_proc_task_net 0.23 ± 2% +0.0 0.25 perf-profile.self.cycles-pp.syscall 0.65 +0.0 0.68 perf-profile.self.cycles-pp.__slab_free 0.38 ± 2% +0.0 0.41 ± 6% perf-profile.self.cycles-pp.inode_permission 0.56 +0.0 0.60 perf-profile.self.cycles-pp.filldir 0.84 ± 2% +0.0 0.89 ± 2% perf-profile.self.cycles-pp.lockref_get_not_dead 0.55 +0.0 0.60 ± 2% perf-profile.self.cycles-pp.filldir64 0.00 +0.1 0.05 perf-profile.self.cycles-pp.proc_tgid_net_readdir 0.10 ± 4% +0.1 0.16 ± 7% 
perf-profile.self.cycles-pp.apparmor_current_getsecid_subj
      0.65 ± 3%       +0.1       0.71 ± 5%   perf-profile.self.cycles-pp.apparmor_file_free_security
      0.80 ± 3%       +0.1       0.93 ± 4%   perf-profile.self.cycles-pp.apparmor_file_alloc_security
      0.49 ± 5%       +0.2       0.66 ± 6%   perf-profile.self.cycles-pp.apparmor_file_permission
      1.29 ± 4%       +0.3       1.54 ± 5%   perf-profile.self.cycles-pp.apparmor_file_open
      5.66 ± 6%       +1.2       6.84 ± 3%   perf-profile.self.cycles-pp.up_read
      6.15 ± 7%       +1.5       7.66 ± 2%   perf-profile.self.cycles-pp.down_read

***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/futex4/will-it-scale

commit:
  a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
  6613d82e61 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
         %stddev   %change          %stddev
             \         |                \
     33.61           +1.2%      34.01        boot-time.boot
      3130           +1.3%       3171        boot-time.idle
  70814196           +4.0%   73672365        will-it-scale.104.threads
    680905           +4.0%     708387        will-it-scale.per_thread_ops
  70814196           +4.0%   73672365        will-it-scale.workload
     89530           -1.7%      88005        proc-vmstat.nr_active_anon
     92711           -1.7%      91127        proc-vmstat.nr_shmem
     89530           -1.7%      88005        proc-vmstat.nr_zone_active_anon
     76969           -1.7%      75654        proc-vmstat.pgactivate
   1086126           -1.8%    1066713        proc-vmstat.pgalloc_normal
     40426 ± 3%     +10.6%      44714 ± 4%   proc-vmstat.pgreuse
     10727 ± 61%    +52.8%      16392 ± 5%   sched_debug.cfs_rq:/.load_avg.max
      0.07 ± 12%    -18.3%       0.06 ± 3%   sched_debug.cfs_rq:/.nr_running.stddev
      0.92 ± 74%   +383.2%       4.45 ± 30%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
      6.89 ± 72%   +161.3%      18.00 ± 17%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
      0.92 ±
74% +383.1% 4.45 ± 30% sched_debug.cfs_rq:/.removed.util_avg.avg 6.89 ± 72% +161.2% 17.99 ± 17% sched_debug.cfs_rq:/.removed.util_avg.stddev 1259 ± 2% -17.5% 1039 ± 16% sched_debug.cfs_rq:/.util_est.max 3796 ± 3% +29.1% 4902 ± 12% sched_debug.cpu.max_idle_balance_cost.stddev 0.00 +11.9% 0.00 ± 2% sched_debug.cpu.next_balance.stddev 876.78 ± 7% +10.9% 972.39 ± 7% sched_debug.cpu.nr_switches.min 6.71 ± 8% -16.5% 5.60 ± 4% sched_debug.cpu.nr_uninterruptible.stddev 6.114e+09 +4.3% 6.376e+09 perf-stat.i.branch-instructions 1.35 +0.2 1.53 perf-stat.i.branch-miss-rate% 81670984 +19.2% 97330429 perf-stat.i.branch-misses 6.05 -3.3% 5.85 perf-stat.i.cpi 4.754e+10 +4.0% 4.944e+10 perf-stat.i.instructions 0.17 +2.9% 0.17 perf-stat.i.ipc 1.34 +0.2 1.53 perf-stat.overall.branch-miss-rate% 6.07 -3.5% 5.86 perf-stat.overall.cpi 0.16 +3.6% 0.17 perf-stat.overall.ipc 6.094e+09 +4.3% 6.354e+09 perf-stat.ps.branch-instructions 81368878 +19.2% 96977141 perf-stat.ps.branch-misses 4.738e+10 +4.0% 4.928e+10 perf-stat.ps.instructions 0.03 ± 47% +57.9% 0.05 ± 9% perf-stat.ps.major-faults 1.439e+13 +3.6% 1.491e+13 perf-stat.total.instructions 44.04 -21.1 22.93 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 55.72 -19.2 36.49 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall 19.53 -18.4 1.17 ± 2% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 20.23 -6.0 14.24 perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe 22.07 -5.9 16.15 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 20.74 -5.9 14.83 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 6.35 -4.0 2.31 ± 2% perf-profile.calltrace.cycles-pp.__get_user_nocheck_4.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait 6.88 -3.8 3.10 
perf-profile.calltrace.cycles-pp.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait.do_futex 29.51 -2.6 26.91 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall 12.81 -2.6 10.22 perf-profile.calltrace.cycles-pp.futex_wait_setup.__futex_wait.futex_wait.do_futex.__x64_sys_futex 13.99 -2.4 11.54 perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64 0.64 +0.1 0.69 perf-profile.calltrace.cycles-pp.get_futex_key.futex_wait_setup.__futex_wait.futex_wait.do_futex 0.76 +0.1 0.82 perf-profile.calltrace.cycles-pp.futex_q_unlock.futex_wait_setup.__futex_wait.futex_wait.do_futex 99.58 +0.1 99.65 perf-profile.calltrace.cycles-pp.syscall 0.97 +0.2 1.19 perf-profile.calltrace.cycles-pp.futex_hash.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait 8.56 +0.6 9.13 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall 1.21 +0.8 2.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait 3.64 +0.9 4.58 ± 2% perf-profile.calltrace.cycles-pp.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait.do_futex 7.91 +2.7 10.62 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall 0.00 +17.5 17.48 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.syscall 44.20 -21.7 22.46 perf-profile.children.cycles-pp.do_syscall_64 56.13 -18.9 37.23 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 19.74 -18.5 1.26 ± 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode 20.38 -6.0 14.36 perf-profile.children.cycles-pp.futex_wait 22.12 -5.9 16.21 perf-profile.children.cycles-pp.__x64_sys_futex 20.80 -5.9 14.90 perf-profile.children.cycles-pp.do_futex 6.53 -3.9 2.68 ± 2% perf-profile.children.cycles-pp.__get_user_nocheck_4 7.08 -3.8 3.29 ± 2% perf-profile.children.cycles-pp.futex_get_value_locked 29.66 -2.6 27.09 perf-profile.children.cycles-pp.syscall_return_via_sysret 13.00 -2.5 10.47 perf-profile.children.cycles-pp.futex_wait_setup 14.01 -2.4 
11.58 perf-profile.children.cycles-pp.__futex_wait 0.18 ± 2% -0.1 0.13 ± 6% perf-profile.children.cycles-pp.amd_clear_divider 0.18 ± 2% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.futex_setup_timer 0.44 -0.0 0.41 ± 3% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare 0.05 +0.0 0.06 perf-profile.children.cycles-pp.syscall@plt 0.13 +0.0 0.16 ± 3% perf-profile.children.cycles-pp.testcase 0.80 +0.1 0.86 perf-profile.children.cycles-pp.futex_q_unlock 0.67 +0.1 0.72 perf-profile.children.cycles-pp.get_futex_key 0.98 +0.2 1.21 perf-profile.children.cycles-pp.futex_hash 1.26 +0.9 2.13 perf-profile.children.cycles-pp._raw_spin_lock 3.76 +1.0 4.75 ± 2% perf-profile.children.cycles-pp.futex_q_lock 4.25 +1.4 5.62 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack 11.05 +1.9 12.94 perf-profile.children.cycles-pp.entry_SYSCALL_64 1.26 +17.4 18.64 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack 19.27 -18.5 0.77 perf-profile.self.cycles-pp.syscall_exit_to_user_mode 6.32 -3.8 2.51 perf-profile.self.cycles-pp.__get_user_nocheck_4 6.23 -3.5 2.68 perf-profile.self.cycles-pp.futex_wait 29.64 -2.6 27.05 perf-profile.self.cycles-pp.syscall_return_via_sysret 0.13 -0.0 0.10 perf-profile.self.cycles-pp.futex_setup_timer 0.39 ± 2% -0.0 0.36 ± 4% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare 0.06 +0.0 0.09 ± 5% perf-profile.self.cycles-pp.amd_clear_divider 0.58 +0.0 0.61 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack 0.13 ± 3% +0.0 0.16 ± 3% perf-profile.self.cycles-pp.testcase 0.65 +0.0 0.69 perf-profile.self.cycles-pp.get_futex_key 0.78 +0.0 0.83 perf-profile.self.cycles-pp.futex_q_unlock 0.96 +0.1 1.07 perf-profile.self.cycles-pp.__futex_wait 0.44 +0.1 0.58 perf-profile.self.cycles-pp.do_futex 0.85 +0.2 1.06 perf-profile.self.cycles-pp.futex_wait_setup 0.93 +0.2 1.17 perf-profile.self.cycles-pp.futex_hash 1.23 +0.9 2.09 perf-profile.self.cycles-pp._raw_spin_lock 9.85 +1.9 11.73 perf-profile.self.cycles-pp.entry_SYSCALL_64 2.11 
+2.3 4.37 perf-profile.self.cycles-pp.syscall
      1.86            +2.3       4.19        perf-profile.self.cycles-pp.do_syscall_64
     12.24            +3.1      15.38        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.11           +17.4      18.46        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack

***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/futex2/will-it-scale

commit:
  a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
  6613d82e61 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
         %stddev   %change          %stddev
             \         |                \
    228.17 ± 8%     -23.3%     175.00 ± 15%  perf-c2c.HITM.local
     25.31 ± 26%  +1883.0%     501.86 ±132%  sched_debug.cfs_rq:/.removed.load_avg.stddev
      5561 ± 52%    -43.3%       3154 ± 11%  turbostat.C1
     17507 ± 17%    +19.7%      20950 ± 4%   proc-vmstat.numa_hint_faults_local
     61472           +4.9%      64491 ± 2%   proc-vmstat.pgactivate
  66711960           -2.1%   65339777        will-it-scale.104.processes
    641460           -2.1%     628266        will-it-scale.per_process_ops
  66711960           -2.1%   65339777        will-it-scale.workload
      0.33 ± 21%    -31.2%       0.23 ± 18%  perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
      0.09 ± 16%    +82.6%       0.16 ± 15%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
    534.00 ± 4%     -10.5%     478.00 ± 3%   perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    187.33 ± 7%     -16.7%     156.00 ± 10%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      0.09 ± 16%    +82.6%       0.16 ± 15%
perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown] 1.296e+10 -1.9% 1.271e+10 perf-stat.i.branch-instructions 1.00 +0.1 1.06 perf-stat.i.branch-miss-rate% 1.286e+08 +4.8% 1.348e+08 perf-stat.i.branch-misses 3.34 +2.1% 3.41 perf-stat.i.cpi 66597836 -1.9% 65315938 perf-stat.i.dTLB-load-misses 1.946e+10 -1.9% 1.909e+10 perf-stat.i.dTLB-loads 0.00 ± 59% -0.0 0.00 perf-stat.i.dTLB-store-miss-rate% 58479 -6.6% 54625 perf-stat.i.dTLB-store-misses 1.439e+10 -1.9% 1.411e+10 perf-stat.i.dTLB-stores 73446151 -8.8% 67017808 ± 4% perf-stat.i.iTLB-load-misses 8.619e+10 -2.0% 8.443e+10 perf-stat.i.instructions 1175 ± 2% +7.7% 1266 ± 4% perf-stat.i.instructions-per-iTLB-miss 0.30 -2.1% 0.29 perf-stat.i.ipc 450.05 -1.9% 441.52 perf-stat.i.metric.M/sec 192401 ± 5% -6.8% 179259 ± 6% perf-stat.i.node-load-misses 0.99 +0.1 1.06 perf-stat.overall.branch-miss-rate% 3.33 +2.2% 3.41 perf-stat.overall.cpi 0.00 -0.0 0.00 perf-stat.overall.dTLB-store-miss-rate% 1173 ± 2% +7.6% 1262 ± 4% perf-stat.overall.instructions-per-iTLB-miss 0.30 -2.1% 0.29 perf-stat.overall.ipc 1.292e+10 -1.9% 1.267e+10 perf-stat.ps.branch-instructions 1.282e+08 +4.8% 1.344e+08 perf-stat.ps.branch-misses 66375435 -1.9% 65097246 perf-stat.ps.dTLB-load-misses 1.94e+10 -1.9% 1.903e+10 perf-stat.ps.dTLB-loads 58320 -6.6% 54460 perf-stat.ps.dTLB-store-misses 1.434e+10 -1.9% 1.407e+10 perf-stat.ps.dTLB-stores 73202477 -8.8% 66790734 ± 4% perf-stat.ps.iTLB-load-misses 8.59e+10 -2.0% 8.415e+10 perf-stat.ps.instructions 191780 ± 5% -6.8% 178656 ± 6% perf-stat.ps.node-load-misses 2.598e+13 -2.2% 2.541e+13 perf-stat.total.instructions 17.48 -16.6 0.83 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 61.84 -10.1 51.77 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall 48.64 -5.8 42.84 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 13.22 -5.6 7.61 
perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall 21.25 -1.3 19.98 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall 7.92 -0.2 7.68 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall 99.68 +0.1 99.74 perf-profile.calltrace.cycles-pp.syscall 1.04 +0.1 1.13 perf-profile.calltrace.cycles-pp.try_grab_folio.gup_pte_range.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast 0.61 +0.3 0.96 perf-profile.calltrace.cycles-pp.futex_hash.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait 0.89 +0.4 1.25 perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait 0.00 +0.9 0.87 perf-profile.calltrace.cycles-pp.__pte_offset_map.gup_pte_range.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast 2.36 ± 5% +1.6 3.97 perf-profile.calltrace.cycles-pp.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait.do_futex 3.56 +1.9 5.43 perf-profile.calltrace.cycles-pp.gup_pte_range.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast.get_user_pages_fast 2.16 ± 2% +2.3 4.47 perf-profile.calltrace.cycles-pp.__get_user_nocheck_4.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait 6.48 +3.3 9.79 perf-profile.calltrace.cycles-pp.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast.get_user_pages_fast.get_futex_key 2.49 ± 2% +3.5 6.00 perf-profile.calltrace.cycles-pp.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait.do_futex 7.50 +3.9 11.39 perf-profile.calltrace.cycles-pp.lockless_pages_from_mm.internal_get_user_pages_fast.get_user_pages_fast.get_futex_key.futex_wait_setup 8.30 +4.7 12.95 perf-profile.calltrace.cycles-pp.internal_get_user_pages_fast.get_user_pages_fast.get_futex_key.futex_wait_setup.__futex_wait 9.10 +5.4 14.50 perf-profile.calltrace.cycles-pp.get_user_pages_fast.get_futex_key.futex_wait_setup.__futex_wait.futex_wait 10.86 +6.8 17.64 
perf-profile.calltrace.cycles-pp.get_futex_key.futex_wait_setup.__futex_wait.futex_wait.do_futex 24.84 +7.9 32.70 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 23.47 +8.1 31.60 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 22.92 +8.1 31.06 perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe 20.40 +9.2 29.60 perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64 17.06 +11.5 28.57 perf-profile.calltrace.cycles-pp.futex_wait_setup.__futex_wait.futex_wait.do_futex.__x64_sys_futex 0.00 +13.6 13.64 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.syscall 17.67 -16.8 0.89 perf-profile.children.cycles-pp.syscall_exit_to_user_mode 62.15 -9.3 52.89 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 48.78 -6.9 41.92 perf-profile.children.cycles-pp.do_syscall_64 13.14 -3.0 10.14 perf-profile.children.cycles-pp.entry_SYSCALL_64 6.90 -2.8 4.09 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack 21.41 -1.3 20.13 perf-profile.children.cycles-pp.syscall_return_via_sysret 0.78 ± 3% -0.5 0.29 perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare 0.17 ± 2% -0.1 0.06 ± 6% perf-profile.children.cycles-pp.amd_clear_divider 0.34 ± 2% +0.0 0.39 ± 2% perf-profile.children.cycles-pp.is_valid_gup_args 0.06 ± 9% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.pud_huge 1.05 +0.1 1.13 perf-profile.children.cycles-pp.try_grab_folio 0.07 ± 5% +0.1 0.16 ± 3% perf-profile.children.cycles-pp.pmd_huge 0.94 +0.4 1.30 perf-profile.children.cycles-pp._raw_spin_lock 0.61 +0.4 0.98 perf-profile.children.cycles-pp.futex_hash 0.45 +0.4 0.88 perf-profile.children.cycles-pp.__pte_offset_map 2.48 ± 5% +1.6 4.06 perf-profile.children.cycles-pp.futex_q_lock 3.64 +1.9 5.52 perf-profile.children.cycles-pp.gup_pte_range 2.28 ± 2% +2.8 5.08 
perf-profile.children.cycles-pp.__get_user_nocheck_4 2.54 ± 2% +2.9 5.49 perf-profile.children.cycles-pp.futex_get_value_locked 6.56 +3.3 9.90 perf-profile.children.cycles-pp.gup_pgd_range 7.54 +3.9 11.44 perf-profile.children.cycles-pp.lockless_pages_from_mm 8.42 +4.7 13.12 perf-profile.children.cycles-pp.internal_get_user_pages_fast 9.20 +5.5 14.70 perf-profile.children.cycles-pp.get_user_pages_fast 10.90 +6.8 17.70 perf-profile.children.cycles-pp.get_futex_key 24.90 +7.9 32.76 perf-profile.children.cycles-pp.__x64_sys_futex 23.54 +8.1 31.67 perf-profile.children.cycles-pp.do_futex 23.00 +8.2 31.16 perf-profile.children.cycles-pp.futex_wait 20.42 +9.2 29.65 perf-profile.children.cycles-pp.__futex_wait 17.15 +11.5 28.70 perf-profile.children.cycles-pp.futex_wait_setup 1.24 +13.4 14.65 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack 16.85 -16.3 0.54 perf-profile.self.cycles-pp.syscall_exit_to_user_mode 12.04 -3.0 9.08 perf-profile.self.cycles-pp.entry_SYSCALL_64 3.22 -2.3 0.95 perf-profile.self.cycles-pp.__futex_wait 13.61 -1.6 12.00 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe 21.39 -1.3 20.10 perf-profile.self.cycles-pp.syscall_return_via_sysret 2.48 -1.0 1.43 perf-profile.self.cycles-pp.futex_wait 0.74 ± 3% -0.5 0.26 perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare 1.36 ± 3% -0.3 1.07 perf-profile.self.cycles-pp.__x64_sys_futex 0.09 ± 4% -0.0 0.08 perf-profile.self.cycles-pp.futex_setup_timer 0.05 ± 8% +0.0 0.08 ± 5% perf-profile.self.cycles-pp.pud_huge 0.31 ± 2% +0.0 0.35 ± 2% perf-profile.self.cycles-pp.is_valid_gup_args 0.00 +0.1 0.05 perf-profile.self.cycles-pp.syscall@plt 1.03 +0.1 1.09 perf-profile.self.cycles-pp.try_grab_folio 0.05 ± 7% +0.1 0.13 ± 2% perf-profile.self.cycles-pp.pmd_huge 2.36 +0.1 2.46 perf-profile.self.cycles-pp.syscall 0.27 ± 5% +0.2 0.42 ± 5% perf-profile.self.cycles-pp.futex_get_value_locked 0.61 ± 2% +0.2 0.84 perf-profile.self.cycles-pp.futex_wait_setup 0.90 +0.4 1.26 
perf-profile.self.cycles-pp._raw_spin_lock
      0.59            +0.4       0.95        perf-profile.self.cycles-pp.futex_hash
      0.44            +0.4       0.87        perf-profile.self.cycles-pp.__pte_offset_map
      0.90            +0.5       1.41        perf-profile.self.cycles-pp.lockless_pages_from_mm
      0.51 ± 2%       +0.7       1.24        perf-profile.self.cycles-pp.get_user_pages_fast
      0.90            +0.8       1.73        perf-profile.self.cycles-pp.internal_get_user_pages_fast
      0.96 ± 12%      +0.9       1.81 ± 2%   perf-profile.self.cycles-pp.futex_q_lock
      1.69            +1.3       2.98        perf-profile.self.cycles-pp.get_futex_key
      5.82            +1.3       7.13        perf-profile.self.cycles-pp.do_syscall_64
      2.81            +1.4       4.16        perf-profile.self.cycles-pp.gup_pgd_range
      2.06            +1.4       3.46        perf-profile.self.cycles-pp.gup_pte_range
      2.24 ± 2%       +2.8       5.01        perf-profile.self.cycles-pp.__get_user_nocheck_4
      1.08           +13.4      14.51        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack

***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/futex3/will-it-scale

commit:
  a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
  6613d82e61 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
         %stddev   %change          %stddev
             \         |                \
     13.83 ± 10%    +25.7%      17.39 ± 9%   sched_debug.cfs_rq:/.removed.runnable_avg.stddev
     13.66 ± 11%    +25.8%      17.19 ± 9%   sched_debug.cfs_rq:/.removed.util_avg.stddev
      0.44 ± 8%     -20.4%       0.35 ± 17%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    283.50 ± 8%     +14.2%     323.67 ± 4%   perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      0.44 ± 8%     -20.4%       0.35 ± 17%
perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown] 76308359 +3.7% 79094461 will-it-scale.104.processes 733733 +3.7% 760523 will-it-scale.per_process_ops 76308359 +3.7% 79094461 will-it-scale.workload 55603 -2.2% 54361 proc-vmstat.nr_active_anon 58137 -2.3% 56792 proc-vmstat.nr_shmem 55603 -2.2% 54361 proc-vmstat.nr_zone_active_anon 57819 ± 2% -3.5% 55794 proc-vmstat.pgactivate 4.625e+09 +3.7% 4.793e+09 perf-stat.i.branch-instructions 1.76 +0.3 2.10 perf-stat.i.branch-miss-rate% 81504213 +23.8% 1.009e+08 perf-stat.i.branch-misses 7.84 -3.1% 7.59 perf-stat.i.cpi 76204495 +3.7% 79030797 perf-stat.i.dTLB-load-misses 8.857e+09 +3.7% 9.18e+09 perf-stat.i.dTLB-loads 0.00 -0.0 0.00 perf-stat.i.dTLB-store-miss-rate% 74968 +2.1% 76523 perf-stat.i.dTLB-store-misses 6.71e+09 +3.6% 6.954e+09 perf-stat.i.dTLB-stores 3.674e+10 +3.3% 3.794e+10 perf-stat.i.instructions 0.13 +3.2% 0.13 perf-stat.i.ipc 194.14 +3.6% 201.22 perf-stat.i.metric.M/sec 76.87 +1.3 78.12 perf-stat.i.node-store-miss-rate% 1.76 +0.3 2.10 perf-stat.overall.branch-miss-rate% 7.84 -3.1% 7.59 perf-stat.overall.cpi 0.00 -0.0 0.00 perf-stat.overall.dTLB-store-miss-rate% 0.13 +3.2% 0.13 perf-stat.overall.ipc 4.609e+09 +3.6% 4.778e+09 perf-stat.ps.branch-instructions 81226256 +23.8% 1.005e+08 perf-stat.ps.branch-misses 75948753 +3.7% 78766248 perf-stat.ps.dTLB-load-misses 8.827e+09 +3.7% 9.15e+09 perf-stat.ps.dTLB-loads 74738 +2.1% 76323 perf-stat.ps.dTLB-store-misses 6.688e+09 +3.6% 6.931e+09 perf-stat.ps.dTLB-stores 3.662e+10 +3.3% 3.781e+10 perf-stat.ps.instructions 1.106e+13 +3.2% 1.141e+13 perf-stat.total.instructions 39.96 -26.1 13.91 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 52.30 -23.3 29.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall 21.46 -20.2 1.24 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 15.92 -9.7 6.22 
perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 14.55 -9.7 4.85 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall 13.86 -9.7 4.18 perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe 5.41 -4.2 1.17 perf-profile.calltrace.cycles-pp.get_futex_key.futex_wake.do_futex.__x64_sys_futex.do_syscall_64 5.45 -4.1 1.31 perf-profile.calltrace.cycles-pp.futex_hash.futex_wake.do_futex.__x64_sys_futex.do_syscall_64 32.42 -3.9 28.48 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall 99.16 -2.6 96.55 perf-profile.calltrace.cycles-pp.syscall 8.66 -1.7 6.99 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall 8.99 +0.3 9.32 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall 0.58 +3.4 3.94 ± 7% perf-profile.calltrace.cycles-pp.testcase 0.00 +21.1 21.12 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.syscall 40.10 -26.8 13.28 perf-profile.children.cycles-pp.do_syscall_64 52.74 -22.8 29.91 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 21.67 -20.4 1.32 perf-profile.children.cycles-pp.syscall_exit_to_user_mode 14.14 -9.8 4.32 perf-profile.children.cycles-pp.futex_wake 14.61 -9.7 4.90 perf-profile.children.cycles-pp.do_futex 15.97 -9.7 6.27 perf-profile.children.cycles-pp.__x64_sys_futex 5.45 -4.3 1.20 perf-profile.children.cycles-pp.get_futex_key 5.46 -4.1 1.32 perf-profile.children.cycles-pp.futex_hash 32.59 -3.9 28.68 perf-profile.children.cycles-pp.syscall_return_via_sysret 99.58 -2.3 97.28 perf-profile.children.cycles-pp.syscall 4.64 -0.7 3.96 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack 11.74 -0.7 11.08 perf-profile.children.cycles-pp.entry_SYSCALL_64 0.45 ± 5% -0.1 0.36 perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare 0.18 ± 3% -0.1 0.09 ± 4% perf-profile.children.cycles-pp.amd_clear_divider 0.05 +0.0 0.10 ± 3% 
perf-profile.children.cycles-pp.syscall@plt
      0.58            +2.7       3.30 ± 8%   perf-profile.children.cycles-pp.testcase
      1.35           +21.0      22.37        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
     21.18           -20.3       0.88        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      5.37            -4.2       1.16        perf-profile.self.cycles-pp.get_futex_key
      5.22            -4.0       1.23        perf-profile.self.cycles-pp.futex_hash
     32.55            -4.0      28.57        perf-profile.self.cycles-pp.syscall_return_via_sysret
      3.42            -1.5       1.87        perf-profile.self.cycles-pp.futex_wake
     10.45            -0.7       9.80        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.40 ± 6%       -0.1       0.32 ± 3%   perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      0.05            +0.0       0.07 ± 5%   perf-profile.self.cycles-pp.amd_clear_divider
      0.05            +0.0       0.10 ± 4%   perf-profile.self.cycles-pp.syscall@plt
      0.51            +0.1       0.62        perf-profile.self.cycles-pp.do_futex
      0.61            +0.3       0.91 ± 2%   perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.34 ± 2%       +2.3       2.63 ± 9%   perf-profile.self.cycles-pp.testcase
      1.98            +2.8       4.80        perf-profile.self.cycles-pp.do_syscall_64
      1.65            +3.9       5.51 ± 3%   perf-profile.self.cycles-pp.syscall
     13.00            +4.4      17.39        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.19           +21.0      22.18        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki