Hello, kernel test robot noticed a 10.7% improvement of stress-ng.netlink-task.ops_per_sec on: commit: d93300891f810c9498d09a6ecea2403d7a3546f0 ("[PATCH next v2 2/5] locking/osq_lock: Optimise the vcpu_is_preempted() check.") url: https://github.com/intel-lab-lkp/linux/commits/David-Laight/locking-osq_lock-Defer-clearing-node-locked-until-the-slow-osq_lock-path/20240101-055853 base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 610a9b8f49fbcf1100716370d3b5f6f884a2835a patch link: https://lore.kernel.org/all/3a9d1782cd50436c99ced8c10175bae6@xxxxxxxxxxxxxxxx/ patch subject: [PATCH next v2 2/5] locking/osq_lock: Optimise the vcpu_is_preempted() check. testcase: stress-ng test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory parameters: nr_threads: 100% testtime: 60s sc_pid_max: 4194304 class: scheduler test: netlink-task cpufreq_governor: performance Details are as below: --------------------------------------------------------------------------------------------------> The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20240108/202401081557.641738f5-oliver.sang@xxxxxxxxx ========================================================================================= class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/sc_pid_max/tbox_group/test/testcase/testtime: scheduler/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/4194304/lkp-icl-2sp8/netlink-task/stress-ng/60s commit: ff787c1fd0 ("locking/osq_lock: Defer clearing node->locked until the slow osq_lock() path.") d93300891f ("locking/osq_lock: Optimise the vcpu_is_preempted() check.") ff787c1fd0c13f9b d93300891f810c9498d09a6ecea ---------------- --------------------------- %stddev %change %stddev \ | \ 3880 ± 7% +26.4% 4903 ± 18% vmstat.system.cs 0.48 ±126% -99.8% 0.00 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.__sys_recvfrom 0.16 ± 23% -38.9% 0.10 ± 32% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.genl_rcv_msg 1.50 ±118% -99.9% 0.00 ±142% perf-sched.sch_delay.max.ms.__cond_resched.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.__sys_recvfrom 2.55 ± 97% -83.7% 0.42 ±145% perf-sched.wait_time.max.ms.__cond_resched.__mutex_lock.constprop.0.genl_rcv_msg 32244865 +10.7% 35709040 stress-ng.netlink-task.ops 537413 +10.7% 595150 stress-ng.netlink-task.ops_per_sec 38094 ± 12% +42.2% 54160 ± 27% stress-ng.time.involuntary_context_switches 42290 ± 11% +39.8% 59117 ± 23% stress-ng.time.voluntary_context_switches 143.50 ± 7% -28.8% 102.17 ± 15% perf-c2c.DRAM.local 4955 ± 3% -26.4% 3647 ± 4% perf-c2c.DRAM.remote 4038 ± 2% -18.8% 3277 ± 4% perf-c2c.HITM.local 3483 ± 3% -21.1% 2747 ± 5% perf-c2c.HITM.remote 7521 ± 2% -19.9% 6024 ± 4% perf-c2c.HITM.total 0.42 ± 3% -16.2% 0.35 ± 5% perf-stat.i.MPKI 1.066e+10 +9.6% 1.169e+10 perf-stat.i.branch-instructions 51.90 -2.5 49.42 ± 2% perf-stat.i.cache-miss-rate% 22517746 ± 3% -13.4% 19503564 ± 5% perf-stat.i.cache-misses 3730 ± 7% +29.2% 4819 ± 19% perf-stat.i.context-switches 3.99 -3.1% 3.86 perf-stat.i.cpi 9535 ± 3% +15.4% 11003 ± 5% perf-stat.i.cycles-between-cache-misses 0.00 ± 3% +0.0 0.00 ± 3% perf-stat.i.dTLB-load-miss-rate% 1.419e+10 -14.9% 1.207e+10 perf-stat.i.dTLB-loads 8.411e+08 +8.4% 9.118e+08 perf-stat.i.dTLB-stores 5.36e+10 +3.1% 5.524e+10 perf-stat.i.instructions 0.26 +7.0% 0.28 ± 5% perf-stat.i.ipc 837.29 ± 3% -9.8% 755.30 ± 4% perf-stat.i.metric.K/sec 401.41 -4.1% 385.10 perf-stat.i.metric.M/sec 6404966 -23.2% 4920722 perf-stat.i.node-load-misses 141818 ± 4% -22.2% 110404 ± 4% perf-stat.i.node-loads 69.54 +13.8 83.36 perf-stat.i.node-store-miss-rate% 3935319 +10.4% 4345724 perf-stat.i.node-store-misses 1626033 -52.6% 771187 ± 5% perf-stat.i.node-stores 0.42 ± 3% -16.0% 0.35 ± 5% perf-stat.overall.MPKI 0.11 -0.0 0.10 ± 8% perf-stat.overall.branch-miss-rate% 51.32 -2.5 48.79 ± 2% perf-stat.overall.cache-miss-rate% 4.06 -3.0% 3.94 perf-stat.overall.cpi 9677 ± 3% +15.6% 11187 ± 5% perf-stat.overall.cycles-between-cache-misses 0.00 ± 3% +0.0 0.00 ± 4% perf-stat.overall.dTLB-load-miss-rate% 0.25 +3.1% 0.25 perf-stat.overall.ipc 70.78 +14.2 84.94 perf-stat.overall.node-store-miss-rate% 1.049e+10 +9.5% 1.149e+10 perf-stat.ps.branch-instructions 22167740 ± 3% -13.4% 19186498 ± 5% perf-stat.ps.cache-misses 3667 ± 7% +29.1% 4735 ± 19% perf-stat.ps.context-switches 1.396e+10 -15.0% 1.187e+10 perf-stat.ps.dTLB-loads 8.273e+08 +8.3% 8.963e+08 perf-stat.ps.dTLB-stores 5.274e+10 +3.0% 5.433e+10 perf-stat.ps.instructions 6303682 -23.2% 4839978 perf-stat.ps.node-load-misses 140690 ± 4% -22.5% 109023 ± 4% perf-stat.ps.node-loads 3875362 +10.3% 4276026 perf-stat.ps.node-store-misses 1599985 -52.6% 758184 ± 5% perf-stat.ps.node-stores 3.297e+12 +3.0% 3.396e+12 perf-stat.total.instructions 96.07 -0.2 95.87 perf-profile.calltrace.cycles-pp.osq_lock.__mutex_lock.genl_rcv_msg.netlink_rcv_skb.genl_rcv 97.52 -0.1 97.37 perf-profile.calltrace.cycles-pp.__mutex_lock.genl_rcv_msg.netlink_rcv_skb.genl_rcv.netlink_unicast 98.98 -0.1 98.90 perf-profile.calltrace.cycles-pp.netlink_rcv_skb.genl_rcv.netlink_unicast.netlink_sendmsg.__sys_sendto 98.99 -0.1 98.92 perf-profile.calltrace.cycles-pp.genl_rcv.netlink_unicast.netlink_sendmsg.__sys_sendto.__x64_sys_sendto 98.97 -0.1 98.89 perf-profile.calltrace.cycles-pp.genl_rcv_msg.netlink_rcv_skb.genl_rcv.netlink_unicast.netlink_sendmsg 99.09 -0.1 99.04 perf-profile.calltrace.cycles-pp.netlink_unicast.netlink_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64 99.47 -0.0 99.43 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.stress_netlink_taskstats_monitor.stress_netlink_task 99.44 -0.0 99.40 perf-profile.calltrace.cycles-pp.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.stress_netlink_taskstats_monitor 99.35 -0.0 99.32 perf-profile.calltrace.cycles-pp.netlink_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe 99.44 -0.0 99.40 perf-profile.calltrace.cycles-pp.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto 96.08 -0.2 95.89 perf-profile.children.cycles-pp.osq_lock 97.52 -0.1 97.38 perf-profile.children.cycles-pp.__mutex_lock 98.98 -0.1 98.90 perf-profile.children.cycles-pp.netlink_rcv_skb 99.00 -0.1 98.92 perf-profile.children.cycles-pp.genl_rcv 98.97 -0.1 98.89 perf-profile.children.cycles-pp.genl_rcv_msg 99.20 -0.0 99.15 perf-profile.children.cycles-pp.netlink_unicast 0.13 ± 3% -0.0 0.08 ± 7% perf-profile.children.cycles-pp.genl_cmd_full_to_split 0.14 ± 4% -0.0 0.10 ± 5% perf-profile.children.cycles-pp.genl_get_cmd 99.36 -0.0 99.32 perf-profile.children.cycles-pp.netlink_sendmsg 99.44 -0.0 99.41 perf-profile.children.cycles-pp.__x64_sys_sendto 99.44 -0.0 99.41 perf-profile.children.cycles-pp.__sys_sendto 99.59 -0.0 99.56 perf-profile.children.cycles-pp.sendto 0.07 ± 5% +0.0 0.08 ± 5% perf-profile.children.cycles-pp.genl_family_rcv_msg_attrs_parse 0.11 +0.0 0.12 ± 6% perf-profile.children.cycles-pp.apparmor_capable 0.18 ± 3% +0.0 0.20 ± 4% perf-profile.children.cycles-pp.netlink_recvmsg 0.36 +0.0 0.38 perf-profile.children.cycles-pp.fill_stats 0.13 ± 3% +0.0 0.15 ± 4% perf-profile.children.cycles-pp.ns_capable 0.20 ± 3% +0.0 0.23 ± 4% perf-profile.children.cycles-pp.sock_recvmsg 0.24 ± 3% +0.0 0.27 ± 3% perf-profile.children.cycles-pp.__sys_recvfrom 0.24 ± 3% +0.0 0.27 ± 4% perf-profile.children.cycles-pp.__x64_sys_recvfrom 0.31 ± 3% +0.0 0.34 ± 3% perf-profile.children.cycles-pp.recv 1.22 +0.0 1.26 perf-profile.children.cycles-pp.genl_family_rcv_msg 0.85 +0.1 0.90 perf-profile.children.cycles-pp.cmd_attr_pid 0.94 +0.1 1.01 perf-profile.children.cycles-pp.genl_family_rcv_msg_doit 1.11 +0.1 1.23 perf-profile.children.cycles-pp.mutex_spin_on_owner 95.80 -0.2 95.62 perf-profile.self.cycles-pp.osq_lock 0.13 ± 3% -0.0 0.08 ± 7% perf-profile.self.cycles-pp.genl_cmd_full_to_split 0.11 ± 3% +0.0 0.12 ± 6% perf-profile.self.cycles-pp.apparmor_capable 1.11 +0.1 1.23 perf-profile.self.cycles-pp.mutex_spin_on_owner Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki