Oliver, thank you for this report.

All, with nr_task=30%, this benchmark hits the sweet spot on the contention curve: it amplifies the overhead of shuffling threads between waiting queues without reaping the corresponding locality benefit. I was able to reproduce the regression on our machine, though to a lesser extent: about a 10% performance drop for the given test.

Luckily, we have a solution for this exact scenario, which we call the shuffle reduction optimization, or SRO. It was part of the series up to v9, but was dropped in v10 because it did not provide much benefit in my benchmarks at the time. With SRO, the regression on unixbench shrinks to about 1%, while the other performance numbers do not change much.

I attach the SRO patch here. IMHO, it is pretty straightforward. It uses randomization, but only to throttle the creation of the secondary queue. In particular, it does not introduce any extra delay for threads already waiting in that queue once it is created.

Anyway, any feedback is welcome! Unless I hear objections, I plan to post another version of the series with SRO included.
Thanks,
-- Alex

----- Original Message -----
From: oliver.sang@xxxxxxxxx
To: alex.kogan@xxxxxxxxxx
Cc: linux@xxxxxxxxxxxxxxx, peterz@xxxxxxxxxxxxx, mingo@xxxxxxxxxx, will.deacon@xxxxxxx, arnd@xxxxxxxx, longman@xxxxxxxxxx, linux-arch@xxxxxxxxxxxxxxx, linux-arm-kernel@xxxxxxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, tglx@xxxxxxxxxxxxx, bp@xxxxxxxxx, hpa@xxxxxxxxx, x86@xxxxxxxxxx, guohanjun@xxxxxxxxxx, jglauber@xxxxxxxxxxx, steven.sistare@xxxxxxxxxx, daniel.m.jordan@xxxxxxxxxx, alex.kogan@xxxxxxxxxx, dave.dice@xxxxxxxxxx, lkp@xxxxxxxxx, lkp@xxxxxxxxxxxx, ying.huang@xxxxxxxxx, feng.tang@xxxxxxxxx, zhengjun.xing@xxxxxxxxx
Sent: Sunday, November 22, 2020 4:33:52 AM GMT -05:00 US/Canada Eastern
Subject: [locking/qspinlock] 6f9a39a437: unixbench.score -17.3% regression

Greetings,

FYI, we noticed a -17.3% regression of unixbench.score due to commit:

commit: 6f9a39a4372e37907ac1fc7ede6c90932a88d174 ("[PATCH v12 5/5] locking/qspinlock: Avoid moving certain threads between waiting queues in CNA")
url: https://github.com/0day-ci/linux/commits/Alex-Kogan/Add-NUMA-awareness-to-qspinlock/20201118-072506
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 932f8c64d38bb08f69c8c26a2216ba0c36c6daa8

in testcase: unixbench
on test machine: 96 threads Intel(R) Xeon(R) CPU @ 2.30GHz with 128G memory
with following parameters:

	runtime: 300s
	nr_task: 30%
	test: context1
	cpufreq_governor: performance
	ucode: 0x4003003

test-description: UnixBench is the original BYTE UNIX benchmark suite; it aims to test the performance of Unix-like systems.
test-url: https://github.com/kdlucas/byte-unixbench

If you fix the issue, kindly add following tag
Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	bin/lkp install job.yaml  # job file is attached in this email
	bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/30%/debian-10.4-x86_64-20200603.cgz/300s/lkp-csl-2sp4/context1/unixbench/0x4003003

commit:
  eaf522d564 ("locking/qspinlock: Introduce starvation avoidance into CNA")
  6f9a39a437 ("locking/qspinlock: Avoid moving certain threads between waiting queues in CNA")

eaf522d56432e0e5 6f9a39a4372e37907ac1fc7ede6
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      3715           -17.3%       3070        unixbench.score
     11584           +13.2%      13118        unixbench.time.involuntary_context_switches
      1830            +4.7%       1916        unixbench.time.percent_of_cpu_this_job_got
      7012            +5.1%       7373        unixbench.time.system_time
    141.44           -15.6%     119.37        unixbench.time.user_time
 4.338e+08           -16.4%  3.627e+08        unixbench.time.voluntary_context_switches
 5.807e+08           -17.5%  4.793e+08        unixbench.workload
    139.00 ± 67%     -71.0%      40.25        numa-vmstat.node1.nr_mlock
      1.08            -0.1        0.94        mpstat.cpu.all.irq%
      0.48 ±  2%      -0.1        0.40        mpstat.cpu.all.usr%
    956143 ±  7%     +11.0%    1060959 ±  3%  numa-meminfo.node0.MemUsed
   1185909 ±  5%      -8.8%    1081277 ±  3%  numa-meminfo.node1.MemUsed
   4402315           -16.3%    3682692        vmstat.system.cs
    235535            -4.6%     224625        vmstat.system.in
  6.42e+09           +16.4%  7.471e+09        cpuidle.C1.time
 1.941e+10 ±  7%     -20.0%  1.553e+10 ± 21%  cpuidle.C1E.time
  94497227 ±  5%     -63.8%   34185071 ± 15%  cpuidle.C1E.usage
  2.62e+08 ±  8%     -90.1%   26020649        cpuidle.POLL.time
  81581001 ±  9%     -96.1%    3221876        cpuidle.POLL.usage
     84602 ±  3%     +12.7%      95329 ±  5%  softirqs.CPU65.SCHED
     86631 ±  5%     +10.9%      96057 ±  6%  softirqs.CPU67.SCHED
     81448 ±  3%     +12.6%      91708        softirqs.CPU70.SCHED
     99715            +8.1%     107808 ±  2%  softirqs.CPU75.SCHED
     91997 ±  4%     +15.5%     106236 ±  2%  softirqs.CPU81.SCHED
    417904 ±  6%     +43.6%     600289 ± 16%  sched_debug.cfs_rq:/.MIN_vruntime.avg
   3142033            +9.7%    3446986 ±  4%  sched_debug.cfs_rq:/.MIN_vruntime.max
    969106           +20.4%    1166681 ±  8%  sched_debug.cfs_rq:/.MIN_vruntime.stddev
     44659 ± 12%     +21.1%      54091 ±  3%  sched_debug.cfs_rq:/.exec_clock.min
     12198 ± 12%     +24.5%      15181 ±  9%  sched_debug.cfs_rq:/.load.avg
    417904 ±  6%     +43.6%     600289 ± 16%  sched_debug.cfs_rq:/.max_vruntime.avg
   3142033            +9.7%    3446986 ±  4%  sched_debug.cfs_rq:/.max_vruntime.max
    969106           +20.4%    1166681 ±  8%  sched_debug.cfs_rq:/.max_vruntime.stddev
   1926443 ± 12%     +25.6%    2419565 ±  3%  sched_debug.cfs_rq:/.min_vruntime.min
      0.41 ±  2%     +16.3%       0.47 ±  3%  sched_debug.cfs_rq:/.nr_running.avg
    322.15 ±  2%     +13.5%     365.49 ±  4%  sched_debug.cfs_rq:/.util_est_enqueued.avg
     58399 ± 49%     -62.5%      21882 ± 74%  sched_debug.cpu.avg_idle.min
      3.74 ± 14%     -20.1%       2.99 ±  3%  sched_debug.cpu.clock.stddev
     20770 ± 50%     -65.0%       7271 ± 39%  sched_debug.cpu.max_idle_balance_cost.stddev
   8250432           -16.5%    6887763        sched_debug.cpu.nr_switches.avg
  11243220 ±  4%     -21.5%    8826971        sched_debug.cpu.nr_switches.max
   1603956 ± 26%     -52.5%     761566 ±  4%  sched_debug.cpu.nr_switches.stddev
   8248654           -16.5%    6885987        sched_debug.cpu.sched_count.avg
  11240496 ±  4%     -21.5%    8823964        sched_debug.cpu.sched_count.max
   1603802 ± 26%     -52.5%     761522 ±  4%  sched_debug.cpu.sched_count.stddev
   4123397           -16.5%    3441927        sched_debug.cpu.sched_goidle.avg
   5619132 ±  4%     -21.5%    4410755        sched_debug.cpu.sched_goidle.max
    801761 ± 26%     -52.5%     380727 ±  4%  sched_debug.cpu.sched_goidle.stddev
   4124921           -16.5%    3443709        sched_debug.cpu.ttwu_count.avg
   5620396 ±  4%     -21.5%    4412427        sched_debug.cpu.ttwu_count.max
    801796 ± 26%     -52.5%     380615 ±  4%  sched_debug.cpu.ttwu_count.stddev
  7.45e+09           -14.3%  6.382e+09        perf-stat.i.branch-instructions
      1.33            -0.1        1.24        perf-stat.i.branch-miss-rate%
  91615750           -22.0%   71469356        perf-stat.i.branch-misses
      3.80            +2.5        6.31 ± 13%  perf-stat.i.cache-miss-rate%
   8753636 ±  4%    +109.7%   18358392        perf-stat.i.cache-misses
 7.691e+08           -14.2%  6.597e+08        perf-stat.i.cache-references
   4428060           -16.4%    3704052        perf-stat.i.context-switches
      2.87           +11.2%       3.20        perf-stat.i.cpi
 8.789e+10            -5.6%  8.294e+10        perf-stat.i.cpu-cycles
     16303 ±  7%     -74.2%       4204 ±  2%  perf-stat.i.cycles-between-cache-misses
  8.94e+09           -14.0%  7.685e+09        perf-stat.i.dTLB-loads
 4.951e+09           -16.2%  4.149e+09        perf-stat.i.dTLB-stores
  57458394           -17.3%   47543962        perf-stat.i.iTLB-load-misses
  30827890           -15.9%   25930501        perf-stat.i.iTLB-loads
 3.327e+10           -14.6%  2.842e+10        perf-stat.i.instructions
    581.15            +3.3%     600.28        perf-stat.i.instructions-per-iTLB-miss
      0.36            -9.4%       0.33        perf-stat.i.ipc
      0.92            -5.6%       0.86        perf-stat.i.metric.GHz
      1.01 ±  4%     +17.6%       1.18 ±  4%  perf-stat.i.metric.K/sec
    230.75           -14.6%     197.02        perf-stat.i.metric.M/sec
     87.41            +8.0       95.42        perf-stat.i.node-load-miss-rate%
   1718045 ±  3%    +125.3%    3871440        perf-stat.i.node-load-misses
    227252 ±  3%     -71.5%      64814 ± 10%  perf-stat.i.node-loads
   1686277 ±  4%    +120.6%    3720452        perf-stat.i.node-store-misses
      1.23            -0.1        1.12        perf-stat.overall.branch-miss-rate%
      1.14 ±  5%      +1.6        2.78        perf-stat.overall.cache-miss-rate%
      2.64           +10.5%       2.92        perf-stat.overall.cpi
     10070 ±  4%     -55.1%       4519        perf-stat.overall.cycles-between-cache-misses
    579.14            +3.2%     597.84        perf-stat.overall.instructions-per-iTLB-miss
      0.38            -9.5%       0.34        perf-stat.overall.ipc
     88.31           +10.0       98.35        perf-stat.overall.node-load-miss-rate%
     97.96            +1.3       99.24        perf-stat.overall.node-store-miss-rate%
     22430            +3.3%      23175        perf-stat.overall.path-length
 7.434e+09           -14.4%  6.365e+09        perf-stat.ps.branch-instructions
  91428244           -22.0%   71275228        perf-stat.ps.branch-misses
   8723893 ±  4%    +109.8%   18304568        perf-stat.ps.cache-misses
 7.674e+08           -14.3%  6.578e+08        perf-stat.ps.cache-references
   4418679           -16.4%    3693530        perf-stat.ps.context-switches
  8.77e+10            -5.7%  8.271e+10        perf-stat.ps.cpu-cycles
 8.921e+09           -14.1%  7.664e+09        perf-stat.ps.dTLB-loads
  4.94e+09           -16.3%  4.137e+09        perf-stat.ps.dTLB-stores
  57330404           -17.3%   47408036        perf-stat.ps.iTLB-load-misses
  30765981           -15.9%   25859786        perf-stat.ps.iTLB-loads
  3.32e+10           -14.6%  2.834e+10        perf-stat.ps.instructions
   1712299 ±  3%    +125.4%    3860240        perf-stat.ps.node-load-misses
    226568 ±  3%     -71.4%      64722 ± 10%  perf-stat.ps.node-loads
   1680387 ±  4%    +120.8%    3709583        perf-stat.ps.node-store-misses
 1.302e+13           -14.7%  1.111e+13        perf-stat.total.instructions
   3591158 ±  5%     -25.1%    2688593        interrupts.CAL:Function_call_interrupts
      2328 ± 19%     +42.8%       3323 ±  3%  interrupts.CPU0.NMI:Non-maskable_interrupts
      2328 ± 19%     +42.8%       3323 ±  3%  interrupts.CPU0.PMI:Performance_monitoring_interrupts
    110354 ±  9%     -20.0%      88244 ±  4%  interrupts.CPU0.RES:Rescheduling_interrupts
    128508 ± 14%     -27.1%      93721 ±  3%  interrupts.CPU1.RES:Rescheduling_interrupts
      2180 ± 30%     +47.0%       3205 ± 15%  interrupts.CPU10.NMI:Non-maskable_interrupts
      2180 ± 30%     +47.0%       3205 ± 15%  interrupts.CPU10.PMI:Performance_monitoring_interrupts
    133107 ±  8%     -25.7%      98924 ±  2%  interrupts.CPU10.RES:Rescheduling_interrupts
    133955 ± 13%     -28.9%      95305 ±  6%  interrupts.CPU11.RES:Rescheduling_interrupts
    129709 ± 10%     -24.9%      97452 ±  8%  interrupts.CPU12.RES:Rescheduling_interrupts
    130073 ± 10%     -21.2%     102507 ±  2%  interrupts.CPU13.RES:Rescheduling_interrupts
    136313 ± 10%     -27.4%      99010 ±  3%  interrupts.CPU14.RES:Rescheduling_interrupts
    139937 ±  7%     -29.9%      98077 ±  7%  interrupts.CPU15.RES:Rescheduling_interrupts
    143424 ± 11%     -28.4%     102678 ±  7%  interrupts.CPU16.RES:Rescheduling_interrupts
    138084 ± 10%     -25.7%     102625 ±  5%  interrupts.CPU17.RES:Rescheduling_interrupts
    136238 ±  6%     -26.3%     100366 ±  7%  interrupts.CPU18.RES:Rescheduling_interrupts
    140011 ± 10%     -28.4%     100232 ±  4%  interrupts.CPU19.RES:Rescheduling_interrupts
    129720 ±  7%     -28.8%      92405 ±  7%  interrupts.CPU2.RES:Rescheduling_interrupts
     43177 ± 33%     -34.6%      28234 ±  5%  interrupts.CPU20.CAL:Function_call_interrupts
    143060 ±  6%     -28.5%     102289 ±  7%  interrupts.CPU20.RES:Rescheduling_interrupts
     39911 ± 20%     -30.4%      27788 ±  4%  interrupts.CPU21.CAL:Function_call_interrupts
    144644 ±  9%     -27.6%     104676 ±  6%  interrupts.CPU21.RES:Rescheduling_interrupts
     38543 ± 21%     -35.1%      25019 ± 14%  interrupts.CPU22.CAL:Function_call_interrupts
    144984 ±  7%     -29.9%     101700 ±  2%  interrupts.CPU22.RES:Rescheduling_interrupts
     37835 ± 15%     -22.9%      29155 ±  5%  interrupts.CPU23.CAL:Function_call_interrupts
      2089 ± 19%     +70.6%       3565 ± 20%  interrupts.CPU23.NMI:Non-maskable_interrupts
      2089 ± 19%     +70.6%       3565 ± 20%  interrupts.CPU23.PMI:Performance_monitoring_interrupts
    130192 ±  7%     -22.1%     101416 ±  5%  interrupts.CPU23.RES:Rescheduling_interrupts
     37142 ±  6%     -32.8%      24974 ±  6%  interrupts.CPU24.CAL:Function_call_interrupts
    142384 ±  5%     -31.7%      97277 ±  6%  interrupts.CPU24.RES:Rescheduling_interrupts
     32664 ±  9%     -22.2%      25422 ±  6%  interrupts.CPU25.CAL:Function_call_interrupts
    141175 ±  5%     -31.2%      97084 ±  2%  interrupts.CPU25.RES:Rescheduling_interrupts
     31023 ± 21%     -24.8%      23330 ±  7%  interrupts.CPU26.CAL:Function_call_interrupts
    131921 ±  4%     -28.9%      93831 ±  3%  interrupts.CPU26.RES:Rescheduling_interrupts
     32946 ± 19%     -26.2%      24303 ±  5%  interrupts.CPU27.CAL:Function_call_interrupts
    144853 ±  4%     -35.7%      93190 ±  2%  interrupts.CPU27.RES:Rescheduling_interrupts
    136419 ±  4%     -31.3%      93690        interrupts.CPU28.RES:Rescheduling_interrupts
     36609 ± 20%     -35.3%      23696 ±  5%  interrupts.CPU29.CAL:Function_call_interrupts
    145284 ± 10%     -36.1%      92871        interrupts.CPU29.RES:Rescheduling_interrupts
    122699 ±  7%     -23.8%      93459 ± 10%  interrupts.CPU3.RES:Rescheduling_interrupts
    250.50 ± 40%     -79.9%      50.25 ± 99%  interrupts.CPU3.TLB:TLB_shootdowns
     35689 ± 19%     -36.1%      22793 ± 11%  interrupts.CPU30.CAL:Function_call_interrupts
    152345 ±  4%     -40.3%      90991 ±  3%  interrupts.CPU30.RES:Rescheduling_interrupts
     33895 ± 10%     -15.1%      28774 ±  8%  interrupts.CPU31.CAL:Function_call_interrupts
    150590 ±  5%     -35.5%      97092 ±  7%  interrupts.CPU31.RES:Rescheduling_interrupts
     50156 ± 28%     -45.8%      27170 ±  7%  interrupts.CPU32.CAL:Function_call_interrupts
      3757 ±  7%     -43.6%       2120 ± 32%  interrupts.CPU32.NMI:Non-maskable_interrupts
      3757 ±  7%     -43.6%       2120 ± 32%  interrupts.CPU32.PMI:Performance_monitoring_interrupts
    150142 ±  3%     -36.3%      95673        interrupts.CPU32.RES:Rescheduling_interrupts
     39957 ± 25%     -34.5%      26158 ±  4%  interrupts.CPU33.CAL:Function_call_interrupts
    147066 ±  8%     -34.4%      96521 ±  2%  interrupts.CPU33.RES:Rescheduling_interrupts
    168.25 ±137%     -86.9%      22.00 ± 59%  interrupts.CPU33.TLB:TLB_shootdowns
     38357 ± 13%     -29.9%      26881 ±  5%  interrupts.CPU34.CAL:Function_call_interrupts
      3757 ±  5%     -28.5%       2686 ± 19%  interrupts.CPU34.NMI:Non-maskable_interrupts
      3757 ±  5%     -28.5%       2686 ± 19%  interrupts.CPU34.PMI:Performance_monitoring_interrupts
    140734 ±  2%     -33.3%      93841 ±  3%  interrupts.CPU34.RES:Rescheduling_interrupts
     37965 ± 17%     -25.8%      28175 ±  4%  interrupts.CPU35.CAL:Function_call_interrupts
      3934 ±  8%     -39.3%       2389 ± 13%  interrupts.CPU35.NMI:Non-maskable_interrupts
      3934 ±  8%     -39.3%       2389 ± 13%  interrupts.CPU35.PMI:Performance_monitoring_interrupts
    146074 ± 10%     -33.2%      97630 ±  2%  interrupts.CPU35.RES:Rescheduling_interrupts
     34131 ±  8%     -18.8%      27704 ±  9%  interrupts.CPU36.CAL:Function_call_interrupts
    149093 ±  3%     -35.0%      96945 ±  4%  interrupts.CPU36.RES:Rescheduling_interrupts
     44333 ± 47%     -39.7%      26745 ±  7%  interrupts.CPU37.CAL:Function_call_interrupts
    149936 ±  4%     -34.3%      98542 ±  3%  interrupts.CPU37.RES:Rescheduling_interrupts
     41199 ± 28%     -30.2%      28741 ±  6%  interrupts.CPU38.CAL:Function_call_interrupts
    154224 ±  3%     -31.6%     105443 ±  7%  interrupts.CPU38.RES:Rescheduling_interrupts
     36925 ±  8%     -24.3%      27942 ±  5%  interrupts.CPU39.CAL:Function_call_interrupts
    150490 ±  2%     -32.5%     101625 ±  4%  interrupts.CPU39.RES:Rescheduling_interrupts
    122742 ± 15%     -25.4%      91596 ±  5%  interrupts.CPU4.RES:Rescheduling_interrupts
    143639 ±  9%     -29.4%     101407 ±  2%  interrupts.CPU40.RES:Rescheduling_interrupts
     43235 ± 10%     -30.9%      29877 ±  4%  interrupts.CPU41.CAL:Function_call_interrupts
    158981 ±  5%     -32.8%     106760 ±  4%  interrupts.CPU41.RES:Rescheduling_interrupts
     47792 ± 33%     -37.7%      29769 ±  5%  interrupts.CPU42.CAL:Function_call_interrupts
      3455 ± 11%     -32.2%       2343 ± 36%  interrupts.CPU42.NMI:Non-maskable_interrupts
      3455 ± 11%     -32.2%       2343 ± 36%  interrupts.CPU42.PMI:Performance_monitoring_interrupts
    160241 ±  5%     -34.0%     105793 ±  4%  interrupts.CPU42.RES:Rescheduling_interrupts
     54419 ± 52%     -44.1%      30408 ±  2%  interrupts.CPU43.CAL:Function_call_interrupts
      3726 ± 11%     -38.7%       2285 ± 39%  interrupts.CPU43.NMI:Non-maskable_interrupts
      3726 ± 11%     -38.7%       2285 ± 39%  interrupts.CPU43.PMI:Performance_monitoring_interrupts
    156010           -32.4%     105516 ±  2%  interrupts.CPU43.RES:Rescheduling_interrupts
     69033 ± 79%     -56.0%      30393 ±  7%  interrupts.CPU44.CAL:Function_call_interrupts
    152478 ±  6%     -30.4%     106187 ±  4%  interrupts.CPU44.RES:Rescheduling_interrupts
     49434 ± 49%     -38.5%      30404 ±  9%  interrupts.CPU45.CAL:Function_call_interrupts
    153770 ±  7%     -32.2%     104200 ±  3%  interrupts.CPU45.RES:Rescheduling_interrupts
     56303 ± 52%     -50.4%      27914 ±  4%  interrupts.CPU46.CAL:Function_call_interrupts
      3924 ± 20%     -48.7%       2012 ± 50%  interrupts.CPU46.NMI:Non-maskable_interrupts
      3924 ± 20%     -48.7%       2012 ± 50%  interrupts.CPU46.PMI:Performance_monitoring_interrupts
    152891 ± 11%     -31.7%     104494 ±  5%  interrupts.CPU46.RES:Rescheduling_interrupts
     42970 ± 30%     -29.9%      30107 ±  9%  interrupts.CPU47.CAL:Function_call_interrupts
      3940 ±  8%     -40.8%       2332 ± 38%  interrupts.CPU47.NMI:Non-maskable_interrupts
      3940 ±  8%     -40.8%       2332 ± 38%  interrupts.CPU47.PMI:Performance_monitoring_interrupts
    146615 ±  5%     -27.7%     106013 ±  4%  interrupts.CPU47.RES:Rescheduling_interrupts
    146863 ±  5%     -18.4%     119774 ±  3%  interrupts.CPU48.RES:Rescheduling_interrupts
    136692 ±  8%     -16.3%     114405 ±  7%  interrupts.CPU49.RES:Rescheduling_interrupts
     29311 ±  6%     -12.4%      25673 ±  4%  interrupts.CPU5.CAL:Function_call_interrupts
    129497 ±  7%     -27.1%      94375 ±  6%  interrupts.CPU5.RES:Rescheduling_interrupts
    143797 ± 11%     -21.0%     113564 ±  4%  interrupts.CPU50.RES:Rescheduling_interrupts
      2891 ± 16%     +31.3%       3797 ± 12%  interrupts.CPU51.NMI:Non-maskable_interrupts
      2891 ± 16%     +31.3%       3797 ± 12%  interrupts.CPU51.PMI:Performance_monitoring_interrupts
    139766 ±  2%     -19.6%     112352 ±  8%  interrupts.CPU51.RES:Rescheduling_interrupts
    137319 ±  4%     -20.3%     109422 ±  5%  interrupts.CPU52.RES:Rescheduling_interrupts
    138705 ±  5%     -21.3%     109158 ±  8%  interrupts.CPU53.RES:Rescheduling_interrupts
      2426 ± 28%     +42.8%       3464 ± 19%  interrupts.CPU54.NMI:Non-maskable_interrupts
      2426 ± 28%     +42.8%       3464 ± 19%  interrupts.CPU54.PMI:Performance_monitoring_interrupts
    140683 ± 11%     -24.0%     106919 ±  4%  interrupts.CPU54.RES:Rescheduling_interrupts
     38238 ± 13%     -22.9%      29493 ±  6%  interrupts.CPU55.CAL:Function_call_interrupts
      3043 ±  8%     +18.7%       3612 ±  7%  interrupts.CPU55.NMI:Non-maskable_interrupts
      3043 ±  8%     +18.7%       3612 ±  7%  interrupts.CPU55.PMI:Performance_monitoring_interrupts
    143657 ± 10%     -25.0%     107806 ±  6%  interrupts.CPU55.RES:Rescheduling_interrupts
    131036 ±  8%     -21.3%     103177 ±  4%  interrupts.CPU56.RES:Rescheduling_interrupts
    131204 ± 12%     -21.2%     103444 ± 10%  interrupts.CPU57.RES:Rescheduling_interrupts
    122041 ± 12%     -15.9%     102674 ±  7%  interrupts.CPU58.RES:Rescheduling_interrupts
    167.25 ± 65%     -64.7%      59.00 ±157%  interrupts.CPU58.TLB:TLB_shootdowns
      1883 ± 33%     +61.6%       3042 ±  3%  interrupts.CPU6.NMI:Non-maskable_interrupts
      1883 ± 33%     +61.6%       3042 ±  3%  interrupts.CPU6.PMI:Performance_monitoring_interrupts
    132101 ± 12%     -27.0%      96457 ±  8%  interrupts.CPU6.RES:Rescheduling_interrupts
      1832 ± 24%     +69.3%       3102 ± 32%  interrupts.CPU64.NMI:Non-maskable_interrupts
      1832 ± 24%     +69.3%       3102 ± 32%  interrupts.CPU64.PMI:Performance_monitoring_interrupts
    107979 ±  8%     -11.6%      95452        interrupts.CPU66.RES:Rescheduling_interrupts
     97965 ±  3%     -15.1%      83199 ±  2%  interrupts.CPU69.RES:Rescheduling_interrupts
    126380 ± 11%     -24.6%      95257 ±  5%  interrupts.CPU7.RES:Rescheduling_interrupts
      1820 ± 40%     +60.9%       2929 ± 35%  interrupts.CPU70.NMI:Non-maskable_interrupts
      1820 ± 40%     +60.9%       2929 ± 35%  interrupts.CPU70.PMI:Performance_monitoring_interrupts
    171279 ±  5%     -29.4%     120994 ±  5%  interrupts.CPU72.RES:Rescheduling_interrupts
     50761 ± 40%     -35.0%      32979 ±  7%  interrupts.CPU73.CAL:Function_call_interrupts
    173132 ±  7%     -31.5%     118555 ±  5%  interrupts.CPU73.RES:Rescheduling_interrupts
     43479 ± 17%     -25.8%      32276 ±  3%  interrupts.CPU74.CAL:Function_call_interrupts
      3755 ±  9%     -31.7%       2564 ± 31%  interrupts.CPU74.NMI:Non-maskable_interrupts
      3755 ±  9%     -31.7%       2564 ± 31%  interrupts.CPU74.PMI:Performance_monitoring_interrupts
    167124 ±  7%     -28.8%     119063 ±  4%  interrupts.CPU74.RES:Rescheduling_interrupts
    164069 ±  7%     -26.6%     120499 ±  4%  interrupts.CPU75.RES:Rescheduling_interrupts
    166858 ±  6%     -28.4%     119453 ±  4%  interrupts.CPU76.RES:Rescheduling_interrupts
    157535 ±  6%     -25.5%     117419 ±  4%  interrupts.CPU77.RES:Rescheduling_interrupts
    165642 ±  8%     -25.9%     122719 ±  8%  interrupts.CPU78.RES:Rescheduling_interrupts
    162781 ±  5%     -29.0%     115600 ±  3%  interrupts.CPU79.RES:Rescheduling_interrupts
    132224 ± 11%     -26.6%      97010        interrupts.CPU8.RES:Rescheduling_interrupts
    167082 ±  9%     -30.7%     115794 ±  4%  interrupts.CPU80.RES:Rescheduling_interrupts
     49639 ± 37%     -35.1%      32228 ±  2%  interrupts.CPU81.CAL:Function_call_interrupts
    144305 ±  5%     -18.3%     117926 ±  4%  interrupts.CPU81.RES:Rescheduling_interrupts
    151333 ±  7%     -23.2%     116159 ±  3%  interrupts.CPU82.RES:Rescheduling_interrupts
    142398 ±  8%     -21.1%     112399 ±  7%  interrupts.CPU83.RES:Rescheduling_interrupts
    144455 ±  2%     -20.5%     114911        interrupts.CPU84.RES:Rescheduling_interrupts
    149850 ±  9%     -24.3%     113396 ±  5%  interrupts.CPU85.RES:Rescheduling_interrupts
     34458 ±  4%     -14.4%      29487 ±  8%  interrupts.CPU86.CAL:Function_call_interrupts
    138603 ±  6%     -22.7%     107133 ±  2%  interrupts.CPU86.RES:Rescheduling_interrupts
     39228 ±  7%     -25.5%      29231 ±  4%  interrupts.CPU87.CAL:Function_call_interrupts
    151814 ±  8%     -31.1%     104629 ±  5%  interrupts.CPU87.RES:Rescheduling_interrupts
    137356 ±  8%     -20.2%     109634 ±  3%  interrupts.CPU88.RES:Rescheduling_interrupts
    143613 ± 10%     -28.9%     102166 ± 10%  interrupts.CPU89.RES:Rescheduling_interrupts
    122375 ±  8%     -19.2%      98901 ±  3%  interrupts.CPU9.RES:Rescheduling_interrupts
    140781 ±  6%     -25.0%     105531 ±  3%  interrupts.CPU90.RES:Rescheduling_interrupts
    138917 ± 12%     -24.9%     104264 ±  5%  interrupts.CPU91.RES:Rescheduling_interrupts
    146814 ± 14%     -29.2%     103902 ±  4%  interrupts.CPU92.RES:Rescheduling_interrupts
    132220 ± 15%     -21.3%     104095 ±  2%  interrupts.CPU93.RES:Rescheduling_interrupts
    133.00 ± 88%     -87.6%      16.50 ± 72%  interrupts.CPU93.TLB:TLB_shootdowns
    125991 ±  5%     -19.0%     101995 ±  2%  interrupts.CPU94.RES:Rescheduling_interrupts
    115838 ±  9%     -17.2%      95959 ±  3%  interrupts.CPU95.RES:Rescheduling_interrupts
  13255498 ±  2%     -25.6%    9859155        interrupts.RES:Rescheduling_interrupts
      7.59 ±  2%      -1.5        6.04        perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.43 ±  2%      -1.5        5.91        perf-profile.calltrace.cycles-pp.pipe_read.new_sync_read.vfs_read.ksys_read.do_syscall_64
      6.03 ±  4%      -1.0        5.06        perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.90 ±  4%      -1.0        4.95        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.44 ±  3%      -0.9        3.51        perf-profile.calltrace.cycles-pp.schedule.pipe_read.new_sync_read.vfs_read.ksys_read
      2.29 ±  4%      -0.9        1.38 ±  2%  perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.pipe_read.new_sync_read
      4.07 ±  3%      -0.9        3.21        perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_read.new_sync_read.vfs_read
      2.62 ±  3%      -0.9        1.76 ±  4%  perf-profile.calltrace.cycles-pp.read
      3.68 ±  2%      -0.8        2.83        perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      2.06 ±  4%      -0.8        1.22        perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.pipe_read
      3.58 ±  2%      -0.8        2.76        perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
      2.37 ±  3%      -0.8        1.58 ±  4%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
      2.29 ±  3%      -0.8        1.53 ±  4%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      2.26 ±  3%      -0.8        1.50 ±  4%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      2.21 ±  3%      -0.7        1.47 ±  4%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      4.25 ±  3%      -0.7        3.51        perf-profile.calltrace.cycles-pp.stack_trace_save_tsk.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.ttwu_do_activate
      2.14 ±  4%      -0.6        1.52        perf-profile.calltrace.cycles-pp.unwind_next_frame.arch_stack_walk.stack_trace_save_tsk.__account_scheduler_latency.enqueue_entity
      3.48 ±  4%      -0.6        2.90 ±  2%  perf-profile.calltrace.cycles-pp.arch_stack_walk.stack_trace_save_tsk.__account_scheduler_latency.enqueue_entity.enqueue_task_fair
      1.93 ±  3%      -0.5        1.48        perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule_idle.do_idle.cpu_startup_entry
      1.54 ±  4%      -0.4        1.18        perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      1.38 ±  3%      -0.3        1.04 ±  2%  perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__schedule.schedule_idle.do_idle
      0.72 ±  4%      -0.1        0.58 ±  3%  perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.do_idle.cpu_startup_entry.start_secondary
      0.66 ±  4%      -0.1        0.54 ±  2%  perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_read.new_sync_read.vfs_read.ksys_read
     46.28            +0.5       46.74        perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
      0.14 ±173%      +0.5        0.66 ±  9%  perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.14 ±173%      +0.5        0.66 ±  9%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_kernel
      0.15 ±173%      +0.6        0.71 ±  8%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.15 ±173%      +0.6        0.71 ±  8%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.15 ±173%      +0.6        0.71 ±  8%  perf-profile.calltrace.cycles-pp.start_kernel.secondary_startup_64_no_verify
      7.85 ±  2%      +0.8        8.64 ±  3%  perf-profile.calltrace.cycles-pp.write
      7.77 ±  2%      +0.8        8.58 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
      7.73 ±  2%      +0.8        8.55 ±  3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      7.69 ±  3%      +0.8        8.53 ±  3%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      7.64 ±  3%      +0.9        8.49 ±  3%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     35.29            +0.9       36.15        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     35.15            +0.9       36.02        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     42.35            +1.8       44.15        perf-profile.calltrace.cycles-pp.new_sync_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     42.22            +1.8       44.06        perf-profile.calltrace.cycles-pp.pipe_write.new_sync_write.vfs_write.ksys_write.do_syscall_64
     38.77            +1.9       40.67        perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     38.65            +1.9       40.56        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
     40.84            +2.1       42.96        perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write.ksys_write
     40.50            +2.1       42.65        perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write
     40.15            +2.2       42.36        perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write
     40.07            +2.2       42.29        perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
     37.50            +2.7       40.20        perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
     37.47            +2.7       40.18        perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
     36.96            +2.9       39.84        perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
     36.62 ±  2%      +3.2       39.86        perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
     34.50            +3.3       37.80        perf-profile.calltrace.cycles-pp.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
     29.96 ±  2%      +4.1       34.04        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.ttwu_do_activate
     29.13 ±  2%      +4.1       33.22        perf-profile.calltrace.cycles-pp.__cna_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__account_scheduler_latency.enqueue_entity.enqueue_task_fair
      8.30 ±  2%      -1.7        6.58        perf-profile.children.cycles-pp.ksys_read
      8.12 ±  2%      -1.7        6.42        perf-profile.children.cycles-pp.vfs_read
      7.75 ±  2%      -1.7        6.06        perf-profile.children.cycles-pp.__schedule
      7.59 ±  2%      -1.5        6.05        perf-profile.children.cycles-pp.new_sync_read
      7.45 ±  2%      -1.5        5.94        perf-profile.children.cycles-pp.pipe_read
      4.44 ±  3%      -0.9        3.52        perf-profile.children.cycles-pp.schedule
      2.65 ±  3%      -0.9        1.78 ±  4%  perf-profile.children.cycles-pp.read
      3.70 ±  2%      -0.8        2.87        perf-profile.children.cycles-pp.schedule_idle
      4.28 ±  3%      -0.7        3.54        perf-profile.children.cycles-pp.stack_trace_save_tsk
      0.80 ± 35%      -0.7        0.13 ±  5%  perf-profile.children.cycles-pp.poll_idle
      3.54 ±  3%      -0.6        2.94 ±  2%  perf-profile.children.cycles-pp.arch_stack_walk
      2.02 ±  3%      -0.6        1.43 ±  2%  perf-profile.children.cycles-pp.update_load_avg
      2.15 ±  3%      -0.5        1.67        perf-profile.children.cycles-pp.pick_next_task_fair
      2.30 ±  4%      -0.5        1.82        perf-profile.children.cycles-pp.dequeue_task_fair
      2.10 ±  4%      -0.5        1.63 ±  2%  perf-profile.children.cycles-pp.dequeue_entity
      1.56 ±  4%      -0.4        1.20        perf-profile.children.cycles-pp.menu_select
      1.39 ±  3%      -0.3        1.06 ±  2%  perf-profile.children.cycles-pp.set_next_entity
      0.46 ± 13%      -0.3        0.15 ±  3%  perf-profile.children.cycles-pp.sched_ttwu_pending
      0.92 ±  3%      -0.2        0.70 ±  2%  perf-profile.children.cycles-pp.prepare_to_wait_event
      1.13            -0.2        0.92 ±  3%  perf-profile.children.cycles-pp.asm_call_sysvec_on_stack
      0.33 ±  9%      -0.2        0.12 ±  3%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.32 ± 10%      -0.2        0.11 ±  3%  perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.61 ±  3%      -0.2        0.41 ±  4%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.32 ± 10%      -0.2        0.11 ±  4%  perf-profile.children.cycles-pp.sysvec_call_function_single
      0.47 ±  6%      -0.2        0.28        perf-profile.children.cycles-pp.finish_task_switch
      0.56 ±  5%      -0.2        0.36 ±  3%  perf-profile.children.cycles-pp.unwind_get_return_address
      0.50 ±  6%      -0.2        0.32 ±  4%  perf-profile.children.cycles-pp.__kernel_text_address
      0.96 ±  5%      -0.2        0.78        perf-profile.children.cycles-pp.update_curr
      0.44 ±  6%      -0.2        0.27 ±  4%  perf-profile.children.cycles-pp.kernel_text_address
      2.17 ±  4%      -0.2        2.00        perf-profile.children.cycles-pp.unwind_next_frame
      0.73 ±  3%      -0.2        0.56 ±  4%  perf-profile.children.cycles-pp.select_task_rq_fair
      0.95            -0.2        0.79 ±  2%  perf-profile.children.cycles-pp.update_rq_clock
      0.74 ±  4%      -0.1        0.59 ±  4%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
      0.53 ±  3%      -0.1        0.40 ±  5%  perf-profile.children.cycles-pp.ktime_get
      0.41 ±  4%      -0.1        0.28 ±  3%  perf-profile.children.cycles-pp.stack_trace_consume_entry_nosched
      0.71            -0.1        0.59 ±  3%  perf-profile.children.cycles-pp.mutex_lock
      0.50 ±  2%      -0.1        0.38 ±  3%  perf-profile.children.cycles-pp.tick_nohz_idle_exit
      0.44            -0.1        0.33        perf-profile.children.cycles-pp.__orc_find
      0.52 ±  2%      -0.1        0.41 ±  3%  perf-profile.children.cycles-pp.copy_page_to_iter
      0.15 ± 19%      -0.1        0.05 ±  8%  perf-profile.children.cycles-pp.flush_smp_call_function_from_idle
      0.44 ±  4%      -0.1        0.34 ±  2%  perf-profile.children.cycles-pp.security_file_permission
      0.53 ±  2%      -0.1        0.43        perf-profile.children.cycles-pp.__switch_to
      0.48 ±  3%      -0.1        0.38 ±  3%  perf-profile.children.cycles-pp.__switch_to_asm
      0.37 ±  3%      -0.1        0.27 ±  4%  perf-profile.children.cycles-pp.__update_load_avg_se
      0.67 ±  2%      -0.1        0.57 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock
      0.32 ±  4%      -0.1        0.22 ±  4%  perf-profile.children.cycles-pp.copy_page_from_iter
      0.38 ±  4%      -0.1        0.29 ±  5%  perf-profile.children.cycles-pp.select_idle_sibling
      0.45 ±  5%      -0.1        0.37 ±  4%  perf-profile.children.cycles-pp.tick_nohz_next_event
      0.29 ±  4%      -0.1        0.21 ±  3%  perf-profile.children.cycles-pp.tick_nohz_idle_enter
      0.64 ±  2%      -0.1        0.57 ±  3%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      0.38 ±  3%      -0.1        0.31 ±  4%  perf-profile.children.cycles-pp.copyout
      0.27 ±  6%      -0.1        0.19 ±  6%  perf-profile.children.cycles-pp.orc_find
      0.40 ±  2%      -0.1        0.33 ±  5%  perf-profile.children.cycles-pp.copy_user_generic_unrolled
      0.35 ±  4%      -0.1        0.28        perf-profile.children.cycles-pp.pick_next_entity
      0.38 ±  4%      -0.1        0.31        perf-profile.children.cycles-pp.update_cfs_group
      0.22 ±  4%      -0.1        0.16 ±  5%  perf-profile.children.cycles-pp.___perf_sw_event
      0.30 ±  5%      -0.1        0.23 ±  3%  perf-profile.children.cycles-pp.__unwind_start
      0.32 ±  4%      -0.1        0.26        perf-profile.children.cycles-pp.ttwu_do_wakeup
      0.20 ±  4%      -0.1        0.14 ±  9%  perf-profile.children.cycles-pp.__might_sleep
      0.28 ±  6%      -0.1        0.22 ±  5%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.27 ±  4%      -0.1        0.21 ±  3%  perf-profile.children.cycles-pp.common_file_perm
      0.18 ±  3%      -0.1        0.12 ±  3%  perf-profile.children.cycles-pp.in_sched_functions
      0.30 ±  4%      -0.1        0.24        perf-profile.children.cycles-pp.check_preempt_curr
      0.22 ±  4%      -0.1        0.17 ±  4%  perf-profile.children.cycles-pp.rcu_idle_exit
      0.34 ±  3%      -0.1        0.28 ±  2%  perf-profile.children.cycles-pp.sched_clock_cpu
      0.30 ±  4%      -0.1        0.24 ±  4%  perf-profile.children.cycles-pp.update_ts_time_stats
      0.31 ±  5%      -0.1        0.25 ±  4%  perf-profile.children.cycles-pp.nr_iowait_cpu
      0.31 ±  3%      -0.1        0.26 ±  3%  perf-profile.children.cycles-pp.sched_clock
      0.21 ±  5%      -0.1        0.16 ±  7%  perf-profile.children.cycles-pp.cpus_share_cache
      0.17 ± 10%      -0.1        0.11 ±  7%  perf-profile.children.cycles-pp.place_entity
      0.28 ±  3%      -0.1        0.23 ±  2%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.18 ±  4%      -0.1        0.13 ±  3%  perf-profile.children.cycles-pp.resched_curr
      0.33 ±  2%      -0.0        0.28 ±  2%  perf-profile.children.cycles-pp.switch_mm_irqs_off
      0.29 ±  3%      -0.0        0.24 ±  2%  perf-profile.children.cycles-pp.mutex_unlock
      0.23 ±  3%      -0.0        0.18 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.26 ±  3%      -0.0        0.21        perf-profile.children.cycles-pp.___might_sleep
      0.20 ±  6%      -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.__list_del_entry_valid
      0.29 ±  5%      -0.0        0.25 ±  3%  perf-profile.children.cycles-pp.native_sched_clock
      0.24 ±  5%      -0.0        0.19 ±  5%  perf-profile.children.cycles-pp.get_next_timer_interrupt
      0.12 ±  5%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.cpuidle_governor_latency_req
      0.23 ±  8%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.hrtimer_next_event_without
      0.21 ±  3%      -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.read_tsc
      0.14 ±  3%      -0.0        0.10 ±  7%  perf-profile.children.cycles-pp.rcu_eqs_exit
      0.12 ±  4%      -0.0        0.09 ±  7%  perf-profile.children.cycles-pp.__entry_text_start
      0.19 ±  2%      -0.0        0.15 ±  5%  perf-profile.children.cycles-pp.__fdget_pos
      0.08 ±  6%      -0.0        0.04 ± 58%  perf-profile.children.cycles-pp.rcu_dynticks_eqs_exit
      0.07 ± 10%      -0.0        0.04 ± 57%  perf-profile.children.cycles-pp.put_prev_entity
      0.11 ± 13%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.put_prev_task_fair
      0.17 ±  4%      -0.0        0.14 ±  3%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.15 ±  7%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.hrtimer_get_next_event
      0.16 ±  2%      -0.0        0.14 ±  6%  perf-profile.children.cycles-pp.__fget_light
      0.13 ± 10%      -0.0        0.10 ±  7%  perf-profile.children.cycles-pp.is_bpf_text_address
      0.11 ±  6%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.file_update_time
      0.14 ±  6%      -0.0        0.11 ± 11%  perf-profile.children.cycles-pp.__wrgsbase_inactive
      0.14 ±  8%      -0.0        0.11 ±  7%  perf-profile.children.cycles-pp.available_idle_cpu
      0.09 ±  4%      -0.0        0.06 ± 13%  perf-profile.children.cycles-pp.menu_reflect
      0.13 ±  9%      -0.0        0.11 ±  6%  perf-profile.children.cycles-pp.stack_access_ok
      0.14 ±  5%      -0.0        0.12 ±  7%  perf-profile.children.cycles-pp.switch_fpu_return
      0.10 ±  8%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.current_time
      0.09 ±  9%      -0.0        0.07 ±  7%  perf-profile.children.cycles-pp.__rdgsbase_inactive
      0.10            -0.0        0.08        perf-profile.children.cycles-pp.__calc_delta
      0.09 ± 10%      -0.0        0.07 ±  7%  perf-profile.children.cycles-pp.bpf_ksym_find
      0.07 ± 10%      -0.0        0.05        perf-profile.children.cycles-pp.pick_next_task_idle
      0.18 ±  3%      -0.0        0.16 ±  2%  perf-profile.children.cycles-pp.fsnotify
      0.17 ±  5%      -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.copy_fpregs_to_fpstate
      0.07 ±  6%      -0.0        0.05        perf-profile.children.cycles-pp.put_task_stack
      0.07 ±  6%      -0.0        0.05        perf-profile.children.cycles-pp.apparmor_file_permission
      0.07 ± 12%      -0.0        0.05 ±  8%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      0.07 ±  6%      -0.0        0.05 ±  8%  perf-profile.children.cycles-pp.update_min_vruntime
      0.17 ±  2%      -0.0        0.16        perf-profile.children.cycles-pp.anon_pipe_buf_release
      0.07 ±  5%      -0.0        0.06        perf-profile.children.cycles-pp.atime_needs_update
      0.08 ±  5%      -0.0        0.07        perf-profile.children.cycles-pp.finish_wait
      0.48 ± 14%      +0.2        0.71 ±  8%  perf-profile.children.cycles-pp.start_kernel
     46.28            +0.5       46.74        perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     46.28            +0.5       46.74        perf-profile.children.cycles-pp.cpu_startup_entry
     46.25            +0.5       46.71        perf-profile.children.cycles-pp.do_idle
      7.88 ±  2%      +0.8        8.65 ±  3%  perf-profile.children.cycles-pp.write
     42.99            +1.7       44.69        perf-profile.children.cycles-pp.ksys_write
     42.80            +1.7       44.53        perf-profile.children.cycles-pp.vfs_write
     42.37            +1.8       44.16
perf-profile.children.cycles-pp.new_sync_write 42.23 +1.8 44.06 perf-profile.children.cycles-pp.pipe_write 39.21 +2.1 41.33 perf-profile.children.cycles-pp.cpuidle_enter 40.84 +2.1 42.96 perf-profile.children.cycles-pp.__wake_up_common_lock 39.20 +2.1 41.32 perf-profile.children.cycles-pp.cpuidle_enter_state 40.50 +2.2 42.65 perf-profile.children.cycles-pp.__wake_up_common 40.15 +2.2 42.36 perf-profile.children.cycles-pp.autoremove_wake_function 40.09 +2.2 42.30 perf-profile.children.cycles-pp.try_to_wake_up 37.97 +2.4 40.36 perf-profile.children.cycles-pp.ttwu_do_activate 37.94 +2.4 40.33 perf-profile.children.cycles-pp.enqueue_task_fair 37.50 +2.5 40.05 perf-profile.children.cycles-pp.enqueue_entity 36.91 +2.9 39.86 perf-profile.children.cycles-pp.intel_idle 34.95 +3.0 37.95 perf-profile.children.cycles-pp.__account_scheduler_latency 31.46 +3.5 35.00 perf-profile.children.cycles-pp._raw_spin_lock_irqsave 29.71 ± 2% +3.8 33.52 perf-profile.children.cycles-pp.__cna_queued_spin_lock_slowpath 0.71 ± 39% -0.7 0.05 ± 8% perf-profile.self.cycles-pp.poll_idle 1.08 ± 3% -0.3 0.78 perf-profile.self.cycles-pp.update_load_avg 1.24 ± 2% -0.2 1.02 ± 2% perf-profile.self.cycles-pp.__schedule 1.86 -0.2 1.65 perf-profile.self.cycles-pp._raw_spin_lock_irqsave 0.59 ± 3% -0.2 0.40 ± 4% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq 0.95 ± 3% -0.2 0.75 ± 2% perf-profile.self.cycles-pp.set_next_entity 0.66 ± 4% -0.2 0.51 ± 6% perf-profile.self.cycles-pp.menu_select 0.43 ± 5% -0.1 0.28 ± 3% perf-profile.self.cycles-pp.enqueue_task_fair 0.53 ± 3% -0.1 0.40 ± 2% perf-profile.self.cycles-pp._raw_spin_lock 0.67 ± 2% -0.1 0.54 ± 2% perf-profile.self.cycles-pp.stack_trace_save_tsk 0.77 ± 2% -0.1 0.64 ± 2% perf-profile.self.cycles-pp.update_rq_clock 0.72 ± 8% -0.1 0.60 perf-profile.self.cycles-pp.update_curr 0.44 -0.1 0.33 perf-profile.self.cycles-pp.__orc_find 0.56 ± 2% -0.1 0.45 ± 3% perf-profile.self.cycles-pp.pipe_read 0.33 ± 4% -0.1 0.22 
perf-profile.self.cycles-pp.prepare_to_wait_event 0.48 ± 3% -0.1 0.38 ± 3% perf-profile.self.cycles-pp.__switch_to_asm 0.32 ± 2% -0.1 0.22 ± 7% perf-profile.self.cycles-pp.ktime_get 0.47 -0.1 0.38 ± 2% perf-profile.self.cycles-pp.__switch_to 0.35 ± 2% -0.1 0.26 ± 5% perf-profile.self.cycles-pp.select_task_rq_fair 0.28 ± 5% -0.1 0.20 ± 3% perf-profile.self.cycles-pp.dequeue_entity 0.23 ± 6% -0.1 0.15 ± 3% perf-profile.self.cycles-pp.stack_trace_consume_entry_nosched 0.46 ± 3% -0.1 0.39 ± 5% perf-profile.self.cycles-pp.mutex_lock 0.32 ± 4% -0.1 0.25 ± 4% perf-profile.self.cycles-pp.__update_load_avg_se 0.39 ± 3% -0.1 0.32 ± 6% perf-profile.self.cycles-pp.copy_user_generic_unrolled 0.45 ± 3% -0.1 0.38 perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore 0.19 ± 6% -0.1 0.12 ± 8% perf-profile.self.cycles-pp.vfs_read 0.34 ± 4% -0.1 0.27 ± 2% perf-profile.self.cycles-pp.pick_next_entity 0.84 ± 2% -0.1 0.77 perf-profile.self.cycles-pp.enqueue_entity 0.28 ± 5% -0.1 0.21 ± 5% perf-profile.self.cycles-pp.syscall_return_via_sysret 0.19 ± 4% -0.1 0.12 ± 10% perf-profile.self.cycles-pp.__might_sleep 0.35 ± 3% -0.1 0.29 perf-profile.self.cycles-pp.__wake_up_common 0.19 ± 4% -0.1 0.12 ± 6% perf-profile.self.cycles-pp.___perf_sw_event 0.47 ± 2% -0.1 0.41 ± 2% perf-profile.self.cycles-pp.do_idle 0.27 ± 4% -0.1 0.21 ± 3% perf-profile.self.cycles-pp.__unwind_start 0.22 ± 6% -0.1 0.16 ± 2% perf-profile.self.cycles-pp.finish_task_switch 0.34 ± 3% -0.1 0.29 perf-profile.self.cycles-pp.schedule 0.35 ± 6% -0.1 0.29 ± 2% perf-profile.self.cycles-pp.update_cfs_group 0.24 ± 6% -0.1 0.19 ± 4% perf-profile.self.cycles-pp.orc_find 0.21 ± 5% -0.1 0.16 ± 7% perf-profile.self.cycles-pp.cpus_share_cache 0.30 ± 7% -0.1 0.25 ± 5% perf-profile.self.cycles-pp.nr_iowait_cpu 0.18 ± 4% -0.1 0.13 perf-profile.self.cycles-pp.resched_curr 0.29 ± 3% -0.1 0.24 ± 2% perf-profile.self.cycles-pp.mutex_unlock 0.16 ± 9% -0.1 0.11 ± 6% perf-profile.self.cycles-pp.place_entity 0.32 ± 5% -0.0 0.27 
perf-profile.self.cycles-pp.cpuidle_enter_state 0.23 ± 3% -0.0 0.18 ± 3% perf-profile.self.cycles-pp._raw_spin_lock_irq 0.22 ± 4% -0.0 0.18 ± 4% perf-profile.self.cycles-pp.common_file_perm 0.12 ± 3% -0.0 0.08 ± 11% perf-profile.self.cycles-pp.in_sched_functions 0.28 ± 3% -0.0 0.24 ± 3% perf-profile.self.cycles-pp.native_sched_clock 0.20 ± 4% -0.0 0.15 ± 2% perf-profile.self.cycles-pp.__list_del_entry_valid 0.12 ± 5% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.new_sync_write 0.25 -0.0 0.21 perf-profile.self.cycles-pp.___might_sleep 0.20 ± 4% -0.0 0.16 ± 5% perf-profile.self.cycles-pp.vfs_write 0.07 ± 7% -0.0 0.03 ±100% perf-profile.self.cycles-pp.main 0.29 ± 2% -0.0 0.25 ± 4% perf-profile.self.cycles-pp.switch_mm_irqs_off 0.21 ± 2% -0.0 0.17 ± 4% perf-profile.self.cycles-pp.read_tsc 0.07 ± 5% -0.0 0.04 ± 58% perf-profile.self.cycles-pp.rcu_dynticks_eqs_exit 0.12 ± 6% -0.0 0.09 ± 7% perf-profile.self.cycles-pp.new_sync_read 0.21 ± 2% -0.0 0.18 ± 6% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe 0.12 ± 6% -0.0 0.10 ± 9% perf-profile.self.cycles-pp.arch_stack_walk 0.07 ± 6% -0.0 0.04 ± 57% perf-profile.self.cycles-pp.update_min_vruntime 0.11 ± 4% -0.0 0.08 ± 10% perf-profile.self.cycles-pp.kernel_text_address 0.23 ± 7% -0.0 0.21 ± 5% perf-profile.self.cycles-pp.__account_scheduler_latency 0.14 ± 6% -0.0 0.11 ± 11% perf-profile.self.cycles-pp.__wrgsbase_inactive 0.09 ± 9% -0.0 0.06 ± 6% perf-profile.self.cycles-pp.__entry_text_start 0.08 ± 5% -0.0 0.05 ± 8% perf-profile.self.cycles-pp.copy_page_to_iter 0.19 ± 6% -0.0 0.17 ± 5% perf-profile.self.cycles-pp.pipe_write 0.15 ± 3% -0.0 0.13 ± 5% perf-profile.self.cycles-pp.__fget_light 0.06 ± 6% -0.0 0.04 ± 57% perf-profile.self.cycles-pp.unwind_get_return_address 0.14 ± 7% -0.0 0.12 ± 7% perf-profile.self.cycles-pp.switch_fpu_return 0.09 ± 9% -0.0 0.07 ± 7% perf-profile.self.cycles-pp.tick_nohz_next_event 0.08 ± 11% -0.0 0.05 ± 8% perf-profile.self.cycles-pp.__hrtimer_next_event_base 0.16 -0.0 0.14 ± 6% 
perf-profile.self.cycles-pp.pick_next_task_fair 0.09 ± 9% -0.0 0.07 ± 7% perf-profile.self.cycles-pp.__rdgsbase_inactive 0.06 -0.0 0.04 ± 57% perf-profile.self.cycles-pp.copy_page_from_iter 0.14 ± 6% -0.0 0.11 ± 7% perf-profile.self.cycles-pp.available_idle_cpu 0.08 ± 16% -0.0 0.06 ± 7% perf-profile.self.cycles-pp.call_cpuidle 0.10 ± 8% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.syscall_exit_to_user_mode 0.09 ± 5% -0.0 0.07 ± 7% perf-profile.self.cycles-pp.rcu_idle_exit 0.19 ± 3% -0.0 0.17 ± 4% perf-profile.self.cycles-pp.dequeue_task_fair 0.10 ± 4% -0.0 0.08 perf-profile.self.cycles-pp.__calc_delta 0.17 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.anon_pipe_buf_release 0.17 ± 4% -0.0 0.15 ± 2% perf-profile.self.cycles-pp.copy_fpregs_to_fpstate 0.06 ± 6% -0.0 0.05 perf-profile.self.cycles-pp.copy_user_enhanced_fast_string 0.06 ± 6% -0.0 0.05 perf-profile.self.cycles-pp.put_task_stack 36.91 +2.9 39.86 perf-profile.self.cycles-pp.intel_idle 29.30 ± 2% +3.9 33.15 perf-profile.self.cycles-pp.__cna_queued_spin_lock_slowpath unixbench.time.voluntary_context_switches 4.4e+08 +-----------------------------------------------------------------+ | +.. +..+. ..| 4.3e+08 |-+ : + + | 4.2e+08 |-+ : + | | : | 4.1e+08 |-+ : | 4e+08 |-+ +. .+.. .+..+.+..+. : | | .. +..+.+..+.+. + +..+ | 3.9e+08 |..+.+..+.+..+.+ | 3.8e+08 |-+ | | | 3.7e+08 |-+ O O O O O | 3.6e+08 |-+ O O O O O O O O O O O | | O O O O O O | 3.5e+08 +-----------------------------------------------------------------+ unixbench.score 3800 +--------------------------------------------------------------------+ | .+. .| 3700 |-+ +. .+. +. | 3600 |-+ : +. | | : | 3500 |-+ : | 3400 |-+ .+.+..+..+. : | | .+.+..+..+.+..+..+.+. +..+ | 3300 |..+.+..+..+.+..+. 
| 3200 |-+ | | | 3100 |-+ O O O O O O O O O O O O O O | 3000 |-+O O O O O O | | O O O O O | 2900 +--------------------------------------------------------------------+ unixbench.workload 6e+08 +-----------------------------------------------------------------+ | | 5.8e+08 |-+ +.. +..+. ..| | : + + | 5.6e+08 |-+ : + | | : | 5.4e+08 |-+ : | | .+. .+.. .+..+.+..+.+..+. : | 5.2e+08 |.. .+.. .+.+. +..+ +.+. +..+ | | + +.+. | 5e+08 |-+ | | O O O O O | 4.8e+08 |-+ O O O O O O O O O O O | | O O O O O O O O | 4.6e+08 +-----------------------------------------------------------------+ [*] bisect-good sample [O] bisect-bad sample Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Thanks, Oliver Sang
From 7da41e2aab3b53579a762767314ac54a469eb52f Mon Sep 17 00:00:00 2001
From: Alex Kogan <alex.kogan@xxxxxxxxxx>
Date: Wed, 25 Nov 2020 13:51:08 -0500
Subject: [PATCH v12 6/6] locking/qspinlock: Introduce the shuffle reduction
 optimization into CNA

This performance optimization chooses probabilistically to avoid moving
threads from the main queue into the secondary one when the secondary
queue is empty.

It is helpful when the lock is only lightly contended. In particular, it
makes CNA less eager to create a secondary queue, but does not introduce
any extra delays for threads waiting in that queue once it is created.

Signed-off-by: Alex Kogan <alex.kogan@xxxxxxxxxx>
---
 kernel/locking/qspinlock_cna.h | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
index ac3109a..6213992 100644
--- a/kernel/locking/qspinlock_cna.h
+++ b/kernel/locking/qspinlock_cna.h
@@ -5,6 +5,7 @@
 #include <linux/topology.h>
 #include <linux/sched/rt.h>
+#include <linux/random.h>
 
 /*
  * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock).
@@ -86,6 +87,34 @@ static inline bool intra_node_threshold_reached(struct cna_node *cn)
 	return current_time - threshold > 0;
 }
 
+/*
+ * Controls the probability for enabling the ordering of the main queue
+ * when the secondary queue is empty. The chosen value reduces the amount
+ * of unnecessary shuffling of threads between the two waiting queues
+ * when the contention is low, while responding fast enough and enabling
+ * the shuffling when the contention is high.
+ */
+#define SHUFFLE_REDUCTION_PROB_ARG  (7)
+
+/* Per-CPU pseudo-random number seed */
+static DEFINE_PER_CPU(u32, seed);
+
+/*
+ * Return false with probability 1 / 2^@num_bits.
+ * Intuitively, the larger @num_bits the less likely false is to be returned.
+ * @num_bits must be a number between 0 and 31.
+ */
+static bool probably(unsigned int num_bits)
+{
+	u32 s;
+
+	s = this_cpu_read(seed);
+	s = next_pseudo_random32(s);
+	this_cpu_write(seed, s);
+
+	return s & ((1 << num_bits) - 1);
+}
+
 static void __init cna_init_nodes_per_cpu(unsigned int cpu)
 {
 	struct mcs_spinlock *base = per_cpu_ptr(&qnodes[0].mcs, cpu);
@@ -290,7 +319,15 @@ static __always_inline u32 cna_wait_head_or_lock(struct qspinlock *lock,
 {
 	struct cna_node *cn = (struct cna_node *)node;
 
-	if (!cn->start_time || !intra_node_threshold_reached(cn)) {
+	if (node->locked <= 1 && probably(SHUFFLE_REDUCTION_PROB_ARG)) {
+		/*
+		 * When the secondary queue is empty, skip the call to
+		 * cna_order_queue() with high probability. This optimization
+		 * reduces the overhead of unnecessary shuffling of threads
+		 * between waiting queues when the lock is only lightly
+		 * contended.
+		 */
+		cn->partial_order = LOCAL_WAITER_FOUND;
+	} else if (!cn->start_time || !intra_node_threshold_reached(cn)) {
 		/*
 		 * We are at the head of the wait queue, no need to use
 		 * the fake NUMA node ID.
-- 
2.7.4