Hi maintainers,

This issue was debugged on the Huawei Kunpeng 920, an ARM64 platform, and we have also run further tests on x86. Since Rong has now reported the same improvement on x86, it seems worthwhile to take this patch. Any comments?

Thanks,
Shaokun

On 2020/7/8 15:23, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a 32.3% improvement of unixbench.score due to commit:
>
>
> commit: 936e92b615e212d08eb74951324bef25ba564c34 ("[PATCH RESEND] fs: Move @f_count to different cacheline with @f_mode")
> url: https://github.com/0day-ci/linux/commits/Shaokun-Zhang/fs-Move-f_count-to-different-cacheline-with-f_mode/20200624-163511
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 5e857ce6eae7ca21b2055cca4885545e29228fe2
>
> in testcase: unixbench
> on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
> with following parameters:
>
> 	runtime: 300s
> 	nr_task: 30%
> 	test: syscall
> 	cpufreq_governor: performance
> 	ucode: 0x5002f01
>
> test-description: UnixBench is the original BYTE UNIX benchmark suite, which aims to test the performance of Unix-like systems.
> test-url: https://github.com/kdlucas/byte-unixbench
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         bin/lkp install job.yaml  # job file is attached in this email
>         bin/lkp run job.yaml
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
>   gcc-9/performance/x86_64-rhel-7.6/30%/debian-x86_64-20191114.cgz/300s/lkp-csl-2ap3/syscall/unixbench/0x5002f01
>
> commit:
>   5e857ce6ea ("Merge branch 'hch' (maccess patches from Christoph Hellwig)")
>   936e92b615 ("fs: Move @f_count to different cacheline with @f_mode")
>
> 5e857ce6eae7ca21 936e92b615e212d08eb74951324
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>       2297 ±  2%     +32.3%       3038        unixbench.score
>     171.74           +34.8%     231.55        unixbench.time.user_time
>  1.366e+09           +32.6%  1.812e+09        unixbench.workload
>      26472 ±  6%   +1270.0%     362665 ±158%  cpuidle.C1.usage
>       0.25 ±  2%      +0.1        0.33        mpstat.cpu.all.usr%
>       8.32 ± 43%    +129.7%      19.12 ± 63%  sched_debug.cpu.clock.stddev
>       8.32 ± 43%    +129.7%      19.12 ± 63%  sched_debug.cpu.clock_task.stddev
>       2100 ±  2%     -15.6%       1772 ±  9%  sched_debug.cpu.nr_switches.min
>     373.34 ±  3%     +12.4%     419.48 ±  6%  sched_debug.cpu.ttwu_local.stddev
>       2740 ± 12%     -72.3%     757.75 ±105%  numa-vmstat.node0.nr_inactive_anon
>       3139 ±  8%     -69.9%     946.25 ± 97%  numa-vmstat.node0.nr_shmem
>       2740 ± 12%     -72.3%     757.75 ±105%  numa-vmstat.node0.nr_zone_inactive_anon
>     373.75 ± 51%    +443.3%       2030 ± 26%  numa-vmstat.node2.nr_inactive_anon
>     496.00 ± 19%    +366.1%       2311 ± 29%  numa-vmstat.node2.nr_shmem
>     373.75 ± 51%    +443.3%       2030 ± 26%  numa-vmstat.node2.nr_zone_inactive_anon
>      13728 ± 13%    +148.1%      34056 ± 46%  numa-vmstat.node3.nr_active_anon
>      78558           +11.3%      87431 ±  6%  numa-vmstat.node3.nr_file_pages
>       9939 ±  8%     +19.7%      11902 ± 13%  numa-vmstat.node3.nr_shmem
>      13728 ± 13%    +148.1%      34056 ± 46%  numa-vmstat.node3.nr_zone_active_anon
>      11103 ± 13%     -71.2%       3201 ± 99%  numa-meminfo.node0.Inactive
>      10962 ± 12%     -72.3%       3032 ±105%  numa-meminfo.node0.Inactive(anon)
>       8551 ± 31%     -29.4%       6034 ± 18%  numa-meminfo.node0.Mapped
>      12560 ±  8%     -69.9%       3786 ± 97%  numa-meminfo.node0.Shmem
>       1596 ± 51%    +415.6%       8230 ± 24%  numa-meminfo.node2.Inactive
>       1496 ± 51%    +442.8%       8122 ± 26%  numa-meminfo.node2.Inactive(anon)
>       1984 ± 19%    +366.1%       9248 ± 29%  numa-meminfo.node2.Shmem
>      54929 ± 13%    +148.0%     136212 ± 46%  numa-meminfo.node3.Active
>      54929 ± 13%    +148.0%     136206 ± 46%  numa-meminfo.node3.Active(anon)
>     314216           +11.3%     349697 ±  6%  numa-meminfo.node3.FilePages
>     747907 ±  2%     +15.2%     861672 ±  9%  numa-meminfo.node3.MemUsed
>      39744 ±  8%     +19.7%      47580 ± 13%  numa-meminfo.node3.Shmem
>      13.94 ±  6%     -13.9        0.00        perf-profile.calltrace.cycles-pp.dnotify_flush.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +0.7        0.66 ±  8%  perf-profile.calltrace.cycles-pp.__x64_sys_umask.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      31.64 ±  8%      +3.4       35.08 ±  5%  perf-profile.calltrace.cycles-pp.__fget_files.ksys_dup.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.82 ±  8%      +5.6       12.41 ± 12%  perf-profile.calltrace.cycles-pp.fput_many.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      23.54 ± 58%     +12.7       36.27 ±  5%  perf-profile.calltrace.cycles-pp.ksys_dup.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      23.54 ± 58%     +12.7       36.29 ±  5%  perf-profile.calltrace.cycles-pp.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      13.98 ±  6%     -14.0        0.00        perf-profile.children.cycles-pp.dnotify_flush
>      39.81 ±  6%     -10.8       28.96 ±  9%  perf-profile.children.cycles-pp.filp_close
>      40.13 ±  6%     -10.7       29.44 ±  9%  perf-profile.children.cycles-pp.__x64_sys_close
>       0.15 ± 10%      -0.0        0.13 ±  8%  perf-profile.children.cycles-pp.scheduler_tick
>       0.05 ±  8%      +0.0        0.07 ±  6%  perf-profile.children.cycles-pp.__x64_sys_getuid
>       0.10 ±  7%      +0.0        0.12 ±  8%  perf-profile.children.cycles-pp.__prepare_exit_to_usermode
>       0.44 ±  7%      +0.1        0.56 ±  6%  perf-profile.children.cycles-pp.syscall_return_via_sysret
>      31.78 ±  8%      +3.4       35.22 ±  5%  perf-profile.children.cycles-pp.__fget_files
>      32.52 ±  8%      +3.7       36.27 ±  5%  perf-profile.children.cycles-pp.ksys_dup
>      32.54 ±  8%      +3.8       36.30 ±  5%  perf-profile.children.cycles-pp.__x64_sys_dup
>       6.86 ±  7%      +5.6       12.45 ± 12%  perf-profile.children.cycles-pp.fput_many
>      13.91 ±  6%     -13.9        0.00        perf-profile.self.cycles-pp.dnotify_flush
>      18.05 ±  5%      -1.6       16.41 ±  7%  perf-profile.self.cycles-pp.filp_close
>       0.06 ±  6%      +0.0        0.08 ±  8%  perf-profile.self.cycles-pp.__prepare_exit_to_usermode
>       0.09 ±  9%      +0.0        0.11 ±  7%  perf-profile.self.cycles-pp.do_syscall_64
>       0.16 ±  9%      +0.0        0.20 ±  4%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       0.30 ±  8%      +0.1        0.36 ±  7%  perf-profile.self.cycles-pp.entry_SYSCALL_64
>       0.44 ±  7%      +0.1        0.56 ±  6%  perf-profile.self.cycles-pp.syscall_return_via_sysret
>      31.61 ±  8%      +3.4       35.00 ±  5%  perf-profile.self.cycles-pp.__fget_files
>       6.81 ±  7%      +5.6       12.38 ± 12%  perf-profile.self.cycles-pp.fput_many
>      36623 ±  3%     +11.5%      40822 ±  7%  softirqs.CPU100.SCHED
>      16499 ± 40%     +27.8%      21088 ± 35%  softirqs.CPU122.RCU
>      16758 ± 41%     +30.0%      21781 ± 35%  softirqs.CPU126.RCU
>     178.25 ± 11%   +7718.2%      13936 ±168%  softirqs.CPU13.NET_RX
>      40883 ±  4%      -6.9%      38055 ±  2%  softirqs.CPU132.SCHED
>      16029 ± 41%     +35.9%      21789 ± 33%  softirqs.CPU144.RCU
>      16220 ± 43%     +32.4%      21484 ± 35%  softirqs.CPU145.RCU
>      16393 ± 39%     +29.9%      21301 ± 32%  softirqs.CPU146.RCU
>      16217 ± 39%     +29.8%      21055 ± 35%  softirqs.CPU147.RCU
>      37011 ± 12%     +12.4%      41589 ±  5%  softirqs.CPU149.SCHED
>      16127 ± 41%     +34.5%      21685 ± 34%  softirqs.CPU150.RCU
>      16131 ± 41%     +32.3%      21333 ± 35%  softirqs.CPU151.RCU
>      16558 ± 37%     +28.2%      21230 ± 34%  softirqs.CPU152.RCU
>      15863 ± 40%     +34.1%      21266 ± 32%  softirqs.CPU153.RCU
>      16044 ± 41%     +32.7%      21286 ± 34%  softirqs.CPU154.RCU
>      16057 ± 40%     +34.9%      21658 ± 33%  softirqs.CPU155.RCU
>      16352 ± 39%     +31.0%      21423 ± 33%  softirqs.CPU156.RCU
>      16006 ± 39%     +33.4%      21348 ± 32%  softirqs.CPU158.RCU
>      16300 ± 41%     +32.0%      21521 ± 34%  softirqs.CPU161.RCU
>      37546 ±  4%     +13.5%      42605 ±  3%  softirqs.CPU161.SCHED
>      16411 ± 41%     +33.4%      21894 ± 33%  softirqs.CPU162.RCU
>      16329 ± 41%     +32.9%      21704 ± 35%  softirqs.CPU163.RCU
>      16517 ± 39%     +29.8%      21441 ± 34%  softirqs.CPU164.RCU
>      16227 ± 41%     +32.3%      21471 ± 34%  softirqs.CPU165.RCU
>      16347 ± 40%     +31.4%      21481 ± 35%  softirqs.CPU166.RCU
>      16360 ± 43%     +32.2%      21631 ± 35%  softirqs.CPU167.RCU
>      36986           +11.3%      41148 ±  6%  softirqs.CPU167.SCHED
>      16218 ± 44%     +34.7%      21843 ± 33%  softirqs.CPU189.RCU
>      16501 ± 39%     +32.0%      21783 ± 33%  softirqs.CPU52.RCU
>      17101 ± 41%     +29.4%      22121 ± 35%  softirqs.CPU68.RCU
>  1.087e+09           +20.9%  1.314e+09        perf-stat.i.branch-instructions
>   19778787           +22.1%   24144895 ± 16%  perf-stat.i.branch-misses
>      22.88           -17.7%      18.84 ±  2%  perf-stat.i.cpi
>  1.635e+09           +23.6%  2.021e+09        perf-stat.i.dTLB-loads
>      20648 ±  2%    +218.4%      65736 ±110%  perf-stat.i.dTLB-store-misses
>  1.023e+09           +24.8%  1.276e+09        perf-stat.i.dTLB-stores
>      78.10            +1.4       79.54        perf-stat.i.iTLB-load-miss-rate%
>   16169669            +8.2%   17493234        perf-stat.i.iTLB-load-misses
>  5.364e+09           +21.3%  6.507e+09        perf-stat.i.instructions
>     369.33           +11.8%     413.03 ±  5%  perf-stat.i.instructions-per-iTLB-miss
>       0.41 ±  2%     +83.3%       0.76 ± 16%  perf-stat.i.metric.K/sec
>      19.79           +23.2%      24.39        perf-stat.i.metric.M/sec
>    4460149 ±  2%     -45.1%    2447884 ± 14%  perf-stat.i.node-load-misses
>     241219 ±  2%     -58.8%      99443 ± 47%  perf-stat.i.node-loads
>    1679821 ±  2%      -4.4%    1605611 ±  3%  perf-stat.i.node-store-misses
>      25.91           -17.6%      21.36        perf-stat.overall.cpi
>      82.51            +1.7       84.17        perf-stat.overall.iTLB-load-miss-rate%
>     331.21           +12.2%     371.62        perf-stat.overall.instructions-per-iTLB-miss
>       0.04           +21.3%       0.05        perf-stat.overall.ipc
>       1566            -8.4%       1435        perf-stat.overall.path-length
>  1.089e+09           +21.0%  1.318e+09        perf-stat.ps.branch-instructions
>   19801099           +21.7%   24102537 ± 15%  perf-stat.ps.branch-misses
>  1.641e+09           +23.6%  2.028e+09        perf-stat.ps.dTLB-loads
>      20512 ±  2%    +212.7%      64142 ±109%  perf-stat.ps.dTLB-store-misses
>  1.027e+09           +24.8%  1.282e+09        perf-stat.ps.dTLB-stores
>   16239916            +8.2%   17567773        perf-stat.ps.iTLB-load-misses
>  5.378e+09           +21.4%  6.527e+09        perf-stat.ps.instructions
>    4485062 ±  2%     -45.2%    2458026 ± 14%  perf-stat.ps.node-load-misses
>     242388 ±  2%     -59.0%      99493 ± 47%  perf-stat.ps.node-loads
>    1689890 ±  2%      -4.5%    1614182 ±  3%  perf-stat.ps.node-store-misses
>  2.139e+12           +21.5%    2.6e+12        perf-stat.total.instructions
>     288.00 ± 13%   +8910.9%      25951 ±168%  interrupts.34:PCI-MSI.524292-edge.eth0-TxRx-3
>       2042 ± 57%    +190.2%       5927 ± 26%  interrupts.CPU1.NMI:Non-maskable_interrupts
>       2042 ± 57%    +190.2%       5927 ± 26%  interrupts.CPU1.PMI:Performance_monitoring_interrupts
>       3.75 ± 34%   +2373.3%      92.75 ±130%  interrupts.CPU100.TLB:TLB_shootdowns
>       3510 ± 88%     -85.1%     522.00 ±124%  interrupts.CPU107.NMI:Non-maskable_interrupts
>       3510 ± 88%     -85.1%     522.00 ±124%  interrupts.CPU107.PMI:Performance_monitoring_interrupts
>       3813 ± 74%     -73.3%       1018 ±150%  interrupts.CPU110.NMI:Non-maskable_interrupts
>       3813 ± 74%     -73.3%       1018 ±150%  interrupts.CPU110.PMI:Performance_monitoring_interrupts
>       4536 ± 51%     -97.1%     131.50 ±  8%  interrupts.CPU111.NMI:Non-maskable_interrupts
>       4536 ± 51%     -97.1%     131.50 ±  8%  interrupts.CPU111.PMI:Performance_monitoring_interrupts
>       4476 ± 47%     -97.5%     113.00 ± 19%  interrupts.CPU112.NMI:Non-maskable_interrupts
>       4476 ± 47%     -97.5%     113.00 ± 19%  interrupts.CPU112.PMI:Performance_monitoring_interrupts
>       3522 ± 36%     +92.7%       6787 ± 16%  interrupts.CPU120.NMI:Non-maskable_interrupts
>       3522 ± 36%     +92.7%       6787 ± 16%  interrupts.CPU120.PMI:Performance_monitoring_interrupts
>       2888 ± 66%    +117.5%       6283 ± 21%  interrupts.CPU123.NMI:Non-maskable_interrupts
>       2888 ± 66%    +117.5%       6283 ± 21%  interrupts.CPU123.PMI:Performance_monitoring_interrupts
>       3109 ± 61%    +132.5%       7230 ±  7%  interrupts.CPU124.NMI:Non-maskable_interrupts
>       3109 ± 61%    +132.5%       7230 ±  7%  interrupts.CPU124.PMI:Performance_monitoring_interrupts
>       1067 ± 19%     -21.6%     836.50        interrupts.CPU125.CAL:Function_call_interrupts
>     288.00 ± 13%   +8910.9%      25951 ±168%  interrupts.CPU13.34:PCI-MSI.524292-edge.eth0-TxRx-3
>     244.25 ± 96%     -95.3%      11.50 ± 95%  interrupts.CPU13.TLB:TLB_shootdowns
>       2056 ±117%    +206.3%       6298 ± 20%  interrupts.CPU130.NMI:Non-maskable_interrupts
>       2056 ±117%    +206.3%       6298 ± 20%  interrupts.CPU130.PMI:Performance_monitoring_interrupts
>     831.50           +21.4%       1009 ± 13%  interrupts.CPU133.CAL:Function_call_interrupts
>       8.00 ± 29%    +634.4%      58.75 ±119%  interrupts.CPU133.RES:Rescheduling_interrupts
>       1629 ±159%    +265.3%       5952 ± 29%  interrupts.CPU139.NMI:Non-maskable_interrupts
>       1629 ±159%    +265.3%       5952 ± 29%  interrupts.CPU139.PMI:Performance_monitoring_interrupts
>       1660 ±159%    +161.0%       4332 ± 61%  interrupts.CPU141.NMI:Non-maskable_interrupts
>       1660 ±159%    +161.0%       4332 ± 61%  interrupts.CPU141.PMI:Performance_monitoring_interrupts
>     882.75 ±147%    +542.5%       5671 ± 38%  interrupts.CPU143.NMI:Non-maskable_interrupts
>     882.75 ±147%    +542.5%       5671 ± 38%  interrupts.CPU143.PMI:Performance_monitoring_interrupts
>       2600 ± 29%     +68.8%       4389 ± 47%  interrupts.CPU144.NMI:Non-maskable_interrupts
>       2600 ± 29%     +68.8%       4389 ± 47%  interrupts.CPU144.PMI:Performance_monitoring_interrupts
>       1494 ± 20%     +91.3%       2859 ± 29%  interrupts.CPU147.NMI:Non-maskable_interrupts
>       1494 ± 20%     +91.3%       2859 ± 29%  interrupts.CPU147.PMI:Performance_monitoring_interrupts
>       3657 ± 54%     -96.3%     133.75 ±  8%  interrupts.CPU15.NMI:Non-maskable_interrupts
>       3657 ± 54%     -96.3%     133.75 ±  8%  interrupts.CPU15.PMI:Performance_monitoring_interrupts
>       5165 ± 40%     -97.8%     115.00 ± 26%  interrupts.CPU16.NMI:Non-maskable_interrupts
>       5165 ± 40%     -97.8%     115.00 ± 26%  interrupts.CPU16.PMI:Performance_monitoring_interrupts
>      34.00 ±125%     -84.6%       5.25 ± 49%  interrupts.CPU186.RES:Rescheduling_interrupts
>       1033 ± 24%     -19.0%     836.75        interrupts.CPU190.CAL:Function_call_interrupts
>      68.00 ± 28%     +55.5%     105.75 ±  9%  interrupts.CPU26.RES:Rescheduling_interrupts
>     882.25 ±  4%      +6.3%     937.75 ±  7%  interrupts.CPU32.CAL:Function_call_interrupts
>     139.25 ± 96%     -74.0%      36.25 ± 72%  interrupts.CPU32.TLB:TLB_shootdowns
>     848.25 ±130%    +368.9%       3977 ± 56%  interrupts.CPU35.NMI:Non-maskable_interrupts
>     848.25 ±130%    +368.9%       3977 ± 56%  interrupts.CPU35.PMI:Performance_monitoring_interrupts
>     958.25 ± 11%     -10.6%     856.75        interrupts.CPU36.CAL:Function_call_interrupts
>       1903 ± 72%    +127.9%       4337 ± 23%  interrupts.CPU41.NMI:Non-maskable_interrupts
>       1903 ± 72%    +127.9%       4337 ± 23%  interrupts.CPU41.PMI:Performance_monitoring_interrupts
>       1320 ±158%    +245.4%       4560 ± 32%  interrupts.CPU47.NMI:Non-maskable_interrupts
>       1320 ±158%    +245.4%       4560 ± 32%  interrupts.CPU47.PMI:Performance_monitoring_interrupts
>     837.50            +5.2%     881.25 ±  4%  interrupts.CPU61.CAL:Function_call_interrupts
>       1074 ± 28%     -22.1%     836.50        interrupts.CPU69.CAL:Function_call_interrupts
>       1042 ± 12%     -18.7%     847.50 ±  2%  interrupts.CPU86.CAL:Function_call_interrupts
>
>
>                                 unixbench.score
>
>   [ASCII plot, spacing lost in transit: bisect-bad (O) samples cluster near ~3000, bisect-good (+) samples near ~2300]
>
>
>                               unixbench.workload
>
>   [ASCII plot, spacing lost in transit: bisect-bad (O) samples cluster near ~1.8e+09, bisect-good (+) samples near ~1.35e+09]
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Rong Chen