On Mon, Dec 05, 2022 at 12:43:37PM -0800, Linus Torvalds wrote:
> On Mon, Dec 5, 2022 at 1:02 AM kernel test robot <yujie.liu@xxxxxxxxx> wrote:
> >
> > FYI, we noticed a -53.3% regression of will-it-scale.per_thread_ops due to commit:
> > 5df397dec7c4 ("mm: delay page_remove_rmap() until after the TLB has been flushed")
>
> Sadly, I think this may be at least partially expected.
>
> The code fundamentally moves one "loop over pages" and splits it up
> (with the TLB flush in between).
>
> Which can't be great for locality, but it's kind of fundamental for
> the fix - but some of it might be due to the batch limit logic.
>
> I wouldn't have expected it to actually show up in any real loads, but:
>
> > in testcase: will-it-scale
> > test: page_fault3
>
> I assume that this test is doing a lot of mmap/munmap on dirty shared
> memory regions (both because of the regression, and because of the
> name of that test ;)
>
> So this is hopefully an extreme case.
>
> Now, it's likely that this particular case probably also triggers that
>
>         /* No more batching if we have delayed rmaps pending */
>
> which means that the loops in between the TLB flushes will be smaller,
> since we don't batch up as many pages as we used to before we force a
> TLB (and rmap) flush and free them.
>
> If it's due to that batching issue it may be fixable - I'll think
> about this some more, but
>
> > Details are as below:
>
> The bug it fixes ends up meaning that we run that rmap removal code
> _after_ the TLB flush, and it looks like this (probably combined with
> the batching limit) then causes some nasty iTLB load issues:
>
> >   2291312 ± 2%  +1452.8%  35580378 ± 4%  perf-stat.i.iTLB-loads
>
> although it also does look like it's at least partly due to some irq
> timing issue (and/or bad NUMA/CPU migration luck):
>
> >   388169          +267.4%  1426305 ± 6%  vmstat.system.in
> >   161.37           +84.9%   298.43 ± 6%  perf-stat.ps.cpu-migrations
> >   172442 ± 4%      +26.9%   218745 ± 8%  perf-stat.ps.node-load-misses
>
> so it might be that some of the regression comes down to "bad luck" -
> it happened to run really nicely on that particular machine, and then
> the timing changes caused some random "phase change" to the load.
>
> The profile doesn't actually seem to show all that much more IPI
> overhead, so maybe these incidental issues are what then causes that
> big regression.
>
> It would be lovely to hear if you see this on other machines and/or loads.

FYI, we ran this "will-it-scale page_fault3" testcase on two other x86
platforms and observed similar performance regressions. We haven't seen
regressions from other benchmarks/workloads yet.
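For reference, the batching limit Linus quotes is the check at the top of
tlb_next_batch() in mm/mmu_gather.c. A rough sketch of the logic as of
5df397dec7c4 (paraphrased from the commit under discussion, not a verbatim
copy of the tree; see mm/mmu_gather.c for the exact code):

	/* sketch of mm/mmu_gather.c:tlb_next_batch(); details may differ */
	static bool tlb_next_batch(struct mmu_gather *tlb)
	{
		struct mmu_gather_batch *batch;

		/* No more batching if we have delayed rmaps pending */
		if (tlb->delayed_rmap)
			return false;

		batch = tlb->active;
		if (batch->next) {
			tlb->active = batch->next;
			return true;
		}

		if (tlb->batch_count == MAX_GATHER_BATCH_COUNT)
			return false;

		/* try to extend the gather with one more page of entries */
		batch = (void *)__get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
		if (!batch)
			return false;

		tlb->batch_count++;
		batch->next = NULL;
		batch->nr   = 0;
		batch->max  = MAX_GATHER_BATCH;

		tlb->active->next = batch;
		tlb->active = batch;

		return true;
	}

So once any gathered page still has a delayed rmap removal pending, the
gather stops growing at the current batch: zap_pte_range() then does the
flush_tlb_mm_range() / tlb_flush_rmaps() round trip after far fewer pages
than the old MAX_GATHER_BATCH_COUNT limit allowed. That would be consistent
with the jump in vmstat.system.in and the IPI-send paths in the profiles
below.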
96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake)
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp7/page_fault3/will-it-scale

commit:
  7cc8f9c7146a5 ("mm: mmu_gather: prepare to gather encoded page pointers with flags")
  5df397dec7c4c ("mm: delay page_remove_rmap() until after the TLB has been flushed")

 7cc8f9c7146a5  5df397dec7c4c
--------------  --------------
       %stddev      %change        %stddev
           \            |              \
5292018  -41.6%  3090618 ± 2%  will-it-scale.16.threads
84.04  +5.5%  88.64  will-it-scale.16.threads_idle
330750  -41.6%  193163 ± 2%  will-it-scale.per_thread_ops
5292018  -41.6%  3090618 ± 2%  will-it-scale.workload
3777076  -33.9%  2496224  numa-numastat.node0.local_node
3834886  -33.7%  2541691  numa-numastat.node0.numa_hit
1.17 ± 9%  +1.2  2.39 ± 8%  mpstat.cpu.all.irq%
13.50 ± 2%  -5.3  8.17  mpstat.cpu.all.sys%
1.14 ± 39%  -0.5  0.64  mpstat.cpu.all.usr%
83.83  +6.0%  88.83  vmstat.cpu.id
13.83 ± 2%  -32.5%  9.33 ± 5%  vmstat.procs.r
9325 ± 3%  -46.2%  5018  vmstat.system.cs
298875  +422.0%  1560096  vmstat.system.in
160279 ± 23%  +36.5%  218776 ± 12%  numa-meminfo.node0.AnonPages
166313 ± 21%  +34.5%  223688 ± 11%  numa-meminfo.node0.Inactive
164286 ± 22%  +35.9%  223228 ± 11%  numa-meminfo.node0.Inactive(anon)
4048 ± 6%  +14.1%  4620 ± 5%  numa-meminfo.node0.PageTables
247964 ± 16%  -28.7%  176690 ± 17%  numa-meminfo.node1.AnonPages.max
40074 ± 23%  +36.5%  54693 ± 12%  numa-vmstat.node0.nr_anon_pages
41076 ± 22%  +35.9%  55806 ± 11%  numa-vmstat.node0.nr_inactive_anon
1012 ± 6%  +14.0%  1154 ± 5%  numa-vmstat.node0.nr_page_table_pages
41076 ± 22%  +35.9%  55806 ± 11%  numa-vmstat.node0.nr_zone_inactive_anon
3834883  -33.7%  2541696  numa-vmstat.node0.numa_hit
3777072  -33.9%  2496229  numa-vmstat.node0.numa_local
442.00  -29.3%  312.67 ± 2%  turbostat.Avg_MHz
16.87 ± 2%  -4.5  12.33 ± 4%  turbostat.Busy%
611287 ± 13%  -91.1%  54248 ± 13%  turbostat.C1
0.27 ± 16%  -0.3  0.01  turbostat.C1%
1.238e+08  +624.2%  8.965e+08  turbostat.IRQ
167.40  -6.5%  156.53  turbostat.PkgWatt
270220  +1.8%  275170  proc-vmstat.nr_mapped
4434671  -29.9%  3110296  proc-vmstat.numa_hit
4348904  -30.5%  3023405  proc-vmstat.numa_local
548152  -1.4%  540422  proc-vmstat.pgactivate
4512817  -29.2%  3193199  proc-vmstat.pgalloc_normal
1.594e+09  -41.6%  9.308e+08 ± 2%  proc-vmstat.pgfault
4490607  -29.4%  3171990  proc-vmstat.pgfree
0.42 ± 4%  -12.7%  0.36 ± 7%  sched_debug.cfs_rq:/.h_nr_running.stddev
78690 ± 2%  +78.7%  140636 ± 46%  sched_debug.cfs_rq:/.load.max
30480 ± 5%  +31.9%  40209 ± 15%  sched_debug.cfs_rq:/.load.stddev
317962 ± 8%  -48.8%  162930 ± 15%  sched_debug.cfs_rq:/.min_vruntime.avg
1279285 ± 12%  -56.3%  558508 ± 11%  sched_debug.cfs_rq:/.min_vruntime.max
404116 ± 10%  -57.3%  172730 ± 10%  sched_debug.cfs_rq:/.min_vruntime.stddev
0.42 ± 5%  -12.6%  0.36 ± 7%  sched_debug.cfs_rq:/.nr_running.stddev
231.45 ± 7%  -23.5%  177.15 ± 6%  sched_debug.cfs_rq:/.runnable_avg.avg
854.02 ± 3%  -15.8%  719.28 ± 7%  sched_debug.cfs_rq:/.runnable_avg.max
315.94 ± 4%  -27.2%  229.91 ± 2%  sched_debug.cfs_rq:/.runnable_avg.stddev
681750 ± 31%  -50.1%  340262 ± 13%  sched_debug.cfs_rq:/.spread0.max
-577443  -66.6%  -193080  sched_debug.cfs_rq:/.spread0.min
404120 ± 10%  -57.3%  172733 ± 10%  sched_debug.cfs_rq:/.spread0.stddev
231.41 ± 7%  -23.5%  177.11 ± 6%  sched_debug.cfs_rq:/.util_avg.avg
853.96 ± 3%  -15.8%  719.22 ± 7%  sched_debug.cfs_rq:/.util_avg.max
315.91 ± 4%  -27.2%  229.88 ± 2%  sched_debug.cfs_rq:/.util_avg.stddev
155.50 ± 9%  -49.7%  78.27 ± 26%  sched_debug.cfs_rq:/.util_est_enqueued.avg
781.56 ± 2%  -22.9%  602.36 ± 5%  sched_debug.cfs_rq:/.util_est_enqueued.max
267.82 ± 5%  -39.1%  163.05 ± 10%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
151897 ± 35%  +76.2%  267629 ± 33%  sched_debug.cpu.avg_idle.min
215458 ± 12%  -41.9%  125119 ± 7%  sched_debug.cpu.avg_idle.stddev
1645 ± 9%  +85.1%  3044 ± 6%  sched_debug.cpu.clock_task.stddev
872.08 ± 4%  -28.3%  624.97 ± 12%  sched_debug.cpu.curr->pid.avg
1962 ± 2%  -14.7%  1674 ± 6%  sched_debug.cpu.curr->pid.stddev
0.17 ± 5%  -27.9%  0.12 ± 12%  sched_debug.cpu.nr_running.avg
0.37 ± 2%  -16.5%  0.31 ± 6%  sched_debug.cpu.nr_running.stddev
16252 ± 9%  -38.7%  9956 ± 7%  sched_debug.cpu.nr_switches.avg
18638 ± 12%  -43.9%  10451 ± 19%  sched_debug.cpu.nr_switches.stddev
3.315e+09  -29.4%  2.34e+09  perf-stat.i.branch-instructions
0.30 ± 21%  +0.2  0.50 ± 24%  perf-stat.i.branch-miss-rate%
8341916 ± 11%  -31.9%  5682680 ± 15%  perf-stat.i.cache-misses
9303 ± 3%  -46.8%  4949  perf-stat.i.context-switches
4.193e+10  -29.8%  2.944e+10 ± 2%  perf-stat.i.cpu-cycles
4.344e+09  -28.1%  3.122e+09  perf-stat.i.dTLB-loads
5.87  -1.0  4.85  perf-stat.i.dTLB-store-miss-rate%
1.477e+08  -41.7%  86041370 ± 2%  perf-stat.i.dTLB-store-misses
2.369e+09  -28.9%  1.685e+09  perf-stat.i.dTLB-stores
86.29  -60.2  26.04  perf-stat.i.iTLB-load-miss-rate%
10635919 ± 17%  -15.9%  8947125  perf-stat.i.iTLB-load-misses
1651323 ± 6%  +1441.1%  25448763  perf-stat.i.iTLB-loads
1.593e+10  -29.2%  1.128e+10  perf-stat.i.instructions
0.44  -29.8%  0.31 ± 2%  perf-stat.i.metric.GHz
450.64 ± 78%  +335.6%  1963 ± 15%  perf-stat.i.metric.K/sec
106.83  -30.1%  74.65  perf-stat.i.metric.M/sec
5278281  -41.7%  3075728 ± 2%  perf-stat.i.minor-faults
0.48 ± 14%  +0.5  0.94 ± 22%  perf-stat.i.node-store-miss-rate%
5329558  -41.5%  3115251 ± 2%  perf-stat.i.node-stores
5278281  -41.7%  3075729 ± 2%  perf-stat.i.page-faults
0.32 ± 20%  +0.2  0.53 ± 22%  perf-stat.overall.branch-miss-rate%
5.87  -1.0  4.86  perf-stat.overall.dTLB-store-miss-rate%
86.34  -60.3  26.01  perf-stat.overall.iTLB-load-miss-rate%
0.47 ± 14%  +0.3  0.81 ± 23%  perf-stat.overall.node-store-miss-rate%
909203  +21.4%  1104227  perf-stat.overall.path-length
3.304e+09  -29.4%  2.333e+09  perf-stat.ps.branch-instructions
8314122 ± 11%  -31.9%  5663748 ± 15%  perf-stat.ps.cache-misses
9272 ± 3%  -46.8%  4933  perf-stat.ps.context-switches
4.179e+10  -29.8%  2.935e+10 ± 2%  perf-stat.ps.cpu-cycles
4.33e+09  -28.1%  3.111e+09  perf-stat.ps.dTLB-loads
1.472e+08  -41.8%  85755366 ± 2%  perf-stat.ps.dTLB-store-misses
2.361e+09  -28.9%  1.679e+09  perf-stat.ps.dTLB-stores
10601230 ± 17%  -15.9%  8917293  perf-stat.ps.iTLB-load-misses
1645797 ± 6%  +1441.2%  25364210  perf-stat.ps.iTLB-loads
1.588e+10  -29.2%  1.124e+10  perf-stat.ps.instructions
5260752  -41.7%  3065504 ± 2%  perf-stat.ps.minor-faults
5311793  -41.5%  3104889 ± 2%  perf-stat.ps.node-stores
5260753  -41.7%  3065504 ± 2%  perf-stat.ps.page-faults
4.812e+12  -29.1%  3.412e+12  perf-stat.total.instructions
22.05 ± 8%  -4.7  17.34 ± 11%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
11.90 ± 8%  -4.0  7.95 ± 9%  perf-profile.calltrace.cycles-pp.up_read.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
15.06 ± 8%  -2.9  12.11 ± 9%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
11.63 ± 7%  -2.3  9.32 ± 9%  perf-profile.calltrace.cycles-pp.down_read_trylock.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
0.00  +0.6  0.59 ± 11%  perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
0.00  +0.6  0.62 ± 10%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.testcase
0.00  +0.6  0.62 ± 11%  perf-profile.calltrace.cycles-pp.page_remove_rmap.tlb_flush_rmaps.zap_pte_range.zap_pmd_range.unmap_page_range
0.00  +0.7  0.66 ± 10%  perf-profile.calltrace.cycles-pp.tlb_flush_rmaps.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
0.00  +0.8  0.78 ± 9%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.testcase
0.00  +0.8  0.85 ± 11%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.__handle_mm_fault
0.00  +0.9  0.85 ± 11%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.__handle_mm_fault.handle_mm_fault
0.00  +0.9  0.89 ± 9%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.down_read_trylock
0.00  +0.9  0.90 ± 9%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.down_read_trylock.do_user_addr_fault
0.00  +0.9  0.90 ± 11%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00  +0.9  0.93 ± 7%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.up_read
0.00  +0.9  0.94 ± 6%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.up_read.do_user_addr_fault
0.00  +0.9  0.95 ± 9%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.down_read_trylock.do_user_addr_fault.exc_page_fault
0.00  +1.0  1.00 ± 6%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.up_read.do_user_addr_fault.exc_page_fault
0.00  +1.0  1.00 ± 36%  perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
0.00  +1.1  1.05 ± 36%  perf-profile.calltrace.cycles-pp.flush_tlb_func.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
0.00  +1.1  1.10 ± 11%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.00  +1.1  1.15 ± 8%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.down_read_trylock.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00  +1.2  1.20 ± 7%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.up_read.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00  +1.5  1.48 ± 7%  perf-profile.calltrace.cycles-pp.__default_send_IPI_dest_field.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
0.00  +1.5  1.52 ± 7%  perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
0.00  +1.7  1.71 ± 10%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault
0.00  +1.7  1.72 ± 10%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault
0.00  +1.8  1.82 ± 10%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00  +3.1  3.14 ± 10%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
0.00  +3.6  3.61 ± 15%  perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function
0.00  +3.8  3.85 ± 7%  perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range.zap_pmd_range
0.00  +3.9  3.87 ± 7%  perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range
0.00  +4.2  4.22 ± 20%  perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
1.96 ± 7%  +4.3  6.27 ± 7%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
1.96 ± 7%  +4.3  6.27 ± 7%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
1.96 ± 7%  +4.3  6.27 ± 7%  perf-profile.calltrace.cycles-pp.__munmap
1.96 ± 7%  +4.3  6.27 ± 7%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
1.96 ± 7%  +4.3  6.27 ± 7%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
1.94 ± 7%  +4.3  6.25 ± 7%  perf-profile.calltrace.cycles-pp.do_mas_align_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.91 ± 7%  +4.3  6.23 ± 7%  perf-profile.calltrace.cycles-pp.unmap_region.do_mas_align_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
1.89 ± 7%  +4.3  6.22 ± 7%  perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_mas_align_munmap.__vm_munmap.__x64_sys_munmap
1.89 ± 7%  +4.3  6.22 ± 7%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_mas_align_munmap.__vm_munmap
1.89 ± 7%  +4.3  6.22 ± 7%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_mas_align_munmap
1.86 ± 7%  +4.3  6.21 ± 7%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
0.00  +4.6  4.56 ± 7%  perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
30.20 ± 17%  +6.8  37.04 ± 15%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
30.64 ± 16%  +7.0  37.60 ± 14%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
30.73 ± 16%  +7.0  37.70 ± 14%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
30.73 ± 16%  +7.0  37.71 ± 14%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
30.73 ± 16%  +7.0  37.71 ± 14%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
22.10 ± 8%  -4.6  17.48 ± 11%  perf-profile.children.cycles-pp.handle_mm_fault
11.92 ± 8%  -3.8  8.15 ± 8%  perf-profile.children.cycles-pp.up_read
11.65 ± 7%  -2.1  9.51 ± 9%  perf-profile.children.cycles-pp.down_read_trylock
0.16 ± 21%  -0.1  0.09 ± 26%  perf-profile.children.cycles-pp.process_simple
0.13 ± 14%  -0.1  0.07 ± 10%  perf-profile.children.cycles-pp.rwsem_down_read_slowpath
0.14 ± 21%  -0.1  0.08 ± 22%  perf-profile.children.cycles-pp.queue_event
0.14 ± 21%  -0.1  0.08 ± 26%  perf-profile.children.cycles-pp.ordered_events__queue
0.14 ± 14%  -0.0  0.10 ± 10%  perf-profile.children.cycles-pp.__schedule
0.10 ± 10%  -0.0  0.06 ± 19%  perf-profile.children.cycles-pp.schedule
0.02 ±141%  +0.0  0.06 ± 13%  perf-profile.children.cycles-pp.ret_from_fork
0.02 ±141%  +0.0  0.06 ± 13%  perf-profile.children.cycles-pp.kthread
0.16 ± 21%  +0.1  0.22 ± 10%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.00  +0.1  0.09 ± 7%  perf-profile.children.cycles-pp._find_next_bit
0.08 ± 10%  +0.2  0.25 ± 15%  perf-profile.children.cycles-pp.native_sched_clock
0.08 ± 8%  +0.2  0.29 ± 14%  perf-profile.children.cycles-pp.sched_clock_cpu
0.20 ± 12%  +0.2  0.43 ± 13%  perf-profile.children.cycles-pp.__irq_exit_rcu
0.07 ± 11%  +0.2  0.31 ± 10%  perf-profile.children.cycles-pp.irqtime_account_irq
1.46 ± 8%  +0.3  1.77 ± 10%  perf-profile.children.cycles-pp.__filemap_get_folio
1.71 ± 8%  +0.3  2.02 ± 10%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
0.00  +0.4  0.43 ± 10%  perf-profile.children.cycles-pp.llist_reverse_order
0.00  +0.6  0.59 ± 11%  perf-profile.children.cycles-pp.llist_add_batch
0.00  +0.7  0.67 ± 10%  perf-profile.children.cycles-pp.tlb_flush_rmaps
0.09 ± 15%  +1.4  1.48 ± 7%  perf-profile.children.cycles-pp.__default_send_IPI_dest_field
0.09 ± 14%  +1.4  1.53 ± 7%  perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
0.19 ± 8%  +3.7  3.87 ± 7%  perf-profile.children.cycles-pp.smp_call_function_many_cond
0.19 ± 8%  +3.7  3.87 ± 7%  perf-profile.children.cycles-pp.on_each_cpu_cond_mask
2.19 ± 7%  +4.3  6.49 ± 7%  perf-profile.children.cycles-pp.do_syscall_64
2.19 ± 7%  +4.3  6.49 ± 7%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
1.97 ± 7%  +4.3  6.27 ± 7%  perf-profile.children.cycles-pp.__vm_munmap
1.96 ± 7%  +4.3  6.27 ± 7%  perf-profile.children.cycles-pp.__x64_sys_munmap
1.96 ± 7%  +4.3  6.27 ± 7%  perf-profile.children.cycles-pp.__munmap
1.94 ± 7%  +4.3  6.25 ± 7%  perf-profile.children.cycles-pp.do_mas_align_munmap
1.92 ± 7%  +4.3  6.24 ± 7%  perf-profile.children.cycles-pp.unmap_region
1.90 ± 7%  +4.3  6.23 ± 7%  perf-profile.children.cycles-pp.unmap_vmas
1.90 ± 7%  +4.3  6.23 ± 7%  perf-profile.children.cycles-pp.unmap_page_range
1.90 ± 7%  +4.3  6.23 ± 7%  perf-profile.children.cycles-pp.zap_pmd_range
1.90 ± 7%  +4.3  6.23 ± 7%  perf-profile.children.cycles-pp.zap_pte_range
0.19 ± 8%  +4.4  4.56 ± 7%  perf-profile.children.cycles-pp.flush_tlb_mm_range
0.14 ± 11%  +6.5  6.64 ± 9%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.14 ± 11%  +6.5  6.69 ± 9%  perf-profile.children.cycles-pp.__sysvec_call_function
0.00  +6.8  6.84 ± 9%  perf-profile.children.cycles-pp.native_flush_tlb_one_user
0.17 ± 11%  +6.9  7.09 ± 9%  perf-profile.children.cycles-pp.sysvec_call_function
30.73 ± 16%  +7.0  37.71 ± 14%  perf-profile.children.cycles-pp.start_secondary
0.08 ± 13%  +7.3  7.39 ± 9%  perf-profile.children.cycles-pp.flush_tlb_func
0.30 ± 12%  +8.7  8.96 ± 8%  perf-profile.children.cycles-pp.asm_sysvec_call_function
11.80 ± 8%  -4.8  6.96 ± 9%  perf-profile.self.cycles-pp.up_read
10.49 ± 7%  -4.1  6.38 ± 9%  perf-profile.self.cycles-pp.__handle_mm_fault
11.55 ± 7%  -3.2  8.38 ± 9%  perf-profile.self.cycles-pp.down_read_trylock
6.15 ± 11%  -2.4  3.74 ± 23%  perf-profile.self.cycles-pp.handle_mm_fault
9.08 ± 7%  -1.8  7.24 ± 9%  perf-profile.self.cycles-pp.testcase
0.32 ± 8%  -0.1  0.23 ± 10%  perf-profile.self.cycles-pp.page_remove_rmap
0.14 ± 21%  -0.1  0.08 ± 22%  perf-profile.self.cycles-pp.queue_event
0.26 ± 9%  -0.0  0.21 ± 8%  perf-profile.self.cycles-pp.page_add_file_rmap
0.15 ± 7%  -0.0  0.12 ± 10%  perf-profile.self.cycles-pp.__mod_lruvec_page_state
0.12 ± 9%  -0.0  0.09 ± 11%  perf-profile.self.cycles-pp.do_fault
0.10 ± 9%  -0.0  0.07 ± 17%  perf-profile.self.cycles-pp.__count_memcg_events
0.00  +0.1  0.06 ± 15%  perf-profile.self.cycles-pp.flush_tlb_mm_range
0.00  +0.1  0.07 ± 10%  perf-profile.self.cycles-pp.sysvec_call_function
0.00  +0.1  0.08 ± 12%  perf-profile.self.cycles-pp.irqtime_account_irq
0.00  +0.1  0.08 ± 10%  perf-profile.self.cycles-pp._find_next_bit
0.00  +0.1  0.10 ± 9%  perf-profile.self.cycles-pp.asm_sysvec_call_function
0.64 ± 8%  +0.1  0.77 ± 8%  perf-profile.self.cycles-pp.__filemap_get_folio
0.07 ± 12%  +0.2  0.24 ± 16%  perf-profile.self.cycles-pp.native_sched_clock
0.00  +0.4  0.42 ± 10%  perf-profile.self.cycles-pp.llist_reverse_order
0.00  +0.5  0.51 ± 6%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.42 ± 7%  +0.6  0.97 ± 10%  perf-profile.self.cycles-pp.do_user_addr_fault
0.03 ±100%  +0.6  0.58 ± 4%  perf-profile.self.cycles-pp.smp_call_function_many_cond
0.00  +0.6  0.57 ± 9%  perf-profile.self.cycles-pp.flush_tlb_func
0.00  +0.6  0.59 ± 10%  perf-profile.self.cycles-pp.llist_add_batch
0.09 ± 15%  +1.4  1.48 ± 7%  perf-profile.self.cycles-pp.__default_send_IPI_dest_field
0.00  +6.8  6.82 ± 9%  perf-profile.self.cycles-pp.native_flush_tlb_one_user

128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake)
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp5/page_fault3/will-it-scale

commit:
  7cc8f9c7146a5 ("mm: mmu_gather: prepare to gather encoded page pointers with flags")
  5df397dec7c4c ("mm: delay page_remove_rmap() until after the TLB has been flushed")

 7cc8f9c7146a5  5df397dec7c4c
--------------  --------------
       %stddev      %change        %stddev
           \            |              \
6221506 ± 5%  -44.7%  3439127 ± 2%  will-it-scale.16.threads
87.20  +4.3%  90.98  will-it-scale.16.threads_idle
388843 ± 5%  -44.7%  214944 ± 2%  will-it-scale.per_thread_ops
6221506 ± 5%  -44.7%  3439127 ± 2%  will-it-scale.workload
1.23 ± 8%  +1.3  2.54 ± 19%  mpstat.cpu.all.irq%
10.86  -4.9  5.96 ± 2%  mpstat.cpu.all.sys%
0.61 ± 6%  -0.2  0.40 ± 3%  mpstat.cpu.all.usr%
4388857 ± 5%  -36.6%  2782588 ± 2%  numa-numastat.node0.local_node
4446685 ± 4%  -36.3%  2833518 ± 2%  numa-numastat.node0.numa_hit
618831 ± 3%  -10.3%  554830 ± 4%  numa-numastat.node1.local_node
14.50 ± 3%  -39.1%  8.83 ± 7%  vmstat.procs.r
10336 ± 8%  -45.7%  5616  vmstat.system.cs
390901 ± 3%  +350.1%  1759451 ± 2%  vmstat.system.in
410.83  -31.2%  282.50 ± 4%  turbostat.Avg_MHz
13.65  -3.6  10.06 ± 8%  turbostat.Busy%
1.603e+08 ± 5%  +524.0%  1e+09 ± 2%  turbostat.IRQ
60.67  -7.1%  56.33 ± 5%  turbostat.PkgTmp
274.70  -8.6%  251.09 ± 6%  turbostat.PkgWatt
126930 ± 15%  -35.2%  82187 ± 25%  numa-meminfo.node0.AnonHugePages
248450 ± 9%  -19.6%  199671 ± 25%  numa-meminfo.node0.AnonPages
255089 ± 9%  -19.6%  205148 ± 24%  numa-meminfo.node0.Inactive
254353 ± 9%  -19.4%  204910 ± 24%  numa-meminfo.node0.Inactive(anon)
22546 ± 12%  -24.4%  17051 ± 9%  numa-meminfo.node1.Active
22116 ± 11%  -25.2%  16539 ± 10%  numa-meminfo.node1.Active(anon)
24956 ± 10%  -24.9%  18736 ± 9%  numa-meminfo.node1.Shmem
264468  +3.9%  274871  proc-vmstat.nr_mapped
5125804 ± 4%  -32.6%  3455449  proc-vmstat.numa_hit
5010073 ± 4%  -33.3%  3339738  proc-vmstat.numa_local
551502  -1.7%  542068  proc-vmstat.pgactivate
5213112 ± 4%  -32.1%  3539426  proc-vmstat.pgalloc_normal
1.874e+09 ± 5%  -44.7%  1.036e+09 ± 2%  proc-vmstat.pgfault
5251524 ± 4%  -31.8%  3580764  proc-vmstat.pgfree
62112 ± 9%  -19.6%  49917 ± 25%  numa-vmstat.node0.nr_anon_pages
63588 ± 9%  -19.4%  51227 ± 24%  numa-vmstat.node0.nr_inactive_anon
63588 ± 9%  -19.4%  51227 ± 24%  numa-vmstat.node0.nr_zone_inactive_anon
4446807 ± 4%  -36.3%  2833561 ± 2%  numa-vmstat.node0.numa_hit
4388978 ± 5%  -36.6%  2782630 ± 2%  numa-vmstat.node0.numa_local
5529 ± 11%  -25.2%  4134 ± 10%  numa-vmstat.node1.nr_active_anon
6238 ± 10%  -24.9%  4684 ± 9%  numa-vmstat.node1.nr_shmem
5529 ± 11%  -25.2%  4134 ± 10%  numa-vmstat.node1.nr_zone_active_anon
618919 ± 3%  -10.4%  554861 ± 4%  numa-vmstat.node1.numa_local
0.30 ± 12%  -57.3%  0.13 ± 16%  sched_debug.cfs_rq:/.h_nr_running.avg
0.42 ± 3%  -24.8%  0.32 ± 7%  sched_debug.cfs_rq:/.h_nr_running.stddev
17954 ± 13%  -38.2%  11093 ± 20%  sched_debug.cfs_rq:/.load.avg
59044 ± 2%  +55.1%  91605 ± 3%  sched_debug.cfs_rq:/.load.max
35.12 ± 14%  -38.9%  21.45 ± 12%  sched_debug.cfs_rq:/.load_avg.avg
425478 ± 9%  -58.9%  175030 ± 14%  sched_debug.cfs_rq:/.min_vruntime.avg
1451058 ± 13%  -68.9%  451040 ± 3%  sched_debug.cfs_rq:/.min_vruntime.max
511538 ± 10%  -66.3%  172194 ± 7%  sched_debug.cfs_rq:/.min_vruntime.stddev
0.30 ± 12%  -57.3%  0.13 ± 16%  sched_debug.cfs_rq:/.nr_running.avg
0.42 ± 3%  -25.0%  0.32 ± 7%  sched_debug.cfs_rq:/.nr_running.stddev
316.29 ± 10%  -51.8%  152.57 ± 11%  sched_debug.cfs_rq:/.runnable_avg.avg
1011 ± 2%  -26.9%  739.47 ± 4%  sched_debug.cfs_rq:/.runnable_avg.max
396.72 ± 2%  -38.9%  242.26 ± 4%  sched_debug.cfs_rq:/.runnable_avg.stddev
-550745  -65.2%  -191612  sched_debug.cfs_rq:/.spread0.avg
474857 ± 58%  -82.2%  84412 ± 28%  sched_debug.cfs_rq:/.spread0.max
-956414  -63.9%  -345608  sched_debug.cfs_rq:/.spread0.min
511547 ± 10%  -66.3%  172197 ± 7%  sched_debug.cfs_rq:/.spread0.stddev
316.22 ± 10%  -51.8%  152.49 ± 11%  sched_debug.cfs_rq:/.util_avg.avg
1010 ± 2%  -26.9%  739.42 ± 4%  sched_debug.cfs_rq:/.util_avg.max
396.65 ± 2%  -38.9%  242.22 ± 4%  sched_debug.cfs_rq:/.util_avg.stddev
237.99 ± 14%  -75.7%  57.81 ± 16%  sched_debug.cfs_rq:/.util_est_enqueued.avg
962.81 ± 2%  -27.2%  701.03 ± 2%  sched_debug.cfs_rq:/.util_est_enqueued.max
359.62 ± 5%  -54.3%  164.36 ± 7%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
242264 ± 6%  -51.7%  116978 ± 7%  sched_debug.cpu.avg_idle.stddev
1801 ± 6%  +58.5%  2855 ± 3%  sched_debug.cpu.clock_task.stddev
711.81 ± 4%  -35.4%  459.59 ± 7%  sched_debug.cpu.curr->pid.avg
1909  -16.7%  1589 ± 4%  sched_debug.cpu.curr->pid.stddev
0.13 ± 4%  -35.4%  0.08 ± 6%  sched_debug.cpu.nr_running.avg
0.33 ± 2%  -20.1%  0.26 ± 3%  sched_debug.cpu.nr_running.stddev
13910 ± 6%  -36.7%  8800  sched_debug.cpu.nr_switches.avg
18507 ± 12%  -40.5%  11004 ± 14%  sched_debug.cpu.nr_switches.stddev
4.30 ± 13%  +95.0%  8.38 ± 39%  perf-stat.i.MPKI
3.88e+09 ± 5%  -32.4%  2.621e+09  perf-stat.i.branch-instructions
0.07 ± 8%  +0.3  0.38 ± 81%  perf-stat.i.branch-miss-rate%
2747085 ± 10%  +269.7%  10156358 ± 79%  perf-stat.i.branch-misses
8.60 ± 12%  -3.3  5.34 ± 20%  perf-stat.i.cache-miss-rate%
6857924 ± 5%  -23.2%  5265295 ± 16%  perf-stat.i.cache-misses
10324 ± 8%  -46.2%  5552  perf-stat.i.context-switches
5.216e+10  -32.0%  3.545e+10 ± 4%  perf-stat.i.cpu-cycles
139.62  +48.4%  207.16 ± 4%  perf-stat.i.cpu-migrations
5.128e+09 ± 5%  -31.6%  3.508e+09  perf-stat.i.dTLB-loads
7.55  -1.3  6.25  perf-stat.i.dTLB-store-miss-rate%
2.287e+08 ± 5%  -44.8%  1.262e+08 ± 2%  perf-stat.i.dTLB-store-misses
2.798e+09 ± 5%  -32.4%  1.893e+09  perf-stat.i.dTLB-stores
1.876e+10 ± 5%  -32.4%  1.269e+10  perf-stat.i.instructions
0.41  -32.0%  0.28 ± 4%  perf-stat.i.metric.GHz
94.02 ± 5%  -32.5%  63.48 ± 2%  perf-stat.i.metric.M/sec
6207930 ± 5%  -44.7%  3430475 ± 2%  perf-stat.i.minor-faults
55974 ± 8%  +39.7%  78180 ± 8%  perf-stat.i.node-load-misses
6339958 ± 5%  -42.7%  3633731 ± 3%  perf-stat.i.node-stores
6207930 ± 5%  -44.7%  3430475 ± 2%  perf-stat.i.page-faults
4.30 ± 13%  +94.4%  8.35 ± 38%  perf-stat.overall.MPKI
0.07 ± 7%  +0.3  0.39 ± 79%  perf-stat.overall.branch-miss-rate%
8.65 ± 12%  -3.3  5.39 ± 20%  perf-stat.overall.cache-miss-rate%
7.55  -1.3  6.25  perf-stat.overall.dTLB-store-miss-rate%
0.27 ± 37%  +0.4  0.66 ± 31%  perf-stat.overall.node-store-miss-rate%
910167  +22.4%  1114007  perf-stat.overall.path-length
3.867e+09 ± 5%  -32.5%  2.612e+09  perf-stat.ps.branch-instructions
2739799 ± 9%  +269.3%  10116762 ± 80%  perf-stat.ps.branch-misses
6834912 ± 5%  -23.2%  5246515 ± 16%  perf-stat.ps.cache-misses
10291 ± 8%  -46.2%  5533  perf-stat.ps.context-switches
5.198e+10  -32.0%  3.534e+10 ± 4%  perf-stat.ps.cpu-cycles
139.18  +48.4%  206.52 ± 4%  perf-stat.ps.cpu-migrations
5.111e+09 ± 5%  -31.6%  3.496e+09  perf-stat.ps.dTLB-loads
2.279e+08 ± 5%  -44.8%  1.258e+08 ± 2%  perf-stat.ps.dTLB-store-misses
2.789e+09 ± 5%  -32.4%  1.887e+09  perf-stat.ps.dTLB-stores
1.87e+10 ± 5%  -32.4%  1.264e+10  perf-stat.ps.instructions
6187409 ± 5%  -44.7%  3418985 ± 2%  perf-stat.ps.minor-faults
55825 ± 8%  +39.6%  77936 ± 8%  perf-stat.ps.node-load-misses
6318444 ± 5%  -42.7%  3620465 ± 3%  perf-stat.ps.node-stores
6187409 ± 5%  -44.7%  3418985 ± 2%  perf-stat.ps.page-faults
5.662e+12 ± 5%  -32.4%  3.83e+12  perf-stat.total.instructions
92.72  -14.8  77.93 ± 3%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
69.44 ± 2%  -11.5  57.91 ± 4%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
82.34  -11.4  70.95 ± 3%  perf-profile.calltrace.cycles-pp.testcase
69.74 ± 2%  -10.5  59.24 ± 4%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
26.90  -6.0  20.87 ± 5%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
20.44 ± 4%  -5.8  14.62 ± 5%  perf-profile.calltrace.cycles-pp.up_read.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
18.29 ± 3%  -5.0  13.33 ± 4%  perf-profile.calltrace.cycles-pp.down_read_trylock.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
27.60 ± 2%  -4.9  22.73 ± 3%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
4.19 ± 5%  -0.7  3.50 ± 5%  perf-profile.calltrace.cycles-pp.error_entry.testcase
3.93 ± 5%  -0.6  3.28 ± 5%  perf-profile.calltrace.cycles-pp.sync_regs.error_entry.testcase
1.15 ± 8%  +0.3  1.42 ± 5%  perf-profile.calltrace.cycles-pp.do_set_pte.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
1.68 ± 7%  +0.3  1.96 ± 5%  perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
4.58 ± 5%  +0.5  5.05 ± 4%  perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.00  +0.5  0.54 ± 4%  perf-profile.calltrace.cycles-pp.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.09 ±223%  +0.6  0.65 ± 4%  perf-profile.calltrace.cycles-pp.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
0.20 ±141%  +0.7  0.88 ± 40%  perf-profile.calltrace.cycles-pp.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
0.00  +0.7  0.72 ± 7%  perf-profile.calltrace.cycles-pp.page_remove_rmap.tlb_flush_rmaps.zap_pte_range.zap_pmd_range.unmap_page_range
0.00  +0.8  0.79 ± 6%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.testcase
0.00  +0.8  0.79 ± 5%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.testcase
0.00  +0.8  0.80 ± 6%  perf-profile.calltrace.cycles-pp.tlb_flush_rmaps.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
0.00  +0.8  0.82 ± 6%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.testcase
0.81 ± 20%  +0.9  1.70 ± 43%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
0.82 ± 20%  +0.9  1.74 ± 44%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
0.10 ±223%  +1.0  1.13 ± 60%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.00  +1.0  1.04 ± 5%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.testcase
1.28 ± 20%  +1.4  2.68 ± 41%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
1.55 ± 19%  +1.6  3.14 ± 37%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.00  +1.6  1.63 ± 7%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.down_read_trylock
0.00  +1.6  1.64 ± 7%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.down_read_trylock.do_user_addr_fault
0.00  +1.7  1.72 ± 7%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.down_read_trylock.do_user_addr_fault.exc_page_fault
0.00  +1.8  1.76 ± 44%  perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
0.00  +1.9  1.85 ± 43%  perf-profile.calltrace.cycles-pp.flush_tlb_func.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
0.00  +2.1  2.08 ± 7%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.down_read_trylock.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00  +2.1  2.09 ± 8%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.up_read
0.00  +2.1  2.09 ± 8%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.up_read.do_user_addr_fault
0.00  +2.1  2.15 ± 6%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.__handle_mm_fault
0.00  +2.2  2.16 ± 6%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.__handle_mm_fault.handle_mm_fault
0.00  +2.2  2.19 ± 8%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.up_read.do_user_addr_fault.exc_page_fault
0.00  +2.3  2.25 ± 6%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00  +2.5  2.53 ± 4%  perf-profile.calltrace.cycles-pp.__default_send_IPI_dest_field.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
0.00  +2.6  2.59 ± 4%  perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
0.00  +2.6  2.63 ± 8%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.up_read.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00  +2.7  2.70 ± 4%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault
0.00  +2.7  2.71 ± 4%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault
0.00  +2.7  2.72 ± 6%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.00  +2.8  2.84 ± 4%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
4.34 ± 11%  +3.0  7.38 ± 16%  perf-profile.calltrace.cycles-pp.mwait_idle_with_hints.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
4.36 ± 11%  +3.1  7.42 ± 16%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
6.20 ± 12%  +4.9  11.08 ± 22%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
6.43 ± 12%  +5.0  11.48 ± 21%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
0.00  +5.2  5.18 ± 3%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
6.94 ± 13%  +5.6  12.54 ± 23%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
7.04 ± 13%  +5.7  12.75 ± 23%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
7.05 ± 13%  +5.7  12.77 ± 23%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
7.05 ± 13%  +5.7  12.77 ± 23%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
0.00  +5.8  5.77 ± 17%  perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range.zap_pmd_range
7.10 ± 13%  +5.8  12.88 ± 23%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
0.00  +5.8  5.81 ± 17%  perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range
0.00  +6.8  6.82 ± 4%  perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
2.05 ± 5%  +7.1  9.11 ± 3%  perf-profile.calltrace.cycles-pp.__munmap
2.04 ± 5%  +7.1  9.11 ± 3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
2.04 ± 5%  +7.1  9.11 ± 3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
2.04 ± 5%  +7.1  9.11 ± 3%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
2.04 ± 5%  +7.1  9.11 ± 3%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
2.01 ± 5%  +7.1  9.08 ± 3%  perf-profile.calltrace.cycles-pp.do_mas_align_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.99 ± 5%  +7.1  9.06 ± 3%  perf-profile.calltrace.cycles-pp.unmap_region.do_mas_align_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
1.96 ± 5%  +7.1  9.04 ± 3%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_mas_align_munmap
1.96 ± 5%  +7.1  9.04 ± 3%  perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_mas_align_munmap.__vm_munmap.__x64_sys_munmap
1.96 ± 5%  +7.1  9.04 ± 3%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_mas_align_munmap.__vm_munmap
1.89 ± 5%  +7.1  8.99 ± 3%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
0.00  +7.3  7.30 ± 4%  perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function
0.00  +7.9  7.92 ± 4%  perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
90.30  -12.9  77.37 ± 3%  perf-profile.children.cycles-pp.testcase
81.56  -12.7  68.87 ± 3%  perf-profile.children.cycles-pp.asm_exc_page_fault
69.82 ± 2%  -10.5  59.30 ± 4%  perf-profile.children.cycles-pp.exc_page_fault
69.67 ± 2%  -10.5  59.18 ± 4%  perf-profile.children.cycles-pp.do_user_addr_fault
26.76  -5.6  21.20 ± 4%  perf-profile.children.cycles-pp.__handle_mm_fault
20.49 ± 4%  -5.5  15.03 ± 5%  perf-profile.children.cycles-pp.up_read
28.06  -4.7  23.34 ± 4%  perf-profile.children.cycles-pp.handle_mm_fault
18.33 ± 3%  -4.7  13.66 ± 4%  perf-profile.children.cycles-pp.down_read_trylock
3.94 ± 5%  -0.6  3.30 ± 5%  perf-profile.children.cycles-pp.sync_regs
4.36 ± 5%  -0.6  3.77 ± 5%  perf-profile.children.cycles-pp.error_entry
0.14 ± 12%  -0.0  0.10 ± 16%  perf-profile.children.cycles-pp.rwsem_down_read_slowpath
0.14 ± 7%  -0.0  0.11 ± 9%  perf-profile.children.cycles-pp.folio_memcg_lock
0.07 ± 8%  -0.0  0.04 ± 45%  perf-profile.children.cycles-pp.__tlb_remove_page_size
0.17 ± 6%  -0.0  0.14 ± 9%  perf-profile.children.cycles-pp.__irqentry_text_end
0.07 ± 6%  -0.0  0.05 ± 8%  perf-profile.children.cycles-pp.noop_dirty_folio
0.08 ± 6%  -0.0  0.06 ± 11%  perf-profile.children.cycles-pp.perf_callchain_user
0.05 ± 7%  +0.0  0.09 ± 7%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
0.07 ± 20%  +0.0  0.12 ± 22%  perf-profile.children.cycles-pp.arch_scale_freq_tick
0.21 ± 5%  +0.0  0.26 ± 7%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.10 ± 14%  +0.1  0.16 ± 17%  perf-profile.children.cycles-pp.read_tsc
0.04 ± 45%  +0.1  0.10 ± 52%  perf-profile.children.cycles-pp.update_rq_clock
0.05 ± 46%  +0.1  0.11 ± 18%  perf-profile.children.cycles-pp.start_kernel
0.05 ± 46%  +0.1  0.11 ± 18%  perf-profile.children.cycles-pp.arch_call_rest_init
0.05 ± 46%  +0.1  0.11 ± 18%  perf-profile.children.cycles-pp.rest_init
0.12 ± 11%  +0.1  0.18 ± 11%  perf-profile.children.cycles-pp.lapic_next_deadline
0.00  +0.1  0.06 ± 17%  perf-profile.children.cycles-pp.restore_regs_and_return_to_kernel
0.06 ± 11%  +0.1  0.12 ± 22%  perf-profile.children.cycles-pp.find_busiest_group
0.04 ± 47%  +0.1  0.11 ± 32%  perf-profile.children.cycles-pp.get_next_timer_interrupt
0.05 ± 45%  +0.1  0.11 ± 26%  perf-profile.children.cycles-pp.update_sd_lb_stats
0.04 ± 44%  +0.1  0.11 ± 29%  perf-profile.children.cycles-pp.irqentry_enter
0.02 ±141%  +0.1  0.08 ± 42%  perf-profile.children.cycles-pp.hrtimer_next_event_without
0.46 ± 5%  +0.1  0.52 ± 4%  perf-profile.children.cycles-pp.__might_resched
0.01 ±223%  +0.1  0.08 ± 23%  perf-profile.children.cycles-pp.update_sg_lb_stats
0.00  +0.1  0.07 ± 12%  perf-profile.children.cycles-pp.idle_cpu
0.20 ± 4%  +0.1  0.27 ± 6%  perf-profile.children.cycles-pp.__cond_resched
0.12 ± 9%  +0.1  0.20 ± 6%  perf-profile.children.cycles-pp.__mod_node_page_state
0.02 ± 99%  +0.1  0.11 ± 19%  perf-profile.children.cycles-pp.ret_from_fork
0.02 ± 99%  +0.1  0.11 ± 19%  perf-profile.children.cycles-pp.kthread
0.08 ± 11%  +0.1  0.17 ± 28%  perf-profile.children.cycles-pp.load_balance
0.35 ± 7%  +0.1  0.44 ± 4%  perf-profile.children.cycles-pp._raw_spin_lock
0.00  +0.1  0.10 ± 35%  perf-profile.children.cycles-pp.update_blocked_averages
0.12 ± 15%  +0.1  0.22 ± 25%  perf-profile.children.cycles-pp.rebalance_domains
0.00  +0.1  0.10 ± 39%  perf-profile.children.cycles-pp.run_rebalance_domains
0.21 ± 5%  +0.1  0.31 ± 6%  perf-profile.children.cycles-pp.__mod_lruvec_state
0.12 ± 10%  +0.1  0.22 ± 30%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.69 ± 5%  +0.1  0.79 ± 6%  perf-profile.children.cycles-pp.page_remove_rmap
0.32 ± 6%  +0.1  0.44 ± 3%  perf-profile.children.cycles-pp.tlb_batch_pages_flush
0.12 ± 60%  +0.1  0.25 ± 42%  perf-profile.children.cycles-pp.irq_enter_rcu
0.26 ± 32%  +0.1  0.40 ± 25%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.65 ± 6%  +0.2  0.82 ± 4%  perf-profile.children.cycles-pp.___perf_sw_event
0.56 ± 4%  +0.2  0.73 ± 6%  perf-profile.children.cycles-pp.__mod_lruvec_page_state
0.36 ± 5%  +0.2  0.54 ± 30%  perf-profile.children.cycles-pp.scheduler_tick
0.00  +0.2  0.19 ± 28%  perf-profile.children.cycles-pp._find_next_bit
0.00  +0.2  0.19 ± 6%  perf-profile.children.cycles-pp.irq_exit_rcu
0.20 ± 14%  +0.2  0.40 ± 29%  perf-profile.children.cycles-pp.__softirqentry_text_start
0.00  +0.2  0.22 ± 5%  perf-profile.children.cycles-pp.error_return
0.00  +0.2  0.24 ± 9%  perf-profile.children.cycles-pp.llist_add_batch
0.22 ± 30%  +0.3  0.48 ± 9%  perf-profile.children.cycles-pp.percpu_counter_add_batch
1.19 ± 8%  +0.3  1.46 ± 5%  perf-profile.children.cycles-pp.do_set_pte
0.12 ± 12%  +0.3  0.40 ± 12%  perf-profile.children.cycles-pp.native_sched_clock
1.72 ± 7%  +0.3  2.01 ± 5%  perf-profile.children.cycles-pp.finish_fault
0.14 ± 10%  +0.3  0.49 ± 10%  perf-profile.children.cycles-pp.sched_clock_cpu
0.44 ± 8%  +0.4  0.81 ± 46%  perf-profile.children.cycles-pp.update_process_times
0.09 ± 10%  +0.4  0.47 ± 6%  perf-profile.children.cycles-pp.irqtime_account_irq
0.86 ± 6%  +0.4  1.26 ± 4%  perf-profile.children.cycles-pp.__perf_sw_event
0.45 ± 8%  +0.4  0.87 ± 51%  perf-profile.children.cycles-pp.tick_sched_handle
0.52 ± 13%  +0.5  0.97 ± 46%  perf-profile.children.cycles-pp.tick_sched_timer
4.65 ± 5%  +0.5  5.11 ± 4%  perf-profile.children.cycles-pp.do_fault
0.43 ± 29%  +0.5  0.90 ± 40%  perf-profile.children.cycles-pp.menu_select
0.28 ± 11%  +0.5  0.80 ± 18%  perf-profile.children.cycles-pp.__irq_exit_rcu
3.58 ± 6%  +0.6  4.22 ± 7%  perf-profile.children.cycles-pp.native_irq_return_iret
0.02 ± 99%  +0.7  0.72 ± 3%  perf-profile.children.cycles-pp.llist_reverse_order
0.72 ± 11%  +0.7  1.43 ± 48%  perf-profile.children.cycles-pp.__hrtimer_run_queues
0.00  +0.8  0.82 ± 6%  perf-profile.children.cycles-pp.tlb_flush_rmaps
1.18 ± 18%  +0.9  2.06 ± 35%  perf-profile.children.cycles-pp.hrtimer_interrupt
1.18 ± 18%  +0.9  2.10 ± 36%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
1.70 ± 18%  +1.4  3.09 ± 36%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
2.05 ± 17%  +1.6  3.62 ± 31%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.16 ± 14%  +2.4  2.54 ± 4%  perf-profile.children.cycles-pp.__default_send_IPI_dest_field
0.16 ± 14%  +2.4  2.60 ± 4%  perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
4.42 ± 10%  +3.1  7.50 ± 15%  perf-profile.children.cycles-pp.mwait_idle_with_hints
4.40 ± 10%  +3.1  7.50 ± 16%  perf-profile.children.cycles-pp.intel_idle
6.47 ± 12%  +5.1  11.57 ± 21%  perf-profile.children.cycles-pp.cpuidle_enter_state
6.48 ± 12%  +5.1  11.58 ± 21%  perf-profile.children.cycles-pp.cpuidle_enter
0.26 ± 11%  +5.5  5.81 ± 17%  perf-profile.children.cycles-pp.smp_call_function_many_cond
0.26 ± 11%  +5.6  5.82 ± 17%  perf-profile.children.cycles-pp.on_each_cpu_cond_mask
6.99 ± 13%  +5.7  12.65 ± 23%  perf-profile.children.cycles-pp.cpuidle_idle_call
7.05 ± 13%  +5.7  12.77 ± 23%  perf-profile.children.cycles-pp.start_secondary
7.10 ± 13%  +5.8  12.88 ± 23%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
7.10 ± 13%  +5.8  12.88 ± 23%  perf-profile.children.cycles-pp.cpu_startup_entry
7.10 ± 13%  +5.8  12.88 ± 23%  perf-profile.children.cycles-pp.do_idle
0.27 ± 11%  +6.6  6.83 ± 4%  perf-profile.children.cycles-pp.flush_tlb_mm_range
2.05 ± 5%  +7.1  9.11 ± 3%  perf-profile.children.cycles-pp.__munmap
2.05 ± 5%  +7.1  9.11 ± 3%  perf-profile.children.cycles-pp.__vm_munmap
2.05 ± 5%  +7.1  9.11 ± 3%  perf-profile.children.cycles-pp.__x64_sys_munmap
2.02 ± 5%  +7.1  9.08 ± 3%  perf-profile.children.cycles-pp.do_mas_align_munmap
1.99 ± 5%  +7.1  9.06 ± 3%  perf-profile.children.cycles-pp.unmap_region
1.97 ± 5%  +7.1  9.05 ± 3%  perf-profile.children.cycles-pp.unmap_vmas
1.96 ± 5%  +7.1  9.05 ± 3%  perf-profile.children.cycles-pp.unmap_page_range
1.96 ± 5%  +7.1  9.05 ± 3%  perf-profile.children.cycles-pp.zap_pmd_range
1.96 ± 5%  +7.1  9.05 ± 3%  perf-profile.children.cycles-pp.zap_pte_range
2.26 ± 5%  +7.1  9.37 ± 3%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
2.26 ± 5%  +7.1  9.37 ± 3%  perf-profile.children.cycles-pp.do_syscall_64
0.20 ± 11%  +10.6  10.77 ± 4%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.20 ± 11%  +10.6  10.79 ± 4%  perf-profile.children.cycles-pp.__sysvec_call_function
0.00  +11.1  11.09 ± 3%  perf-profile.children.cycles-pp.native_flush_tlb_one_user
0.23 ± 11%  +11.3  11.49 ± 4%  perf-profile.children.cycles-pp.sysvec_call_function
0.10 ± 13%  +11.8  11.89 ± 3%  perf-profile.children.cycles-pp.flush_tlb_func
0.43 ± 12%  +14.0  14.47 ± 4%  perf-profile.children.cycles-pp.asm_sysvec_call_function
21.36 ± 4%  -8.6  12.74 ± 5%  perf-profile.self.cycles-pp.__handle_mm_fault
20.32 ± 4%  -7.9  12.42 ± 5%  perf-profile.self.cycles-pp.up_read
18.17 ± 3%  -6.6  11.61 ± 4%  perf-profile.self.cycles-pp.down_read_trylock
12.27 ± 4%  -2.1  10.17 ± 5%  perf-profile.self.cycles-pp.testcase
3.88 ± 5%  -0.6  3.24 ± 5%  perf-profile.self.cycles-pp.sync_regs
1.01 ± 11%  -0.2  0.80 ± 7%  perf-profile.self.cycles-pp.mt_find
0.65 ± 4%  -0.1  0.56 ± 4%  perf-profile.self.cycles-pp.__filemap_get_folio
0.30 ± 6%  -0.1  0.24 ± 4%  perf-profile.self.cycles-pp.page_add_file_rmap
0.29 ± 3%  -0.0  0.24 ± 3%  perf-profile.self.cycles-pp.asm_exc_page_fault
0.24 ± 5%  -0.0  0.20 ± 6%  perf-profile.self.cycles-pp.do_set_pte
0.07 ± 5%  -0.0  0.04 ± 71%  perf-profile.self.cycles-pp.lock_page_memcg
0.27 ± 2%  -0.0  0.24 ± 5%  perf-profile.self.cycles-pp.xas_load
0.15 ± 4%  -0.0  0.14 ± 6%  perf-profile.self.cycles-pp.handle_pte_fault
0.04 ± 45%  +0.0  0.07 ± 11%  perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
0.06 ± 8%  +0.0  0.08 ± 8%  perf-profile.self.cycles-pp.find_vma
0.13 ± 9%  +0.0  0.16 ± 6%  perf-profile.self.cycles-pp.cgroup_rstat_updated
0.13 ± 5%  +0.0  0.16 ± 9%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.09 ± 7%  +0.0  0.12 ± 11%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.12 ± 4%  +0.0  0.16 ± 5%  perf-profile.self.cycles-pp.__cond_resched
0.07 ± 20%  +0.0  0.12 ± 22%  perf-profile.self.cycles-pp.arch_scale_freq_tick
0.06 ± 13%  +0.0  0.10 ± 28%  perf-profile.self.cycles-pp.cpuidle_idle_call
0.10 ± 7%  +0.1  0.16 ± 6%  perf-profile.self.cycles-pp.__mod_node_page_state
0.10 ± 14%  +0.1  0.16 ± 17%  perf-profile.self.cycles-pp.read_tsc
0.12 ± 11%  +0.1  0.18 ± 11%  perf-profile.self.cycles-pp.lapic_next_deadline
0.00  +0.1  0.06 ± 21%  perf-profile.self.cycles-pp.irqentry_enter
0.00  +0.1  0.07 ± 10%  perf-profile.self.cycles-pp.idle_cpu
0.00  +0.1  0.08  perf-profile.self.cycles-pp.error_return
0.00  +0.1  0.09 ± 7%  perf-profile.self.cycles-pp.flush_tlb_mm_range
0.00  +0.1  0.13 ± 18%  perf-profile.self.cycles-pp.irqtime_account_irq
0.23 ± 6%  +0.1  0.38 ± 3%  perf-profile.self.cycles-pp.__perf_sw_event
0.00  +0.2  0.16 ± 31%  perf-profile.self.cycles-pp._find_next_bit
0.44 ± 4%  +0.2  0.61 ± 5%  perf-profile.self.cycles-pp.zap_pte_range
0.19 ± 33%  +0.2  0.39 ± 8%  perf-profile.self.cycles-pp.percpu_counter_add_batch
0.00  +0.2  0.21 ± 6%  perf-profile.self.cycles-pp.asm_sysvec_call_function
0.00  +0.2  0.24 ± 8%  perf-profile.self.cycles-pp.llist_add_batch
0.00  +0.2  0.24 ± 7%  perf-profile.self.cycles-pp.sysvec_call_function
0.15 ± 35%  +0.3  0.41 ± 48%  perf-profile.self.cycles-pp.menu_select
0.37 ± 14%  +0.3  0.64 ± 20%  perf-profile.self.cycles-pp.cpuidle_enter_state
0.11 ± 12%  +0.3  0.38 ± 10%  perf-profile.self.cycles-pp.native_sched_clock
3.58 ± 6%  +0.6  4.21 ± 7%  perf-profile.self.cycles-pp.native_irq_return_iret
0.02 ± 99%  +0.7  0.72 ± 3%  perf-profile.self.cycles-pp.llist_reverse_order
0.00  +0.8  0.84 ± 3%  perf-profile.self.cycles-pp.flush_tlb_func
0.07 ± 12%  +0.8  0.92 ± 6%  perf-profile.self.cycles-pp.smp_call_function_many_cond
0.06 ± 13%  +0.9  0.93 ± 4%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.61 ± 6%  +1.0  1.60 ± 3%  perf-profile.self.cycles-pp.do_user_addr_fault
0.16 ± 14%  +2.4  2.54 ± 4%  perf-profile.self.cycles-pp.__default_send_IPI_dest_field
4.40 ± 10%  +3.1  7.48 ± 15%  perf-profile.self.cycles-pp.mwait_idle_with_hints
0.00  +11.0  11.04 ± 3%  perf-profile.self.cycles-pp.native_flush_tlb_one_user

The fix patch is under testing. We will send the result once the test is done.

Best Regards,
Yujie

> Because if it's a one-off, it's probably best ignored. If it shows up
> elsewhere, I think that batching logic might need looking at.
>
>              Linus