Hello,

kernel test robot noticed a 46.0% improvement of vm-scalability.throughput on:

commit: 39fbbca087dd149cdb82f08e7b92d62395c21ecf ("[PATCH v2 5/6] mm: Handle read faults under the VMA lock")
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/mm-Make-lock_folio_maybe_drop_mmap-VMA-lock-aware/20231007-035513
base: v6.6-rc4
patch link: https://lore.kernel.org/all/20231006195318.4087158-6-willy@xxxxxxxxxxxxx/
patch subject: [PATCH v2 5/6] mm: Handle read faults under the VMA lock

testcase: vm-scalability
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
parameters:

        runtime: 300s
        size: 2T
        test: shm-pread-seq-mt
        cpufreq_governor: performance

test-description: The motivation behind this suite is to exercise functions and regions of mm/ in the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
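
As background for the numbers below: before this patch, a read fault taken
with only the per-VMA lock held could not be completed there; the fault path
dropped the VMA lock and returned VM_FAULT_RETRY, so the fault was repeated
under the mmap_lock. A minimal sketch of that bail-out pattern (illustrative
only; the real check and its placement in mm/memory.c are in the patch
linked above):

        /* Pre-patch pattern (sketch): punt VMA-locked faults to the mmap_lock. */
        if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
                vma_end_read(vmf->vma);   /* drop the per-VMA read lock */
                return VM_FAULT_RETRY;    /* caller retries under mmap_lock */
        }

With the patch, do_read_fault() completes under the VMA lock, so the retry
and the mmap_lock round-trip disappear. That matches the profile below:
self cycles in down_read_trylock drop from 2.06% to 0.11% and in up_read
from 1.28% to 0.16% (both dominated here by mmap_lock rwsem traffic).
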
Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231020/202310201715.3f52109d-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/2T/lkp-csl-2sp3/shm-pread-seq-mt/vm-scalability

commit:
  90e99527c7 ("mm: Handle COW faults under the VMA lock")
  39fbbca087 ("mm: Handle read faults under the VMA lock")

90e99527c746cd9e 39fbbca087dd149cdb82f08e7b9
---------------- ---------------------------
base value ± %stddev  %change  patched value ± %stddev  metric

34.69 ± 23%  +72.5%  59.82 ± 2%  vm-scalability.free_time
173385  +45.6%  252524  vm-scalability.median
16599151  +46.0%  24242352  vm-scalability.throughput
390.45  +6.9%  417.32  vm-scalability.time.elapsed_time
390.45  +6.9%  417.32  vm-scalability.time.elapsed_time.max
45781 ± 2%  +16.3%  53251 ± 2%  vm-scalability.time.involuntary_context_switches
4.213e+09  +50.1%  6.325e+09  vm-scalability.time.maximum_resident_set_size
5.316e+08  +47.3%  7.83e+08  vm-scalability.time.minor_page_faults
6400  -8.0%  5890  vm-scalability.time.percent_of_cpu_this_job_got
21673  -10.2%  19455  vm-scalability.time.system_time
3319  +54.4%  5126  vm-scalability.time.user_time
2.321e+08 ± 2%  +27.2%  2.953e+08 ± 5%  vm-scalability.time.voluntary_context_switches
5.004e+09  +42.2%  7.116e+09  vm-scalability.workload

13110  +24.0%  16254  uptime.idle

1.16e+10  +24.5%  1.444e+10  cpuidle..time
2.648e+08 ± 3%  +16.3%  3.079e+08 ± 5%  cpuidle..usage

22.86  +6.3  29.17  mpstat.cpu.all.idle%
8.29 ± 5%  -1.2  7.13 ± 7%  mpstat.cpu.all.iowait%
58.63  -9.2  49.38  mpstat.cpu.all.sys%
9.05  +4.0  13.09  mpstat.cpu.all.usr%

8721571 ± 5%  +44.8%  12630342 ± 2%  numa-numastat.node0.local_node
8773210 ± 5%  +44.8%  12706884 ± 2%  numa-numastat.node0.numa_hit
7793725 ± 5%  +51.3%  11793573  numa-numastat.node1.local_node
7842342 ± 5%  +50.7%  11816543  numa-numastat.node1.numa_hit

23.17  +26.8%  29.37  vmstat.cpu.id
31295414  +50.9%  47211341  vmstat.memory.cache
95303378  -18.8%  77355720  vmstat.memory.free
1176885 ± 2%  +19.2%  1402891 ± 3%  vmstat.system.cs
194658  +5.4%  205149 ± 2%  vmstat.system.in

9920198 ± 10%  -48.9%  5071533 ± 15%  turbostat.C1
0.51 ± 12%  -0.3  0.21 ± 12%  turbostat.C1%
1831098 ± 15%  -72.0%  512888 ± 19%  turbostat.C1E
0.14 ± 13%  -0.1  0.06 ± 11%  turbostat.C1E%
8736699  +36.3%  11905646  turbostat.C6
22.74  +6.3  29.02  turbostat.C6%
17.82  +25.5%  22.37  turbostat.CPU%c1
5.36  +28.2%  6.87  turbostat.CPU%c6
0.07  +42.9%  0.10  turbostat.IPC
77317703  +12.3%  86804635 ± 3%  turbostat.IRQ
2.443e+08 ± 3%  +18.9%  2.904e+08 ± 6%  turbostat.POLL
4.80  +30.2%  6.24  turbostat.Pkg%pc2
266.73  -1.3%  263.33  turbostat.PkgWatt

0.00  -25.0%  0.00  perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
0.06 ± 11%  -21.8%  0.04 ± 9%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
26.45 ± 9%  -16.0%  22.21 ± 6%  perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
0.00  -25.0%  0.00  perf-sched.total_sch_delay.average.ms
106.37 ±167%  -79.1%  22.21 ± 6%  perf-sched.total_sch_delay.max.ms
0.46 ± 2%  -16.0%  0.39 ± 5%  perf-sched.total_wait_and_delay.average.ms
2202457 ± 2%  +26.1%  2776824 ± 3%  perf-sched.total_wait_and_delay.count.ms
0.45 ± 2%  -15.9%  0.38 ± 5%  perf-sched.total_wait_time.average.ms
0.02 ± 2%  -19.8%  0.01 ± 2%  perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
494.65 ± 4%  +10.6%  546.88 ± 3%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
2196122 ± 2%  +26.1%  2770017 ± 3%  perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
0.01 ± 3%  -19.5%  0.01  perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
494.63 ± 4%  +10.6%  546.87 ± 3%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.22 ± 42%  -68.8%  0.07 ±125%  perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra

11445425  +82.1%  20837223  meminfo.Active
11444642  +82.1%  20836443  meminfo.Active(anon)
31218122  +51.0%  47138293  meminfo.Cached
30006048  +53.7%  46116816  meminfo.Committed_AS
17425032  +37.4%  23950392  meminfo.Inactive
17423257  +37.5%  23948613  meminfo.Inactive(anon)
164910  +21.8%  200913  meminfo.KReclaimable
26336530  +57.6%  41514589  meminfo.Mapped
94668993  -19.0%  76693589  meminfo.MemAvailable
95202238  -18.9%  77208832  meminfo.MemFree
36610737  +49.1%  54604143  meminfo.Memused
4072810  +50.1%  6114589  meminfo.PageTables
164910  +21.8%  200913  meminfo.SReclaimable
28535318  +55.8%  44455489  meminfo.Shmem
367289  +10.1%  404373  meminfo.Slab
37978157  +50.2%  57055526  meminfo.max_used_kB

2860756  +82.1%  5208445  proc-vmstat.nr_active_anon
2361286  -19.0%  1912151  proc-vmstat.nr_dirty_background_threshold
4728345  -19.0%  3828978  proc-vmstat.nr_dirty_threshold
7804148  +51.0%  11783823  proc-vmstat.nr_file_pages
23801109  -18.9%  19303173  proc-vmstat.nr_free_pages
4355690  +37.5%  5986921  proc-vmstat.nr_inactive_anon
6583645  +57.6%  10377790  proc-vmstat.nr_mapped
1018109  +50.1%  1528565  proc-vmstat.nr_page_table_pages
7133183  +55.8%  11112858  proc-vmstat.nr_shmem
41226  +21.8%  50226  proc-vmstat.nr_slab_reclaimable
2860756  +82.1%  5208445  proc-vmstat.nr_zone_active_anon
4355690  +37.5%  5986921  proc-vmstat.nr_zone_inactive_anon
112051  +3.8%  116273  proc-vmstat.numa_hint_faults
16618553  +47.6%  24525492  proc-vmstat.numa_hit
16518296  +47.9%  24425975  proc-vmstat.numa_local
11052273  +49.9%  16566743  proc-vmstat.pgactivate
16757533  +47.2%  24672644  proc-vmstat.pgalloc_normal
5.329e+08  +47.2%  7.844e+08  proc-vmstat.pgfault
16101786  +48.3%  23877738  proc-vmstat.pgfree
3302784  +6.0%  3500288  proc-vmstat.unevictable_pgs_scanned
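
A note on units when cross-reading the meminfo and proc-vmstat tables:
meminfo values are in kB, while proc-vmstat counters are in 4 kB pages, so
matching rows should agree to within sampling noise. For example, on the
base commit:

        proc-vmstat.nr_shmem:  7133183 pages * 4 kB/page = 28532732 kB
        meminfo.Shmem:         28535318 kB  (agreement to ~0.01%)
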
6101287 ± 7%  +81.3%  11062634 ± 3%  numa-meminfo.node0.Active
6101026 ± 7%  +81.3%  11062389 ± 3%  numa-meminfo.node0.Active(anon)
17217355 ± 5%  +46.3%  25196100 ± 3%  numa-meminfo.node0.FilePages
9363213 ± 7%  +31.9%  12347562 ± 2%  numa-meminfo.node0.Inactive
9362621 ± 7%  +31.9%  12347130 ± 2%  numa-meminfo.node0.Inactive(anon)
14211196 ± 7%  +51.2%  21487599  numa-meminfo.node0.Mapped
45879058 ± 2%  -19.6%  36888633 ± 2%  numa-meminfo.node0.MemFree
19925073 ± 5%  +45.1%  28915498 ± 3%  numa-meminfo.node0.MemUsed
2032891  +50.5%  3060344  numa-meminfo.node0.PageTables
15318197 ± 6%  +52.0%  23276446 ± 2%  numa-meminfo.node0.Shmem
5342463 ± 7%  +82.9%  9769639 ± 4%  numa-meminfo.node1.Active
5341941 ± 7%  +82.9%  9769104 ± 4%  numa-meminfo.node1.Active(anon)
13998966 ± 8%  +56.6%  21919509 ± 3%  numa-meminfo.node1.FilePages
8060699 ± 7%  +43.7%  11584190 ± 2%  numa-meminfo.node1.Inactive
8059515 ± 7%  +43.7%  11582844 ± 2%  numa-meminfo.node1.Inactive(anon)
12125745 ± 7%  +65.0%  20005342  numa-meminfo.node1.Mapped
49326340 ± 2%  -18.2%  40347902 ± 2%  numa-meminfo.node1.MemFree
16682503 ± 7%  +53.8%  25660941 ± 3%  numa-meminfo.node1.MemUsed
2039529  +49.6%  3051247  numa-meminfo.node1.PageTables
13214266 ± 7%  +60.1%  21155303 ± 2%  numa-meminfo.node1.Shmem
156378 ± 13%  +21.1%  189316 ± 9%  numa-meminfo.node1.Slab

1525784 ± 7%  +81.4%  2767183 ± 3%  numa-vmstat.node0.nr_active_anon
4304756 ± 5%  +46.4%  6302189 ± 3%  numa-vmstat.node0.nr_file_pages
11469263 ± 2%  -19.6%  9218468 ± 2%  numa-vmstat.node0.nr_free_pages
2340569 ± 7%  +32.0%  3088383 ± 2%  numa-vmstat.node0.nr_inactive_anon
3553304 ± 7%  +51.3%  5375214  numa-vmstat.node0.nr_mapped
508315  +50.6%  765564  numa-vmstat.node0.nr_page_table_pages
3829966 ± 6%  +52.0%  5822276 ± 2%  numa-vmstat.node0.nr_shmem
1525783 ± 7%  +81.4%  2767184 ± 3%  numa-vmstat.node0.nr_zone_active_anon
2340569 ± 7%  +32.0%  3088382 ± 2%  numa-vmstat.node0.nr_zone_inactive_anon
8773341 ± 5%  +44.8%  12707017 ± 2%  numa-vmstat.node0.numa_hit
8721702 ± 5%  +44.8%  12630474 ± 2%  numa-vmstat.node0.numa_local
1335910 ± 7%  +82.9%  2443778 ± 4%  numa-vmstat.node1.nr_active_anon
3500040 ± 8%  +56.7%  5482887 ± 3%  numa-vmstat.node1.nr_file_pages
12331163 ± 2%  -18.2%  10083422 ± 2%  numa-vmstat.node1.nr_free_pages
2014795 ± 7%  +43.8%  2897243 ± 2%  numa-vmstat.node1.nr_inactive_anon
3031806 ± 7%  +65.1%  5004449  numa-vmstat.node1.nr_mapped
510000  +49.7%  763297  numa-vmstat.node1.nr_page_table_pages
3303865 ± 7%  +60.2%  5291835 ± 2%  numa-vmstat.node1.nr_shmem
1335910 ± 7%  +82.9%  2443778 ± 4%  numa-vmstat.node1.nr_zone_active_anon
2014795 ± 7%  +43.8%  2897242 ± 2%  numa-vmstat.node1.nr_zone_inactive_anon
7842425 ± 5%  +50.7%  11816530  numa-vmstat.node1.numa_hit
7793808 ± 5%  +51.3%  11793555  numa-vmstat.node1.numa_local

9505083  +21.3%  11532590 ± 3%  sched_debug.cfs_rq:/.avg_vruntime.avg
9551715  +21.4%  11595502 ± 3%  sched_debug.cfs_rq:/.avg_vruntime.max
9426050  +21.4%  11443528 ± 3%  sched_debug.cfs_rq:/.avg_vruntime.min
19249 ± 4%  +28.3%  24698 ± 10%  sched_debug.cfs_rq:/.avg_vruntime.stddev
0.79  -30.7%  0.55 ± 8%  sched_debug.cfs_rq:/.h_nr_running.avg
12458 ± 12%  +70.8%  21277 ± 22%  sched_debug.cfs_rq:/.load.avg
13767 ± 95%  +311.7%  56677 ± 29%  sched_debug.cfs_rq:/.load.stddev
9505083  +21.3%  11532590 ± 3%  sched_debug.cfs_rq:/.min_vruntime.avg
9551715  +21.4%  11595502 ± 3%  sched_debug.cfs_rq:/.min_vruntime.max
9426050  +21.4%  11443528 ± 3%  sched_debug.cfs_rq:/.min_vruntime.min
19249 ± 4%  +28.3%  24698 ± 10%  sched_debug.cfs_rq:/.min_vruntime.stddev
0.78  -30.7%  0.54 ± 8%  sched_debug.cfs_rq:/.nr_running.avg
170.67  -21.4%  134.10 ± 6%  sched_debug.cfs_rq:/.removed.load_avg.max
708.55  -32.2%  480.43 ± 7%  sched_debug.cfs_rq:/.runnable_avg.avg
1510 ± 3%  -12.5%  1320 ± 4%  sched_debug.cfs_rq:/.runnable_avg.max
219.68 ± 7%  -12.7%  191.74 ± 5%  sched_debug.cfs_rq:/.runnable_avg.stddev
707.51  -32.3%  479.05 ± 7%  sched_debug.cfs_rq:/.util_avg.avg
1506 ± 3%  -12.6%  1317 ± 4%  sched_debug.cfs_rq:/.util_avg.max
219.64 ± 7%  -13.0%  191.15 ± 5%  sched_debug.cfs_rq:/.util_avg.stddev
564.18 ± 2%  -32.4%  381.24 ± 8%  sched_debug.cfs_rq:/.util_est_enqueued.avg
1168 ± 7%  -14.8%  995.94 ± 7%  sched_debug.cfs_rq:/.util_est_enqueued.max
235.45 ± 5%  -21.4%  185.13 ± 7%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
149234 ± 5%  +192.0%  435707 ± 10%  sched_debug.cpu.avg_idle.avg
404765 ± 17%  +47.3%  596259 ± 15%  sched_debug.cpu.avg_idle.max
5455 ± 4%  +3302.8%  185624 ± 34%  sched_debug.cpu.avg_idle.min
201990  +24.9%  252309 ± 5%  sched_debug.cpu.clock.avg
201997  +24.9%  252315 ± 5%  sched_debug.cpu.clock.max
201983  +24.9%  252303 ± 5%  sched_debug.cpu.clock.min
3.80 ± 2%  -10.1%  3.42 ± 3%  sched_debug.cpu.clock.stddev
200296  +24.8%  249952 ± 5%  sched_debug.cpu.clock_task.avg
200541  +24.8%  250280 ± 5%  sched_debug.cpu.clock_task.max
194086  +25.5%  243582 ± 5%  sched_debug.cpu.clock_task.min
4069  -32.7%  2739 ± 8%  sched_debug.cpu.curr->pid.avg
8703  +15.2%  10027 ± 3%  sched_debug.cpu.curr->pid.max
0.00 ± 6%  -27.2%  0.00 ± 5%  sched_debug.cpu.next_balance.stddev
0.78  -32.7%  0.52 ± 8%  sched_debug.cpu.nr_running.avg
0.33 ± 6%  -13.9%  0.29 ± 5%  sched_debug.cpu.nr_running.stddev
2372181 ± 2%  +57.6%  3737590 ± 8%  sched_debug.cpu.nr_switches.avg
2448893 ± 2%  +58.5%  3880813 ± 8%  sched_debug.cpu.nr_switches.max
2290032 ± 2%  +55.9%  3570559 ± 8%  sched_debug.cpu.nr_switches.min
36185 ± 10%  +74.8%  63244 ± 8%  sched_debug.cpu.nr_switches.stddev
0.10 ± 19%  +138.0%  0.23 ± 19%  sched_debug.cpu.nr_uninterruptible.avg
201984  +24.9%  252304 ± 5%  sched_debug.cpu_clk
201415  +25.0%  251735 ± 5%  sched_debug.ktime
202543  +24.8%  252867 ± 5%  sched_debug.sched_clk

3.84 ± 2%  -14.1%  3.30 ± 2%  perf-stat.i.MPKI
1.679e+10  +30.1%  2.186e+10  perf-stat.i.branch-instructions
0.54 ± 2%  -0.1  0.45  perf-stat.i.branch-miss-rate%
75872684  -2.6%  73927540  perf-stat.i.branch-misses
31.85  -1.1  30.75  perf-stat.i.cache-miss-rate%
1184992 ± 2%  +19.1%  1411069 ± 3%  perf-stat.i.context-switches
3.49  -29.3%  2.47  perf-stat.i.cpi
2.265e+11  -8.1%  2.081e+11  perf-stat.i.cpu-cycles
950.46 ± 3%  -11.6%  840.03 ± 2%  perf-stat.i.cycles-between-cache-misses
9514714 ± 12%  +27.3%  12109471 ± 10%  perf-stat.i.dTLB-load-misses
1.556e+10  +29.9%  2.022e+10  perf-stat.i.dTLB-loads
1575276 ± 5%  +35.8%  2138868 ± 5%  perf-stat.i.dTLB-store-misses
3.396e+09  +21.6%  4.129e+09  perf-stat.i.dTLB-stores
79.97  +2.8  82.74  perf-stat.i.iTLB-load-miss-rate%
4265612  +8.4%  4624960 ± 2%  perf-stat.i.iTLB-load-misses
712599 ± 8%  -38.4%  438645 ± 7%  perf-stat.i.iTLB-loads
5.59e+10  +27.7%  7.137e+10  perf-stat.i.instructions
12120  +11.6%  13525 ± 2%  perf-stat.i.instructions-per-iTLB-miss
0.35  +32.7%  0.46  perf-stat.i.ipc
0.04 ± 38%  +119.0%  0.08 ± 33%  perf-stat.i.major-faults
2.36  -8.1%  2.17  perf-stat.i.metric.GHz
863.69  +7.5%  928.37  perf-stat.i.metric.K/sec
378.76  +28.8%  487.87  perf-stat.i.metric.M/sec
1359089  +37.9%  1874285  perf-stat.i.minor-faults
84.30  -2.8  81.50  perf-stat.i.node-load-miss-rate%
89.54  -2.5  87.09  perf-stat.i.node-store-miss-rate%
1359089  +37.9%  1874285  perf-stat.i.page-faults
3.65 ± 3%  -22.5%  2.82 ± 4%  perf-stat.overall.MPKI
0.45  -0.1  0.34  perf-stat.overall.branch-miss-rate%
32.64  -1.7  30.98  perf-stat.overall.cache-miss-rate%
4.05  -28.0%  2.92  perf-stat.overall.cpi
1113 ± 3%  -7.1%  1034 ± 3%  perf-stat.overall.cycles-between-cache-misses
0.05 ± 5%  +0.0  0.05 ± 5%  perf-stat.overall.dTLB-store-miss-rate%
85.73  +5.6  91.37  perf-stat.overall.iTLB-load-miss-rate%
13110 ± 2%  +17.8%  15440 ± 2%  perf-stat.overall.instructions-per-iTLB-miss
0.25  +39.0%  0.34  perf-stat.overall.ipc
4378  -4.2%  4195  perf-stat.overall.path-length
1.679e+10  +30.2%  2.186e+10  perf-stat.ps.branch-instructions
75862675  -2.6%  73920168  perf-stat.ps.branch-misses
1184994 ± 2%  +19.1%  1411192 ± 3%  perf-stat.ps.context-switches
2.265e+11  -8.1%  2.082e+11  perf-stat.ps.cpu-cycles
9518014 ± 12%  +27.3%  12118863 ± 10%  perf-stat.ps.dTLB-load-misses
1.556e+10  +29.9%  2.022e+10  perf-stat.ps.dTLB-loads
1575414 ± 5%  +35.8%  2139373 ± 5%  perf-stat.ps.dTLB-store-misses
3.396e+09  +21.6%  4.129e+09  perf-stat.ps.dTLB-stores
4265139  +8.4%  4625090 ± 2%  perf-stat.ps.iTLB-load-misses
711002 ± 8%  -38.5%  437258 ± 7%  perf-stat.ps.iTLB-loads
5.59e+10  +27.7%  7.137e+10  perf-stat.ps.instructions
0.04 ± 37%  +118.9%  0.08 ± 33%  perf-stat.ps.major-faults
1359186  +37.9%  1874615  perf-stat.ps.minor-faults
1359186  +37.9%  1874615  perf-stat.ps.page-faults
2.191e+13  +36.3%  2.986e+13  perf-stat.total.instructions
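
The derived perf-stat rows are consistent with the raw counters above:

        cpi (base):    2.265e+11 cycles / 5.59e+10 instructions  ~= 4.05
        cpi (patched): 2.082e+11 cycles / 7.137e+10 instructions ~= 2.92
        ipc = 1/cpi:   0.25 -> 0.34  (+39.0%)

path-length here appears to be instructions per unit of benchmark work,
i.e. perf-stat.total.instructions / vm-scalability.workload:

        base:    2.191e+13 / 5.004e+09 ~= 4378
        patched: 2.986e+13 / 7.116e+09 ~= 4196  (reported: 4195)

So the throughput gain comes almost entirely from executing faster (+39%
IPC), with only a small reduction (-4.2%) in instructions per operation.
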
74.66  -6.7  67.93  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
74.61  -6.7  67.89  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
53.18  -6.3  46.88  perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
35.54  -6.1  29.43  perf-profile.calltrace.cycles-pp.next_uptodate_folio.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
76.49  -5.4  71.07  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
79.82  -3.9  75.89  perf-profile.calltrace.cycles-pp.do_access
70.02  -3.8  66.23  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
70.39  -3.7  66.70  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
68.31  -2.8  65.51  perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
68.29  -2.8  65.50  perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.65 ± 7%  -0.3  0.37 ± 71%  perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.io_schedule.folio_wait_bit_common
1.94 ± 6%  -0.2  1.71 ± 6%  perf-profile.calltrace.cycles-pp.__schedule.schedule.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp
1.96 ± 6%  -0.2  1.74 ± 6%  perf-profile.calltrace.cycles-pp.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault.__do_fault
1.95 ± 6%  -0.2  1.74 ± 6%  perf-profile.calltrace.cycles-pp.schedule.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
0.86  +0.1  1.00 ± 2%  perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.filemap_map_pages.do_read_fault.do_fault
0.56  +0.2  0.72 ± 4%  perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry
1.16 ± 3%  +0.2  1.33 ± 2%  perf-profile.calltrace.cycles-pp.set_pte_range.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
0.71 ± 2%  +0.2  0.92 ± 3%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary
0.78  +0.2  1.02 ± 4%  perf-profile.calltrace.cycles-pp.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
0.44 ± 44%  +0.3  0.73 ± 3%  perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_read_fault.do_fault.__handle_mm_fault
0.89 ± 9%  +0.3  1.24 ± 8%  perf-profile.calltrace.cycles-pp.finish_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
1.23  +0.4  1.59  perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_page_fault.do_access
0.18 ±141%  +0.4  0.57 ± 5%  perf-profile.calltrace.cycles-pp.try_to_wake_up.wake_page_function.__wake_up_common.folio_wake_bit.filemap_map_pages
1.50  +0.6  2.05  perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
0.00  +0.6  0.56 ± 4%  perf-profile.calltrace.cycles-pp.wake_page_function.__wake_up_common.folio_wake_bit.do_read_fault.do_fault
0.09 ±223%  +0.6  0.69 ± 4%  perf-profile.calltrace.cycles-pp.__wake_up_common.folio_wake_bit.do_read_fault.do_fault.__handle_mm_fault
0.00  +0.6  0.60  perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.finish_fault.do_read_fault.do_fault
2.98 ± 3%  +0.7  3.66 ± 2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
3.39 ± 3%  +0.8  4.21  perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault.__do_fault
7.48  +0.9  8.41  perf-profile.calltrace.cycles-pp.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
2.25 ± 6%  +1.0  3.30 ± 3%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_wake_bit.do_read_fault.do_fault
2.44 ± 5%  +1.1  3.56 ± 2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_wake_bit.do_read_fault.do_fault.__handle_mm_fault
3.11 ± 4%  +1.4  4.52  perf-profile.calltrace.cycles-pp.folio_wake_bit.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
10.14  +1.9  12.06  perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault.do_fault
10.26  +2.0  12.25  perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_read_fault.do_fault.__handle_mm_fault
10.29  +2.0  12.29  perf-profile.calltrace.cycles-pp.__do_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
9.69  +5.5  15.21 ± 2%  perf-profile.calltrace.cycles-pp.do_rw_once

74.66  -6.7  67.94  perf-profile.children.cycles-pp.exc_page_fault
74.62  -6.7  67.90  perf-profile.children.cycles-pp.do_user_addr_fault
53.19  -6.3  46.89  perf-profile.children.cycles-pp.filemap_map_pages
35.56  -6.1  29.44  perf-profile.children.cycles-pp.next_uptodate_folio
76.51  -6.0  70.48  perf-profile.children.cycles-pp.asm_exc_page_fault
70.02  -3.8  66.24  perf-profile.children.cycles-pp.__handle_mm_fault
70.40  -3.7  66.71  perf-profile.children.cycles-pp.handle_mm_fault
81.33  -3.5  77.78  perf-profile.children.cycles-pp.do_access
68.32  -2.8  65.52  perf-profile.children.cycles-pp.do_fault
68.30  -2.8  65.50  perf-profile.children.cycles-pp.do_read_fault
2.07 ± 7%  -2.0  0.12 ± 6%  perf-profile.children.cycles-pp.down_read_trylock
1.28 ± 4%  -1.1  0.16 ± 4%  perf-profile.children.cycles-pp.up_read
0.65 ± 12%  -0.4  0.28 ± 15%  perf-profile.children.cycles-pp.intel_idle_irq
1.96 ± 6%  -0.2  1.74 ± 6%  perf-profile.children.cycles-pp.schedule
1.96 ± 6%  -0.2  1.74 ± 6%  perf-profile.children.cycles-pp.io_schedule
0.36 ± 7%  -0.2  0.15 ± 3%  perf-profile.children.cycles-pp.mtree_range_walk
0.30 ± 8%  -0.2  0.13 ± 14%  perf-profile.children.cycles-pp.mm_cid_get
0.12 ± 12%  -0.1  0.03 ±100%  perf-profile.children.cycles-pp.update_sg_lb_stats
0.16 ± 9%  -0.1  0.07 ± 15%  perf-profile.children.cycles-pp.load_balance
0.14 ± 10%  -0.1  0.05 ± 46%  perf-profile.children.cycles-pp.update_sd_lb_stats
0.20 ± 10%  -0.1  0.11 ± 8%  perf-profile.children.cycles-pp.newidle_balance
0.14 ± 10%  -0.1  0.06 ± 17%  perf-profile.children.cycles-pp.find_busiest_group
0.33 ± 6%  -0.0  0.28 ± 5%  perf-profile.children.cycles-pp.pick_next_task_fair
0.05  +0.0  0.06  perf-profile.children.cycles-pp.nohz_run_idle_balance
0.06  +0.0  0.08 ± 6%  perf-profile.children.cycles-pp.__update_load_avg_se
0.04 ± 44%  +0.0  0.06  perf-profile.children.cycles-pp.reweight_entity
0.09 ± 7%  +0.0  0.11 ± 4%  perf-profile.children.cycles-pp.xas_descend
0.08 ± 5%  +0.0  0.10 ± 4%  perf-profile.children.cycles-pp.update_curr
0.09 ± 7%  +0.0  0.11 ± 3%  perf-profile.children.cycles-pp.prepare_task_switch
0.10 ± 4%  +0.0  0.12 ± 3%  perf-profile.children.cycles-pp.call_function_single_prep_ipi
0.08 ± 4%  +0.0  0.10 ± 5%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.04 ± 44%  +0.0  0.06 ± 7%  perf-profile.children.cycles-pp.sched_clock
0.13 ± 7%  +0.0  0.16 ± 4%  perf-profile.children.cycles-pp.__sysvec_call_function_single
0.08 ± 6%  +0.0  0.10 ± 3%  perf-profile.children.cycles-pp.set_next_entity
0.16 ± 4%  +0.0  0.19 ± 3%  perf-profile.children.cycles-pp.__switch_to
0.09 ± 4%  +0.0  0.12 ± 4%  perf-profile.children.cycles-pp.llist_reverse_order
0.04 ± 44%  +0.0  0.07 ± 5%  perf-profile.children.cycles-pp.place_entity
0.14 ± 3%  +0.0  0.16 ± 3%  perf-profile.children.cycles-pp.llist_add_batch
0.09 ± 5%  +0.0  0.12 ± 6%  perf-profile.children.cycles-pp.available_idle_cpu
0.15 ± 4%  +0.0  0.18 ± 4%  perf-profile.children.cycles-pp.sysvec_call_function_single
0.08 ± 5%  +0.0  0.12 ± 6%  perf-profile.children.cycles-pp.wake_affine
0.08  +0.0  0.11  perf-profile.children.cycles-pp.__list_del_entry_valid_or_report
0.11 ± 4%  +0.0  0.14 ± 3%  perf-profile.children.cycles-pp.update_rq_clock_task
0.11 ± 4%  +0.0  0.14 ± 4%  perf-profile.children.cycles-pp.__switch_to_asm
0.04 ± 44%  +0.0  0.07 ± 6%  perf-profile.children.cycles-pp.folio_add_lru
0.06 ± 7%  +0.0  0.10 ± 6%  perf-profile.children.cycles-pp.shmem_add_to_page_cache
0.18 ± 5%  +0.0  0.22 ± 4%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.02 ±141%  +0.0  0.06 ± 6%  perf-profile.children.cycles-pp.tick_nohz_idle_exit
0.12 ± 3%  +0.0  0.17 ± 5%  perf-profile.children.cycles-pp.select_task_rq_fair
0.13 ± 3%  +0.0  0.18 ± 6%  perf-profile.children.cycles-pp.select_task_rq
0.23 ± 3%  +0.1  0.29 ± 3%  perf-profile.children.cycles-pp.__smp_call_single_queue
0.20 ± 3%  +0.1  0.26 ± 3%  perf-profile.children.cycles-pp.update_load_avg
0.01 ±223%  +0.1  0.07 ± 18%  perf-profile.children.cycles-pp.shmem_alloc_and_acct_folio
0.26 ± 2%  +0.1  0.34 ± 3%  perf-profile.children.cycles-pp.dequeue_entity
0.29 ± 3%  +0.1  0.37 ± 4%  perf-profile.children.cycles-pp.dequeue_task_fair
0.17 ± 3%  +0.1  0.26 ± 2%  perf-profile.children.cycles-pp.sync_regs
0.34 ± 2%  +0.1  0.42 ± 4%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.28 ± 3%  +0.1  0.37 ± 4%  perf-profile.children.cycles-pp.enqueue_entity
0.28 ± 3%  +0.1  0.38 ± 6%  perf-profile.children.cycles-pp.__perf_sw_event
0.32 ± 2%  +0.1  0.42 ± 5%  perf-profile.children.cycles-pp.___perf_sw_event
0.34 ± 3%  +0.1  0.44 ± 4%  perf-profile.children.cycles-pp.enqueue_task_fair
0.36 ± 2%  +0.1  0.46 ± 3%  perf-profile.children.cycles-pp.activate_task
0.24 ± 2%  +0.1  0.35  perf-profile.children.cycles-pp.native_irq_return_iret
0.30 ± 6%  +0.1  0.42 ± 10%  perf-profile.children.cycles-pp.xas_load
0.31  +0.1  0.43 ± 3%  perf-profile.children.cycles-pp.folio_unlock
0.44 ± 2%  +0.1  0.56 ± 4%  perf-profile.children.cycles-pp.ttwu_do_activate
0.40 ± 6%  +0.2  0.56 ± 5%  perf-profile.children.cycles-pp._compound_head
1.52  +0.2  1.68 ± 4%  perf-profile.children.cycles-pp.wake_page_function
0.68 ± 3%  +0.2  0.86 ± 4%  perf-profile.children.cycles-pp.try_to_wake_up
0.66 ± 2%  +0.2  0.84 ± 3%  perf-profile.children.cycles-pp.sched_ttwu_pending
0.85 ± 2%  +0.2  1.09 ± 3%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.79 ± 2%  +0.2  1.03 ± 4%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
1.83  +0.3  2.08 ± 4%  perf-profile.children.cycles-pp.__wake_up_common
1.29  +0.3  1.60  perf-profile.children.cycles-pp.folio_add_file_rmap_range
0.89 ± 9%  +0.4  1.24 ± 8%  perf-profile.children.cycles-pp.finish_fault
1.24  +0.4  1.60  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
1.68 ± 3%  +0.4  2.06 ± 2%  perf-profile.children.cycles-pp.set_pte_range
1.50  +0.6  2.06  perf-profile.children.cycles-pp.filemap_get_entry
3.42 ± 3%  +0.8  4.24  perf-profile.children.cycles-pp._raw_spin_lock_irq
7.48  +0.9  8.41  perf-profile.children.cycles-pp.folio_wait_bit_common
9.67 ± 4%  +1.4  11.07 ± 2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
12.08 ± 3%  +1.8  13.84  perf-profile.children.cycles-pp.folio_wake_bit
10.15  +1.9  12.07  perf-profile.children.cycles-pp.shmem_get_folio_gfp
11.80 ± 4%  +1.9  13.74 ± 2%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
10.26  +2.0  12.25  perf-profile.children.cycles-pp.shmem_fault
10.29  +2.0  12.29  perf-profile.children.cycles-pp.__do_fault
8.59  +5.3  13.94 ± 2%  perf-profile.children.cycles-pp.do_rw_once
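
For context on the profile symbols: do_access and do_rw_once are not kernel
functions; they appear to be the access loops of vm-scalability's usemem
workload driver, so their growing share of self cycles (do_rw_once: 6.55% ->
11.08% below) means more CPU time is left for the benchmark itself. A
minimal userspace analogue of the shm-pread-seq-mt access pattern (a sketch
only: NR_THREADS and SIZE are placeholders, and the real driver is the
usemem program from the test-url above, not this code):

        /* sketch.c -- build with: gcc -O2 -pthread sketch.c */
        #include <pthread.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #define NR_THREADS  8               /* the job above runs on 96 CPUs */
        #define SIZE        (1UL << 30)     /* 1 GB here; the job uses size=2T */

        static char *buf;

        /*
         * Each thread reads the same shared mapping sequentially, touching
         * one byte per page: cold pages are pulled in by read faults, and
         * threads racing on the same page wait on its folio lock -- compare
         * the folio_wait_bit_common/folio_wake_bit entries in the profile.
         */
        static void *read_seq(void *arg)
        {
                volatile char *p = buf;
                unsigned long sum = 0, i;
                long pagesize = sysconf(_SC_PAGESIZE);

                (void)arg;
                for (i = 0; i < SIZE; i += pagesize)
                        sum += p[i];
                return (void *)sum;
        }

        int main(void)
        {
                pthread_t tid[NR_THREADS];
                int i;

                /* MAP_SHARED|MAP_ANONYMOUS is shmem-backed, like a tmpfs
                 * file, so each fault goes through shmem_fault() as seen
                 * in the profile. */
                buf = mmap(NULL, SIZE, PROT_READ,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
                if (buf == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }
                for (i = 0; i < NR_THREADS; i++)
                        pthread_create(&tid[i], NULL, read_seq, NULL);
                for (i = 0; i < NR_THREADS; i++)
                        pthread_join(tid[i], NULL);
                return 0;
        }
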
35.10  -6.1  28.98 ± 2%  perf-profile.self.cycles-pp.next_uptodate_folio
2.06 ± 7%  -1.9  0.11 ± 4%  perf-profile.self.cycles-pp.down_read_trylock
1.28 ± 4%  -1.1  0.16 ± 3%  perf-profile.self.cycles-pp.up_read
1.66 ± 6%  -1.0  0.68 ± 3%  perf-profile.self.cycles-pp.__handle_mm_fault
7.20  -0.7  6.55  perf-profile.self.cycles-pp.filemap_map_pages
0.64 ± 12%  -0.4  0.28 ± 15%  perf-profile.self.cycles-pp.intel_idle_irq
0.36 ± 7%  -0.2  0.15  perf-profile.self.cycles-pp.mtree_range_walk
0.30 ± 8%  -0.2  0.13 ± 14%  perf-profile.self.cycles-pp.mm_cid_get
0.71 ± 8%  -0.1  0.59 ± 7%  perf-profile.self.cycles-pp.__schedule
0.05 ± 8%  +0.0  0.06 ± 7%  perf-profile.self.cycles-pp.ttwu_do_activate
0.08 ± 5%  +0.0  0.10 ± 4%  perf-profile.self.cycles-pp.do_idle
0.06 ± 6%  +0.0  0.08 ± 6%  perf-profile.self.cycles-pp.enqueue_task_fair
0.05 ± 8%  +0.0  0.07 ± 8%  perf-profile.self.cycles-pp.__update_load_avg_se
0.09 ± 5%  +0.0  0.10 ± 4%  perf-profile.self.cycles-pp.xas_descend
0.04 ± 44%  +0.0  0.06  perf-profile.self.cycles-pp.reweight_entity
0.05 ± 7%  +0.0  0.07 ± 9%  perf-profile.self.cycles-pp.set_pte_range
0.08 ± 6%  +0.0  0.10 ± 5%  perf-profile.self.cycles-pp.update_load_avg
0.10 ± 4%  +0.0  0.12 ± 3%  perf-profile.self.cycles-pp.call_function_single_prep_ipi
0.07 ± 5%  +0.0  0.09 ± 5%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.08 ± 6%  +0.0  0.10 ± 6%  perf-profile.self.cycles-pp.flush_smp_call_function_queue
0.10 ± 4%  +0.0  0.13 ± 2%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.16 ± 4%  +0.0  0.19 ± 3%  perf-profile.self.cycles-pp.__switch_to
0.14 ± 3%  +0.0  0.16 ± 3%  perf-profile.self.cycles-pp.llist_add_batch
0.09 ± 5%  +0.0  0.12 ± 6%  perf-profile.self.cycles-pp.available_idle_cpu
0.08 ± 5%  +0.0  0.12 ± 6%  perf-profile.self.cycles-pp.enqueue_entity
0.08 ± 5%  +0.0  0.12 ± 4%  perf-profile.self.cycles-pp.llist_reverse_order
0.10 ± 4%  +0.0  0.13 ± 3%  perf-profile.self.cycles-pp.update_rq_clock_task
0.08  +0.0  0.11  perf-profile.self.cycles-pp.__list_del_entry_valid_or_report
0.11 ± 4%  +0.0  0.14 ± 4%  perf-profile.self.cycles-pp.__switch_to_asm
0.09 ± 5%  +0.0  0.12 ± 8%  perf-profile.self.cycles-pp.ttwu_queue_wakelist
0.12 ± 4%  +0.0  0.16 ± 6%  perf-profile.self.cycles-pp.xas_load
0.00  +0.1  0.05  perf-profile.self.cycles-pp.sched_ttwu_pending
0.00  +0.1  0.06  perf-profile.self.cycles-pp.asm_exc_page_fault
0.11 ± 4%  +0.1  0.18 ± 4%  perf-profile.self.cycles-pp.shmem_fault
0.17 ± 3%  +0.1  0.26 ± 2%  perf-profile.self.cycles-pp.sync_regs
0.31 ± 2%  +0.1  0.40 ± 5%  perf-profile.self.cycles-pp.___perf_sw_event
0.31 ± 2%  +0.1  0.40 ± 3%  perf-profile.self.cycles-pp.__wake_up_common
0.24 ± 2%  +0.1  0.35  perf-profile.self.cycles-pp.native_irq_return_iret
0.31  +0.1  0.43 ± 3%  perf-profile.self.cycles-pp.folio_unlock
0.44 ± 3%  +0.1  0.57 ± 2%  perf-profile.self.cycles-pp._raw_spin_lock_irq
0.68 ± 3%  +0.1  0.83 ± 2%  perf-profile.self.cycles-pp.folio_wake_bit
0.85  +0.2  1.00 ± 3%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.40 ± 5%  +0.2  0.56 ± 5%  perf-profile.self.cycles-pp._compound_head
1.29  +0.3  1.59  perf-profile.self.cycles-pp.folio_add_file_rmap_range
0.99  +0.3  1.30 ± 2%  perf-profile.self.cycles-pp.shmem_get_folio_gfp
2.08  +0.3  2.39 ± 2%  perf-profile.self.cycles-pp.folio_wait_bit_common
1.18  +0.4  1.55  perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
1.43  +0.5  1.90  perf-profile.self.cycles-pp.filemap_get_entry
3.93  +1.9  5.85  perf-profile.self.cycles-pp.do_access
11.80 ± 4%  +1.9  13.74 ± 2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
6.55  +4.5  11.08 ± 2%  perf-profile.self.cycles-pp.do_rw_once

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki