Hi Honza, On Thu, Feb 22, 2024 at 07:37:56PM +0100, Jan Kara wrote: > On Thu 22-02-24 12:50:32, Jan Kara wrote: > > On Thu 22-02-24 09:32:52, Oliver Sang wrote: > > > On Wed, Feb 21, 2024 at 12:14:25PM +0100, Jan Kara wrote: > > > > On Tue 20-02-24 16:25:37, kernel test robot wrote: > > > > > kernel test robot noticed a -21.4% regression of vm-scalability.throughput on: > > > > > > > > > > commit: ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ("readahead: avoid multiple marked readahead pages") > > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > > > > > > > > > testcase: vm-scalability > > > > > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory > > > > > parameters: > > > > > > > > > > runtime: 300s > > > > > test: lru-file-readtwice > > > > > cpufreq_governor: performance > > > > > > > > JFYI I had a look into this. What the test seems to do is that it creates > > > > image files on tmpfs, loopmounts XFS there, and does reads over file on > > > > XFS. But I was not able to find what lru-file-readtwice exactly does, > > > > neither I was able to reproduce it because I got stuck on some missing Ruby > > > > dependencies on my test system yesterday. > > > > > > what's your OS? > > > > I have SLES15-SP4 installed in my VM. What was missing was 'git' rubygem > > which apparently is not packaged at all and when I manually installed it, I > > was still hitting other problems so I rather went ahead and checked the > > vm-scalability source and wrote my own reproducer based on that. > > > > I'm now able to reproduce the regression in my VM so I'm investigating... > > So I was experimenting with this. What the test does is it creates as many > files as there are CPUs, files are sized so that their total size is 8x the > amount of available RAM. For each file two tasks are started which > sequentially read the file from start to end. Trivial repro from my VM with > 8 CPUs and 64GB of RAM is like: > > truncate -s 60000000000 /dev/shm/xfsimg > mkfs.xfs /dev/shm/xfsimg > mount -t xfs -o loop /dev/shm/xfsimg /mnt > for (( i = 0; i < 8; i++ )); do truncate -s 60000000000 /mnt/sparse-file-$i; done > echo "Ready..." > sleep 3 > echo "Running..." > for (( i = 0; i < 8; i++ )); do > dd bs=4k if=/mnt/sparse-file-$i of=/dev/null & > dd bs=4k if=/mnt/sparse-file-$i of=/dev/null & > done 2>&1 | grep "copied" > wait > umount /mnt > > The difference between slow and fast runs seems to be in the amount of > pages reclaimed with direct reclaim - after commit ab4443fe3c we reclaim > about 10% of pages with direct reclaim, before commit ab4443fe3c only about > 1% of pages is reclaimed with direct reclaim. In both cases we reclaim the > same amount of pages corresponding to the total size of files so it isn't > the case that we would be rereading one page twice. > > I suspect the reclaim difference is because after commit ab4443fe3c we > trigger readahead somewhat earlier so our effective workingset is somewhat > larger. This apparently gives harder time to kswapd and we end up with > direct reclaim more often. > > Since this is a case of heavy overload on the system, I don't think the > throughput here matters that much and AFAICT the readahead code does > nothing wrong here. So I don't think we need to do anything here. Thanks a lot for the analysis. Seems we can abstract two factors that may affect the throughput: 1. The benchmark itself is "dd" from a file to null, which is basically a sequential operation, so the earlier readahead should bring benefit to the throughput. 2. The earlier readahead somewhat enlarges the workingset and causes more often direct memory reclaim, which may hurt the throughput. We did another round of test. Our machine has 512GB RAM, now we set the total file size to 256GB so that all the files can be fully loaded into the memory and there will be no reclaim anymore. This eliminates the impact of factor 2, but unexpectedly, we still see a -42.3% throughput regression after commit ab4443fe3c. >From the perf profile, we can see that the contention of folio lru lock becomes more intense. We also did a simple one-file "dd" test. Looks like it is more likely that low-order folios are allocated after commit ab4443fe3c (Fengwei will help provide the data soon). Therefore, the average folio size decreases while the total folio amount increases, which leads to touching lru lock more often. Please kindly check the detailed metrics below: ========================================================================================= tbox_group/testcase/rootfs/kconfig/compiler/runtime/test/cpufreq_governor/debug-setup: lkp-spr-2sp4/vm-scalability/debian-11.1-x86_64-20220510.cgz/x86_64-rhel-8.3/gcc-12/300s/lru-file-readtwice/performance/256GB-perf commit: f0b7a0d1d466 ("Merge branch 'master' into mm-hotfixes-stable") ab4443fe3ca6 ("readahead: avoid multiple marked readahead pages") f0b7a0d1d46625db ab4443fe3ca6298663a55c4a70e ---------------- --------------------------- %stddev %change %stddev \ | \ 0.00 ± 49% -0.0 0.00 mpstat.cpu.all.iowait% 8.43 ± 2% +3.6 12.06 ± 5% mpstat.cpu.all.sys% 0.31 -0.0 0.27 ± 2% mpstat.cpu.all.usr% 2289863 ± 8% +55.6% 3563274 ± 7% numa-numastat.node0.local_node 2375395 ± 6% +54.4% 3666799 ± 6% numa-numastat.node0.numa_hit 2311189 ± 7% +53.8% 3554903 ± 6% numa-numastat.node1.local_node 2454386 ± 6% +50.1% 3684288 ± 4% numa-numastat.node1.numa_hit 300.98 +25.2% 376.84 ± 4% vmstat.memory.buff 46333305 +27.5% 59075372 ± 3% vmstat.memory.cache 25.22 ± 4% +51.4% 38.18 ± 6% vmstat.procs.r 303089 +8.0% 327220 vmstat.system.in 29.30 +13.5% 33.27 time.elapsed_time 29.30 +13.5% 33.27 time.elapsed_time.max 33780 ± 16% +94.4% 65660 ± 7% time.involuntary_context_switches 1943 ± 2% +42.4% 2767 ± 5% time.percent_of_cpu_this_job_got 554.77 ± 3% +63.5% 907.13 ± 6% time.system_time 14.90 -3.3% 14.40 time.user_time 20505 ± 11% -41.1% 12085 ± 8% time.voluntary_context_switches 284.00 ± 3% +34.8% 382.75 ± 5% turbostat.Avg_MHz 10.08 ± 2% +3.6 13.65 ± 4% turbostat.Busy% 39.50 ± 2% -1.8 37.68 turbostat.C1E% 0.38 ± 9% -17.3% 0.31 ± 14% turbostat.CPU%c6 9577640 +22.3% 11715251 ± 2% turbostat.IRQ 4.88 ± 12% -3.7 1.15 ± 48% turbostat.PKG_% 5558 ± 5% +41.9% 7887 ± 6% turbostat.POLL 790616 ± 6% -43.9% 443300 ± 7% vm-scalability.median 12060 ± 7% +3811.3 15871 ± 4% vm-scalability.stddev% 3.681e+08 ± 7% -42.3% 2.122e+08 ± 7% vm-scalability.throughput 33780 ± 16% +94.4% 65660 ± 7% vm-scalability.time.involuntary_context_switches 1943 ± 2% +42.4% 2767 ± 5% vm-scalability.time.percent_of_cpu_this_job_got 554.77 ± 3% +63.5% 907.13 ± 6% vm-scalability.time.system_time 20505 ± 11% -41.1% 12085 ± 8% vm-scalability.time.voluntary_context_switches 21390979 ± 4% +31.7% 28175360 ± 19% numa-meminfo.node0.Active 21388266 ± 4% +31.7% 28172516 ± 19% numa-meminfo.node0.Active(file) 24037883 ± 6% +31.1% 31516721 ± 17% numa-meminfo.node0.FilePages 497645 ± 25% +82.4% 907626 ± 38% numa-meminfo.node0.Inactive(file) 25952309 ± 6% +29.2% 33533454 ± 16% numa-meminfo.node0.MemUsed 20138 ± 9% +154.2% 51187 ± 11% numa-meminfo.node1.Active(anon) 704324 ± 17% +85.4% 1306147 ± 33% numa-meminfo.node1.Inactive 427031 ± 22% +141.7% 1031971 ± 41% numa-meminfo.node1.Inactive(file) 43712836 +27.4% 55698257 ± 2% meminfo.Active 22786 ± 6% +136.6% 53907 ± 11% meminfo.Active(anon) 43690049 +27.4% 55644350 ± 2% meminfo.Active(file) 47543418 +27.4% 60583554 ± 2% meminfo.Cached 1454581 ± 10% +72.8% 2513041 ± 11% meminfo.Inactive 929099 ± 16% +109.5% 1946433 ± 14% meminfo.Inactive(file) 242993 +12.9% 274324 meminfo.KReclaimable 79132 ± 2% +34.8% 106631 ± 2% meminfo.Mapped 51363725 +25.6% 64520957 ± 2% meminfo.Memused 9840 +12.2% 11041 ± 2% meminfo.PageTables 242993 +12.9% 274324 meminfo.SReclaimable 136679 +50.2% 205224 ± 5% meminfo.Shmem 72281513 ± 2% +25.8% 90925817 ± 2% meminfo.max_used_kB 5346609 ± 4% +31.7% 7042196 ± 19% numa-vmstat.node0.nr_active_file 6008637 ± 7% +31.1% 7878524 ± 17% numa-vmstat.node0.nr_file_pages 123918 ± 25% +83.2% 227064 ± 38% numa-vmstat.node0.nr_inactive_file 5346510 ± 4% +31.7% 7042147 ± 19% numa-vmstat.node0.nr_zone_active_file 123908 ± 25% +83.3% 227063 ± 38% numa-vmstat.node0.nr_zone_inactive_file 2375271 ± 6% +54.4% 3666818 ± 6% numa-vmstat.node0.numa_hit 2289740 ± 8% +55.6% 3563294 ± 7% numa-vmstat.node0.numa_local 5043 ± 9% +153.9% 12803 ± 11% numa-vmstat.node1.nr_active_anon 106576 ± 22% +141.7% 257597 ± 41% numa-vmstat.node1.nr_inactive_file 5043 ± 9% +153.9% 12803 ± 11% numa-vmstat.node1.nr_zone_active_anon 106574 ± 22% +141.7% 257604 ± 41% numa-vmstat.node1.nr_zone_inactive_file 2454493 ± 6% +50.1% 3684201 ± 4% numa-vmstat.node1.numa_hit 2311296 ± 7% +53.8% 3554816 ± 6% numa-vmstat.node1.numa_local 5701 ± 6% +136.5% 13486 ± 11% proc-vmstat.nr_active_anon 10923519 +27.3% 13904109 ± 2% proc-vmstat.nr_active_file 11886157 +27.4% 15138396 ± 2% proc-vmstat.nr_file_pages 1.19e+08 -2.8% 1.157e+08 proc-vmstat.nr_free_pages 131227 +8.1% 141868 proc-vmstat.nr_inactive_anon 231610 ± 16% +109.7% 485756 ± 14% proc-vmstat.nr_inactive_file 19793 ± 2% +34.7% 26668 ± 2% proc-vmstat.nr_mapped 2455 +12.3% 2758 ± 2% proc-vmstat.nr_page_table_pages 34038 ± 2% +51.4% 51526 ± 5% proc-vmstat.nr_shmem 60753 +12.9% 68588 proc-vmstat.nr_slab_reclaimable 113209 +5.9% 119837 proc-vmstat.nr_slab_unreclaimable 5701 ± 6% +136.5% 13486 ± 11% proc-vmstat.nr_zone_active_anon 10923517 +27.3% 13904109 ± 2% proc-vmstat.nr_zone_active_file 131227 +8.1% 141868 proc-vmstat.nr_zone_inactive_anon 231612 ± 16% +109.7% 485757 ± 14% proc-vmstat.nr_zone_inactive_file 162.75 ± 79% +552.8% 1062 ± 72% proc-vmstat.numa_hint_faults 4831171 ± 4% +52.2% 7352661 ± 4% proc-vmstat.numa_hit 4602441 ± 5% +54.7% 7119707 ± 4% proc-vmstat.numa_local 128.75 ± 59% +527.5% 807.88 ± 31% proc-vmstat.numa_pages_migrated 69656618 -1.5% 68615309 proc-vmstat.pgalloc_normal 672926 +3.0% 692907 proc-vmstat.pgfault 128.75 ± 59% +527.5% 807.88 ± 31% proc-vmstat.pgmigrate_success 31089 +3.7% 32235 proc-vmstat.pgreuse 0.77 ± 2% -0.0 0.74 ± 2% perf-stat.i.branch-miss-rate% 23.58 ± 6% +3.6 27.18 ± 4% perf-stat.i.cache-miss-rate% 2.74 +6.0% 2.90 perf-stat.i.cpi 5.887e+10 ± 7% +28.6% 7.572e+10 ± 10% perf-stat.i.cpu-cycles 10194 ± 3% -9.5% 9226 ± 4% perf-stat.i.cycles-between-cache-misses 0.44 -2.7% 0.43 perf-stat.i.ipc 0.25 ± 11% +29.9% 0.32 ± 11% perf-stat.i.metric.GHz 17995 ± 2% -9.0% 16374 ± 3% perf-stat.i.minor-faults 17995 ± 2% -9.0% 16374 ± 3% perf-stat.i.page-faults 17.09 -16.1% 14.34 ± 2% perf-stat.overall.MPKI 0.32 -0.0 0.29 perf-stat.overall.branch-miss-rate% 82.93 -2.1 80.88 perf-stat.overall.cache-miss-rate% 3.55 ± 2% +28.9% 4.58 ± 3% perf-stat.overall.cpi 207.81 ± 2% +53.7% 319.49 ± 5% perf-stat.overall.cycles-between-cache-misses 0.01 ± 4% +0.0 0.01 ± 3% perf-stat.overall.dTLB-load-miss-rate% 0.01 ± 3% +0.0 0.01 ± 2% perf-stat.overall.dTLB-store-miss-rate% 0.28 ± 2% -22.3% 0.22 ± 3% perf-stat.overall.ipc 967.32 +21.5% 1175 ± 2% perf-stat.overall.path-length 3.648e+09 +9.0% 3.976e+09 perf-stat.ps.branch-instructions 2.987e+08 -10.6% 2.67e+08 perf-stat.ps.cache-misses 3.602e+08 -8.4% 3.301e+08 perf-stat.ps.cache-references 6.207e+10 ± 2% +37.3% 8.524e+10 ± 4% perf-stat.ps.cpu-cycles 356765 ± 4% +14.6% 408833 ± 4% perf-stat.ps.dTLB-load-misses 4.786e+09 +5.2% 5.034e+09 perf-stat.ps.dTLB-loads 222451 ± 2% +6.7% 237255 ± 2% perf-stat.ps.dTLB-store-misses 2.207e+09 -7.4% 2.043e+09 perf-stat.ps.dTLB-stores 1.748e+10 +6.5% 1.862e+10 perf-stat.ps.instructions 17777 -9.3% 16117 ± 2% perf-stat.ps.minor-faults 17778 -9.3% 16118 ± 2% perf-stat.ps.page-faults 5.193e+11 +21.5% 6.31e+11 ± 2% perf-stat.total.instructions 12.70 -7.9 4.85 ± 38% perf-profile.calltrace.cycles-pp.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read 12.53 -7.8 4.76 ± 38% perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter 8.68 -5.2 3.46 ± 38% perf-profile.calltrace.cycles-pp.read_pages.page_cache_ra_order.filemap_get_pages.filemap_read.xfs_file_buffered_read 8.13 -4.7 3.38 ± 9% perf-profile.calltrace.cycles-pp.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order 8.67 -4.7 3.93 ± 8% perf-profile.calltrace.cycles-pp.iomap_readahead.read_pages.page_cache_ra_order.filemap_get_pages.filemap_read 8.51 -4.7 3.81 ± 8% perf-profile.calltrace.cycles-pp.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order.filemap_get_pages 7.84 -4.6 3.28 ± 8% perf-profile.calltrace.cycles-pp.__memset.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages 6.47 ± 2% -2.1 4.39 ± 5% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify 6.44 ± 2% -2.1 4.36 ± 5% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify 6.44 ± 2% -2.1 4.36 ± 5% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify 6.43 ± 2% -2.1 4.36 ± 5% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify 6.39 ± 2% -2.1 4.33 ± 5% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify 6.08 ± 2% -2.0 4.11 ± 5% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary 5.85 ± 2% -1.9 3.96 ± 5% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry 3.96 ± 2% -1.3 2.62 ± 6% perf-profile.calltrace.cycles-pp.write 3.50 ± 2% -1.1 2.36 ± 6% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle 3.28 ± 2% -1.1 2.22 ± 6% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call 2.76 ± 3% -0.9 1.86 ± 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write 2.63 ± 3% -0.8 1.79 ± 6% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 2.37 ± 2% -0.8 1.57 ± 6% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter 2.30 ± 2% -0.8 1.52 ± 6% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state 2.34 ± 4% -0.7 1.61 ± 7% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 1.91 ± 3% -0.7 1.26 ± 7% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt 2.04 ± 4% -0.6 1.43 ± 7% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 1.32 ± 2% -0.6 0.71 ± 38% perf-profile.calltrace.cycles-pp.filemap_get_read_batch.filemap_get_pages.filemap_read.xfs_file_buffered_read.xfs_file_read_iter 1.68 ± 4% -0.6 1.09 ± 8% perf-profile.calltrace.cycles-pp.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt 1.48 ± 3% -0.5 0.98 ± 6% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt 1.47 ± 3% -0.5 0.98 ± 6% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt 1.29 ± 3% -0.4 0.87 ± 5% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues 1.46 ± 11% -0.4 1.07 ± 38% perf-profile.calltrace.cycles-pp.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail 1.46 ± 11% -0.4 1.07 ± 38% perf-profile.calltrace.cycles-pp.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail 1.46 ± 11% -0.4 1.07 ± 38% perf-profile.calltrace.cycles-pp.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb 1.46 ± 11% -0.4 1.07 ± 38% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit 1.46 ± 11% -0.4 1.07 ± 38% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit 1.43 ± 11% -0.4 1.04 ± 38% perf-profile.calltrace.cycles-pp.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm 1.13 ± 3% -0.4 0.76 ± 6% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle 1.02 -0.3 0.70 ± 5% perf-profile.calltrace.cycles-pp.intel_idle_xstate.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle 0.94 ± 3% -0.3 0.65 ± 5% perf-profile.calltrace.cycles-pp.perf_event_task_tick.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler 0.92 ± 3% -0.3 0.63 ± 5% perf-profile.calltrace.cycles-pp.perf_adjust_freq_unthr_context.perf_event_task_tick.scheduler_tick.update_process_times.tick_sched_handle 1.58 ± 12% -0.3 1.29 ± 3% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 0.74 ± 4% -0.3 0.46 ± 38% perf-profile.calltrace.cycles-pp.__filemap_add_folio.filemap_add_folio.page_cache_ra_order.filemap_get_pages.filemap_read 1.56 ± 12% -0.3 1.29 ± 4% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 1.55 ± 12% -0.3 1.28 ± 3% perf-profile.calltrace.cycles-pp.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread.ret_from_fork 1.55 ± 12% -0.3 1.28 ± 3% perf-profile.calltrace.cycles-pp.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread 1.65 ± 11% -0.3 1.38 ± 3% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm 1.65 ± 11% -0.3 1.38 ± 3% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm 1.65 ± 11% -0.3 1.38 ± 3% perf-profile.calltrace.cycles-pp.ret_from_fork_asm 1.46 ± 11% -0.2 1.22 ± 3% perf-profile.calltrace.cycles-pp.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty 1.46 ± 11% -0.2 1.22 ± 3% perf-profile.calltrace.cycles-pp.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work 1.46 ± 11% -0.2 1.22 ± 3% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work 1.47 ± 11% -0.2 1.22 ± 3% perf-profile.calltrace.cycles-pp.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread 1.03 ± 7% -0.2 0.82 ± 8% perf-profile.calltrace.cycles-pp.devkmsg_emit.devkmsg_write.vfs_write.ksys_write.do_syscall_64 1.03 ± 7% -0.2 0.82 ± 8% perf-profile.calltrace.cycles-pp.devkmsg_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 1.03 ± 7% -0.2 0.82 ± 8% perf-profile.calltrace.cycles-pp.vprintk_emit.devkmsg_emit.devkmsg_write.vfs_write.ksys_write 1.02 ± 7% -0.2 0.82 ± 8% perf-profile.calltrace.cycles-pp.console_flush_all.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write 1.02 ± 7% -0.2 0.82 ± 8% perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.vfs_write 0.61 ± 5% -0.1 0.55 ± 4% perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64 86.32 +4.1 90.42 perf-profile.calltrace.cycles-pp.read 85.08 +4.6 89.66 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read 84.96 +4.6 89.58 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read 84.63 +4.7 89.38 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read 84.36 +4.9 89.21 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read 26.79 +9.3 36.06 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_activate 26.94 +9.3 36.22 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_activate.folio_mark_accessed.filemap_read 26.87 +9.3 36.17 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_activate.folio_mark_accessed 26.91 +10.9 37.78 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru 27.00 +10.9 37.89 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.filemap_add_folio.page_cache_ra_order 26.99 +10.9 37.89 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.filemap_add_folio 27.44 +10.9 38.36 ± 2% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru.filemap_add_folio.page_cache_ra_order.filemap_get_pages 27.47 +10.9 38.39 ± 2% perf-profile.calltrace.cycles-pp.folio_add_lru.filemap_add_folio.page_cache_ra_order.filemap_get_pages.filemap_read 12.72 -7.2 5.56 ± 7% perf-profile.children.cycles-pp.copy_page_to_iter 12.56 -7.1 5.46 ± 7% perf-profile.children.cycles-pp._copy_to_iter 8.80 -4.9 3.95 ± 8% perf-profile.children.cycles-pp.read_pages 8.78 -4.8 3.94 ± 8% perf-profile.children.cycles-pp.iomap_readahead 8.62 -4.8 3.83 ± 8% perf-profile.children.cycles-pp.iomap_readpage_iter 8.15 -4.8 3.39 ± 9% perf-profile.children.cycles-pp.zero_user_segments 8.07 -4.7 3.36 ± 9% perf-profile.children.cycles-pp.__memset 6.47 ± 2% -2.1 4.39 ± 5% perf-profile.children.cycles-pp.cpu_startup_entry 6.47 ± 2% -2.1 4.39 ± 5% perf-profile.children.cycles-pp.do_idle 6.47 ± 2% -2.1 4.39 ± 5% perf-profile.children.cycles-pp.secondary_startup_64_no_verify 6.44 ± 2% -2.1 4.36 ± 5% perf-profile.children.cycles-pp.start_secondary 6.42 ± 2% -2.1 4.36 ± 5% perf-profile.children.cycles-pp.cpuidle_idle_call 6.10 ± 2% -2.0 4.13 ± 5% perf-profile.children.cycles-pp.cpuidle_enter 6.10 ± 2% -2.0 4.13 ± 5% perf-profile.children.cycles-pp.cpuidle_enter_state 4.53 ± 2% -1.5 2.98 ± 6% perf-profile.children.cycles-pp.write 4.34 ± 2% -1.4 2.90 ± 5% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 3.93 ± 2% -1.3 2.66 ± 5% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 2.96 ± 2% -1.0 1.99 ± 5% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 2.89 ± 2% -1.0 1.94 ± 5% perf-profile.children.cycles-pp.hrtimer_interrupt 2.46 ± 3% -0.8 1.64 ± 6% perf-profile.children.cycles-pp.__hrtimer_run_queues 2.47 ± 3% -0.8 1.72 ± 7% perf-profile.children.cycles-pp.ksys_write 2.18 ± 3% -0.7 1.43 ± 7% perf-profile.children.cycles-pp.tick_nohz_highres_handler 1.96 ± 2% -0.7 1.31 ± 5% perf-profile.children.cycles-pp.tick_sched_handle 1.96 ± 2% -0.7 1.30 ± 5% perf-profile.children.cycles-pp.update_process_times 2.20 ± 3% -0.6 1.55 ± 7% perf-profile.children.cycles-pp.vfs_write 1.74 ± 2% -0.6 1.17 ± 5% perf-profile.children.cycles-pp.scheduler_tick 1.35 ± 2% -0.5 0.82 ± 5% perf-profile.children.cycles-pp.filemap_get_read_batch 1.38 ± 2% -0.5 0.88 ± 6% perf-profile.children.cycles-pp.entry_SYSCALL_64 0.48 ± 17% -0.4 0.05 ± 42% perf-profile.children.cycles-pp.page_cache_ra_unbounded 0.69 ± 7% -0.4 0.27 ± 39% perf-profile.children.cycles-pp.xfs_ilock 0.82 ± 4% -0.4 0.40 ± 7% perf-profile.children.cycles-pp.touch_atime 1.46 ± 11% -0.4 1.07 ± 38% perf-profile.children.cycles-pp.ast_primary_plane_helper_atomic_update 1.46 ± 11% -0.4 1.07 ± 38% perf-profile.children.cycles-pp.ast_mode_config_helper_atomic_commit_tail 0.77 ± 5% -0.4 0.38 ± 7% perf-profile.children.cycles-pp.atime_needs_update 1.22 ± 2% -0.4 0.84 ± 5% perf-profile.children.cycles-pp.perf_event_task_tick 1.21 ± 2% -0.4 0.83 ± 5% perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context 1.14 ± 3% -0.4 0.76 ± 6% perf-profile.children.cycles-pp.intel_idle 0.65 ± 8% -0.4 0.28 ± 9% perf-profile.children.cycles-pp.down_read 1.02 ± 2% -0.3 0.70 ± 5% perf-profile.children.cycles-pp.intel_idle_xstate 0.79 ± 2% -0.3 0.49 ± 5% perf-profile.children.cycles-pp.rw_verify_area 1.58 ± 12% -0.3 1.29 ± 3% perf-profile.children.cycles-pp.worker_thread 1.56 ± 12% -0.3 1.29 ± 4% perf-profile.children.cycles-pp.process_one_work 1.55 ± 12% -0.3 1.28 ± 3% perf-profile.children.cycles-pp.drm_fb_helper_damage_work 1.55 ± 12% -0.3 1.28 ± 3% perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty 1.65 ± 11% -0.3 1.38 ± 3% perf-profile.children.cycles-pp.ret_from_fork_asm 1.65 ± 11% -0.3 1.38 ± 3% perf-profile.children.cycles-pp.ret_from_fork 1.65 ± 11% -0.3 1.38 ± 3% perf-profile.children.cycles-pp.kthread 0.68 ± 2% -0.3 0.43 ± 7% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack 1.46 ± 11% -0.2 1.22 ± 3% perf-profile.children.cycles-pp.drm_fb_memcpy 1.46 ± 11% -0.2 1.22 ± 3% perf-profile.children.cycles-pp.memcpy_toio 0.77 ± 4% -0.2 0.52 ± 4% perf-profile.children.cycles-pp.__filemap_add_folio 1.46 ± 11% -0.2 1.22 ± 3% perf-profile.children.cycles-pp.commit_tail 1.46 ± 11% -0.2 1.22 ± 3% perf-profile.children.cycles-pp.drm_atomic_commit 1.46 ± 11% -0.2 1.22 ± 3% perf-profile.children.cycles-pp.drm_atomic_helper_commit 1.46 ± 11% -0.2 1.22 ± 3% perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes 1.46 ± 11% -0.2 1.22 ± 3% perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail_rpm 1.47 ± 11% -0.2 1.22 ± 3% perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb 0.62 ± 3% -0.2 0.39 ± 5% perf-profile.children.cycles-pp.xas_load 0.61 ± 2% -0.2 0.37 ± 5% perf-profile.children.cycles-pp.security_file_permission 0.76 ± 3% -0.2 0.53 ± 5% perf-profile.children.cycles-pp.__intel_pmu_enable_all 0.41 ± 5% -0.2 0.20 ± 38% perf-profile.children.cycles-pp.xfs_iunlock 1.03 ± 7% -0.2 0.82 ± 8% perf-profile.children.cycles-pp.devkmsg_emit 1.03 ± 7% -0.2 0.82 ± 8% perf-profile.children.cycles-pp.devkmsg_write 1.03 ± 7% -0.2 0.83 ± 8% perf-profile.children.cycles-pp.console_flush_all 1.03 ± 7% -0.2 0.83 ± 8% perf-profile.children.cycles-pp.console_unlock 1.04 ± 7% -0.2 0.84 ± 8% perf-profile.children.cycles-pp.vprintk_emit 0.62 ± 3% -0.2 0.42 ± 6% perf-profile.children.cycles-pp.irq_exit_rcu 0.60 ± 2% -0.2 0.41 ± 5% perf-profile.children.cycles-pp.__do_softirq 0.52 ± 3% -0.2 0.33 ± 7% perf-profile.children.cycles-pp.folio_alloc 0.45 ± 2% -0.2 0.27 ± 5% perf-profile.children.cycles-pp.apparmor_file_permission 0.33 ± 6% -0.2 0.16 ± 5% perf-profile.children.cycles-pp.up_read 0.38 ± 4% -0.1 0.23 ± 8% perf-profile.children.cycles-pp.__fsnotify_parent 0.45 ± 3% -0.1 0.31 ± 6% perf-profile.children.cycles-pp.rebalance_domains 0.34 ± 3% -0.1 0.20 ± 6% perf-profile.children.cycles-pp.__fdget_pos 0.40 ± 4% -0.1 0.27 ± 7% perf-profile.children.cycles-pp.__alloc_pages 0.33 ± 3% -0.1 0.19 ± 6% perf-profile.children.cycles-pp.xas_descend 0.41 ± 3% -0.1 0.27 ± 7% perf-profile.children.cycles-pp.alloc_pages_mpol 0.29 ± 6% -0.1 0.16 ± 8% perf-profile.children.cycles-pp.__mem_cgroup_charge 0.38 ± 3% -0.1 0.25 ± 7% perf-profile.children.cycles-pp.get_page_from_freelist 0.34 ± 2% -0.1 0.21 ± 6% perf-profile.children.cycles-pp.syscall_exit_to_user_mode 0.22 ± 7% -0.1 0.10 ± 12% perf-profile.children.cycles-pp.try_charge_memcg 0.25 ± 3% -0.1 0.14 ± 5% perf-profile.children.cycles-pp.xas_store 0.31 ± 3% -0.1 0.22 ± 6% perf-profile.children.cycles-pp._raw_spin_trylock 0.20 ± 4% -0.1 0.11 ± 7% perf-profile.children.cycles-pp.__free_pages_ok 0.23 ± 5% -0.1 0.14 ± 7% perf-profile.children.cycles-pp.rmqueue 0.22 ± 4% -0.1 0.13 ± 8% perf-profile.children.cycles-pp.current_time 0.18 ± 6% -0.1 0.10 ± 7% perf-profile.children.cycles-pp.syscall_return_via_sysret 0.38 ± 15% -0.1 0.29 ± 11% perf-profile.children.cycles-pp.ktime_get 0.16 ± 8% -0.1 0.08 ± 9% perf-profile.children.cycles-pp.page_counter_try_charge 0.25 ± 6% -0.1 0.17 ± 9% perf-profile.children.cycles-pp.ktime_get_update_offsets_now 0.18 ± 3% -0.1 0.10 ± 6% perf-profile.children.cycles-pp.__x64_sys_execve 0.18 ± 3% -0.1 0.10 ± 6% perf-profile.children.cycles-pp.do_execveat_common 0.18 ± 3% -0.1 0.10 ± 6% perf-profile.children.cycles-pp.execve 0.28 ± 21% -0.1 0.20 ± 13% perf-profile.children.cycles-pp.tick_irq_enter 0.17 ± 4% -0.1 0.10 ± 5% perf-profile.children.cycles-pp.__mmput 0.17 ± 4% -0.1 0.10 ± 5% perf-profile.children.cycles-pp.exit_mmap 0.25 -0.1 0.18 ± 7% perf-profile.children.cycles-pp.menu_select 0.18 ± 3% -0.1 0.11 ± 6% perf-profile.children.cycles-pp.aa_file_perm 0.28 ± 20% -0.1 0.21 ± 14% perf-profile.children.cycles-pp.irq_enter_rcu 0.13 ± 4% -0.1 0.06 ± 8% perf-profile.children.cycles-pp.xas_create 0.20 ± 4% -0.1 0.13 ± 7% perf-profile.children.cycles-pp.__mod_node_page_state 0.21 ± 4% -0.1 0.14 ± 5% perf-profile.children.cycles-pp.load_balance 0.20 ± 6% -0.1 0.14 ± 3% perf-profile.children.cycles-pp.xas_start 0.21 ± 4% -0.1 0.14 ± 6% perf-profile.children.cycles-pp.__mod_lruvec_state 0.11 ± 3% -0.1 0.04 ± 38% perf-profile.children.cycles-pp.kmem_cache_alloc_lru 0.11 ± 4% -0.1 0.04 ± 38% perf-profile.children.cycles-pp.xas_alloc 0.12 ± 2% -0.1 0.06 ± 8% perf-profile.children.cycles-pp.folio_prep_large_rmappable 0.18 ± 4% -0.1 0.12 ± 6% perf-profile.children.cycles-pp.__cond_resched 0.15 ± 5% -0.1 0.09 ± 4% perf-profile.children.cycles-pp.bprm_execve 0.61 ± 5% -0.1 0.55 ± 4% perf-profile.children.cycles-pp.truncate_inode_pages_range 0.13 ± 5% -0.1 0.08 ± 5% perf-profile.children.cycles-pp.exec_binprm 0.13 ± 5% -0.1 0.08 ± 5% perf-profile.children.cycles-pp.load_elf_binary 0.13 ± 5% -0.1 0.08 ± 5% perf-profile.children.cycles-pp.search_binary_handler 0.08 ± 6% -0.1 0.02 ±100% perf-profile.children.cycles-pp.begin_new_exec 0.14 ± 11% -0.1 0.08 ± 8% perf-profile.children.cycles-pp.arch_scale_freq_tick 0.12 ± 4% -0.1 0.07 ± 7% perf-profile.children.cycles-pp.lru_add_drain 0.10 ± 5% -0.1 0.04 ± 37% perf-profile.children.cycles-pp.__xas_next 0.12 ± 5% -0.1 0.06 ± 7% perf-profile.children.cycles-pp.lru_add_drain_cpu 0.15 ± 6% -0.1 0.10 ± 5% perf-profile.children.cycles-pp.update_sd_lb_stats 0.12 ± 2% -0.1 0.07 ± 7% perf-profile.children.cycles-pp.asm_exc_page_fault 0.15 ± 4% -0.1 0.10 ± 7% perf-profile.children.cycles-pp.find_busiest_group 0.31 ± 5% -0.0 0.26 ± 6% perf-profile.children.cycles-pp.workingset_activation 0.12 ± 4% -0.0 0.07 ± 4% perf-profile.children.cycles-pp.do_exit 0.12 ± 3% -0.0 0.07 ± 4% perf-profile.children.cycles-pp.__x64_sys_exit_group 0.12 ± 3% -0.0 0.07 ± 4% perf-profile.children.cycles-pp.do_group_exit 0.13 ± 5% -0.0 0.09 ± 5% perf-profile.children.cycles-pp.update_sg_lb_stats 0.10 ± 5% -0.0 0.06 ± 9% perf-profile.children.cycles-pp.do_vmi_munmap 0.10 ± 4% -0.0 0.05 ± 8% perf-profile.children.cycles-pp.do_vmi_align_munmap 0.10 ± 5% -0.0 0.05 ± 38% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64 0.15 ± 4% -0.0 0.10 ± 9% perf-profile.children.cycles-pp._raw_spin_lock 0.35 ± 2% -0.0 0.30 ± 3% perf-profile.children.cycles-pp.folio_activate_fn 0.11 ± 6% -0.0 0.07 ± 7% perf-profile.children.cycles-pp.__schedule 0.11 ± 4% -0.0 0.06 ± 10% perf-profile.children.cycles-pp.do_user_addr_fault 0.11 ± 4% -0.0 0.06 ± 10% perf-profile.children.cycles-pp.exc_page_fault 0.08 ± 4% -0.0 0.04 ± 57% perf-profile.children.cycles-pp.unmap_region 0.10 ± 4% -0.0 0.06 ± 5% perf-profile.children.cycles-pp.exit_mm 0.10 ± 3% -0.0 0.06 ± 5% perf-profile.children.cycles-pp.handle_mm_fault 0.15 ± 5% -0.0 0.11 ± 14% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state 0.15 ± 4% -0.0 0.11 ± 6% perf-profile.children.cycles-pp.native_irq_return_iret 0.10 ± 3% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack 0.10 ± 6% -0.0 0.06 ± 5% perf-profile.children.cycles-pp.tlb_batch_pages_flush 0.10 ± 5% -0.0 0.06 ± 5% perf-profile.children.cycles-pp.vm_mmap_pgoff 0.10 ± 4% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.mmap_region 0.15 ± 6% -0.0 0.11 ± 7% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio 0.08 ± 4% -0.0 0.04 ± 57% perf-profile.children.cycles-pp.rcu_core 0.10 ± 4% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.tlb_finish_mmu 0.10 ± 4% -0.0 0.06 ± 5% perf-profile.children.cycles-pp.do_mmap 0.10 ± 5% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.__handle_mm_fault 0.16 ± 14% -0.0 0.12 ± 23% perf-profile.children.cycles-pp.vt_console_print 0.15 ± 13% -0.0 0.12 ± 23% perf-profile.children.cycles-pp.con_scroll 0.15 ± 13% -0.0 0.11 ± 24% perf-profile.children.cycles-pp.fbcon_redraw 0.15 ± 13% -0.0 0.12 ± 23% perf-profile.children.cycles-pp.fbcon_scroll 0.15 ± 13% -0.0 0.12 ± 23% perf-profile.children.cycles-pp.lf 0.11 ± 5% -0.0 0.08 ± 6% perf-profile.children.cycles-pp.task_tick_fair 0.08 ± 17% -0.0 0.05 ± 38% perf-profile.children.cycles-pp.calc_global_load_tick 0.11 ± 4% -0.0 0.07 ± 4% perf-profile.children.cycles-pp.perf_rotate_context 0.09 ± 6% -0.0 0.05 ± 8% perf-profile.children.cycles-pp.schedule 0.14 ± 13% -0.0 0.10 ± 24% perf-profile.children.cycles-pp.fbcon_putcs 0.10 ± 5% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.rcu_all_qs 0.07 ± 7% -0.0 0.03 ± 77% perf-profile.children.cycles-pp.sched_clock 0.10 ± 18% -0.0 0.07 ± 7% perf-profile.children.cycles-pp.__memcpy 0.08 ± 6% -0.0 0.04 ± 37% perf-profile.children.cycles-pp.asm_sysvec_call_function 0.11 ± 4% -0.0 0.08 ± 5% perf-profile.children.cycles-pp.clockevents_program_event 0.11 ± 16% -0.0 0.08 ± 25% perf-profile.children.cycles-pp.fast_imageblit 0.11 ± 16% -0.0 0.08 ± 25% perf-profile.children.cycles-pp.drm_fbdev_generic_defio_imageblit 0.11 ± 16% -0.0 0.08 ± 25% perf-profile.children.cycles-pp.sys_imageblit 0.12 ± 5% -0.0 0.10 ± 7% perf-profile.children.cycles-pp.find_lock_entries 0.09 ± 4% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.native_sched_clock 0.08 ± 6% -0.0 0.05 ± 8% perf-profile.children.cycles-pp.sched_clock_cpu 0.08 ± 5% -0.0 0.06 ± 5% perf-profile.children.cycles-pp.lapic_next_deadline 0.09 ± 4% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.read_tsc 0.07 ± 6% -0.0 0.05 ± 8% perf-profile.children.cycles-pp.native_apic_msr_eoi 0.07 ± 8% -0.0 0.05 ± 6% perf-profile.children.cycles-pp.__free_one_page 0.07 ± 9% +0.0 0.09 perf-profile.children.cycles-pp.__mem_cgroup_uncharge 0.06 ± 7% +0.0 0.09 ± 5% perf-profile.children.cycles-pp.uncharge_batch 0.04 ± 58% +0.0 0.07 perf-profile.children.cycles-pp.page_counter_uncharge 0.09 ± 7% +0.0 0.12 ± 2% perf-profile.children.cycles-pp.destroy_large_folio 0.08 ± 4% +0.0 0.13 ± 13% perf-profile.children.cycles-pp._raw_spin_lock_irq 0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.free_unref_page 89.22 +3.4 92.57 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 89.02 +3.4 92.44 perf-profile.children.cycles-pp.do_syscall_64 86.89 +3.9 90.79 perf-profile.children.cycles-pp.read 39.51 +4.7 44.21 perf-profile.children.cycles-pp.filemap_get_pages 84.67 +4.7 89.40 perf-profile.children.cycles-pp.ksys_read 84.40 +4.8 89.24 perf-profile.children.cycles-pp.vfs_read 37.48 +5.7 43.21 perf-profile.children.cycles-pp.page_cache_ra_order 82.04 +5.9 87.98 perf-profile.children.cycles-pp.filemap_read 28.01 +9.2 37.19 perf-profile.children.cycles-pp.folio_mark_accessed 27.68 +9.2 36.91 ± 2% perf-profile.children.cycles-pp.folio_activate 28.55 +10.4 38.96 ± 2% perf-profile.children.cycles-pp.filemap_add_folio 27.81 +10.7 38.48 ± 2% perf-profile.children.cycles-pp.folio_add_lru 54.31 +19.8 74.12 ± 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath 54.49 +19.8 74.33 ± 2% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave 54.82 +19.8 74.67 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave 55.58 +19.9 75.44 ± 2% perf-profile.children.cycles-pp.folio_batch_move_lru 12.46 -7.0 5.42 ± 7% perf-profile.self.cycles-pp._copy_to_iter 8.02 -4.7 3.34 ± 9% perf-profile.self.cycles-pp.__memset 1.14 ± 3% -0.4 0.76 ± 6% perf-profile.self.cycles-pp.intel_idle 0.93 ± 3% -0.3 0.58 ± 6% perf-profile.self.cycles-pp.filemap_read 0.56 ± 8% -0.3 0.23 ± 9% perf-profile.self.cycles-pp.down_read 1.02 -0.3 0.70 ± 5% perf-profile.self.cycles-pp.intel_idle_xstate 0.49 ± 7% -0.3 0.21 ± 8% perf-profile.self.cycles-pp.atime_needs_update 0.66 ± 2% -0.2 0.42 ± 7% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack 1.43 ± 11% -0.2 1.18 ± 3% perf-profile.self.cycles-pp.memcpy_toio 0.66 ± 2% -0.2 0.44 ± 5% perf-profile.self.cycles-pp.filemap_get_read_batch 0.76 ± 3% -0.2 0.53 ± 5% perf-profile.self.cycles-pp.__intel_pmu_enable_all 0.60 ± 3% -0.2 0.38 ± 6% perf-profile.self.cycles-pp.write 0.60 ± 2% -0.2 0.38 ± 6% perf-profile.self.cycles-pp.read 0.53 ± 3% -0.2 0.32 ± 6% perf-profile.self.cycles-pp.vfs_read 0.48 -0.2 0.31 ± 4% perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context 0.32 ± 7% -0.2 0.16 ± 6% perf-profile.self.cycles-pp.up_read 0.36 ± 4% -0.1 0.22 ± 7% perf-profile.self.cycles-pp.__fsnotify_parent 0.32 ± 4% -0.1 0.19 ± 6% perf-profile.self.cycles-pp.__fdget_pos 0.30 ± 7% -0.1 0.17 ± 9% perf-profile.self.cycles-pp.vfs_write 0.30 ± 3% -0.1 0.17 ± 6% perf-profile.self.cycles-pp.xas_descend 0.28 ± 2% -0.1 0.16 ± 8% perf-profile.self.cycles-pp.do_syscall_64 0.28 ± 3% -0.1 0.17 ± 6% perf-profile.self.cycles-pp.entry_SYSCALL_64 0.32 ± 3% -0.1 0.22 ± 7% perf-profile.self.cycles-pp.cpuidle_enter_state 0.19 ± 8% -0.1 0.09 ± 38% perf-profile.self.cycles-pp.xfs_file_read_iter 0.24 ± 3% -0.1 0.14 ± 6% perf-profile.self.cycles-pp.apparmor_file_permission 0.31 ± 3% -0.1 0.22 ± 6% perf-profile.self.cycles-pp._raw_spin_trylock 0.18 ± 5% -0.1 0.10 ± 8% perf-profile.self.cycles-pp.syscall_return_via_sysret 0.22 ± 4% -0.1 0.14 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe 0.23 ± 6% -0.1 0.16 ± 9% perf-profile.self.cycles-pp.ktime_get_update_offsets_now 0.14 ± 9% -0.1 0.07 ± 12% perf-profile.self.cycles-pp.page_counter_try_charge 0.10 ± 4% -0.1 0.02 ±100% perf-profile.self.cycles-pp.rmqueue 0.20 ± 3% -0.1 0.13 ± 7% perf-profile.self.cycles-pp.__mod_node_page_state 0.18 ± 3% -0.1 0.12 ± 7% perf-profile.self.cycles-pp.xas_load 0.09 -0.1 0.02 ±100% perf-profile.self.cycles-pp.__xas_next 0.17 ± 2% -0.1 0.10 ± 6% perf-profile.self.cycles-pp.rw_verify_area 0.16 ± 3% -0.1 0.10 ± 6% perf-profile.self.cycles-pp.aa_file_perm 0.16 ± 3% -0.1 0.09 ± 9% perf-profile.self.cycles-pp.filemap_get_pages 0.19 ± 6% -0.1 0.13 ± 3% perf-profile.self.cycles-pp.xas_start 0.18 ± 3% -0.1 0.12 ± 5% perf-profile.self.cycles-pp.security_file_permission 0.17 -0.1 0.11 ± 6% perf-profile.self.cycles-pp.copy_page_to_iter 0.12 ± 2% -0.1 0.06 ± 8% perf-profile.self.cycles-pp.folio_prep_large_rmappable 0.16 ± 3% -0.1 0.10 ± 6% perf-profile.self.cycles-pp.syscall_exit_to_user_mode 0.14 ± 11% -0.1 0.08 ± 8% perf-profile.self.cycles-pp.arch_scale_freq_tick 0.08 ± 8% -0.1 0.03 ± 77% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64 0.12 ± 6% -0.1 0.06 ± 10% perf-profile.self.cycles-pp.__free_pages_ok 0.08 ± 4% -0.0 0.03 ± 77% perf-profile.self.cycles-pp.xfs_ilock 0.14 ± 4% -0.0 0.09 ± 9% perf-profile.self.cycles-pp._raw_spin_lock 0.10 ± 4% -0.0 0.06 ± 39% perf-profile.self.cycles-pp.xfs_iunlock 0.12 ± 4% -0.0 0.08 ± 9% perf-profile.self.cycles-pp.current_time 0.11 ± 18% -0.0 0.06 ± 17% perf-profile.self.cycles-pp.iomap_set_range_uptodate 0.12 ± 4% -0.0 0.07 ± 7% perf-profile.self.cycles-pp.ksys_write 0.15 ± 4% -0.0 0.11 ± 6% perf-profile.self.cycles-pp.native_irq_return_iret 0.10 ± 3% -0.0 0.06 ± 8% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack 0.09 ± 5% -0.0 0.05 ± 38% perf-profile.self.cycles-pp.xfs_file_buffered_read 0.10 ± 4% -0.0 0.06 ± 7% perf-profile.self.cycles-pp.xas_store 0.14 ± 3% -0.0 0.10 ± 15% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state 0.08 ± 17% -0.0 0.04 ± 38% perf-profile.self.cycles-pp.calc_global_load_tick 0.11 ± 4% -0.0 0.07 ± 10% perf-profile.self.cycles-pp.ksys_read 0.10 ± 18% -0.0 0.07 ± 7% perf-profile.self.cycles-pp.__memcpy 0.10 ± 7% -0.0 0.07 ± 7% perf-profile.self.cycles-pp.update_sg_lb_stats 0.12 ± 4% -0.0 0.08 ± 8% perf-profile.self.cycles-pp.menu_select 0.09 ± 4% -0.0 0.06 ± 9% perf-profile.self.cycles-pp.__cond_resched 0.11 ± 16% -0.0 0.08 ± 25% perf-profile.self.cycles-pp.fast_imageblit 0.09 ± 4% -0.0 0.06 ± 5% perf-profile.self.cycles-pp.read_tsc 0.08 ± 5% -0.0 0.06 ± 5% perf-profile.self.cycles-pp.native_sched_clock 0.08 ± 5% -0.0 0.06 ± 5% perf-profile.self.cycles-pp.lapic_next_deadline 0.08 ± 6% -0.0 0.06 ± 11% perf-profile.self.cycles-pp.folio_lruvec_lock_irqsave 0.07 ± 5% -0.0 0.05 ± 8% perf-profile.self.cycles-pp.native_apic_msr_eoi 0.07 ± 4% -0.0 0.05 ± 6% perf-profile.self.cycles-pp.__free_one_page 0.09 ± 8% -0.0 0.07 ± 4% perf-profile.self.cycles-pp.find_lock_entries 0.09 +0.0 0.10 perf-profile.self.cycles-pp.lru_add_fn 0.14 ± 2% +0.0 0.16 perf-profile.self.cycles-pp.folio_batch_move_lru 0.03 ± 77% +0.0 0.06 ± 7% perf-profile.self.cycles-pp.page_counter_uncharge 54.31 +19.8 74.12 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath Best Regards, Yujie