On 9/12/2023 7:54 PM, kernel test robot wrote:
hi, Raghu, hope this third performance report for the same patch set won't annoy you, and better, will have some value to you.
Not at all. Thanks a lot; I am all the more happy to see these exhaustive results. While it is easy to show that the patchset improves code readability, maintainability, and so on, and while I try my best to ensure regressions stay within noise level for corner cases and that some benchmarks improve noticeably, there is always room to miss something. Reports like this help boost confidence in the patchset. Your cumulative (bisection) report also helped to evaluate the importance of each patch.
we won't send more autonuma-benchmark performance improvement reports for this patch-set, of course, unless you still hope we do. BTW, we will still send out performance/function regression reports if any.

as in previous reports, we know that you want to see the performance impact of the whole patch set, so let me give a full summary here. let me list how we apply your patch set again:

68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs
af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned              <-- we reported [1]
167773d1ddb5f sched/numa: Increase tasks' access history                          <---- for this report
fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq
1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic               <--- we reported [2]
2a806eab1c2e1 sched/numa: Move up the access pid reset logic
2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well

[1] https://lore.kernel.org/all/202309102311.84b42068-oliver.sang@xxxxxxxxx/
[2] https://lore.kernel.org/all/202309121417.53f44ad6-oliver.sang@xxxxxxxxx/

below we will only give the comparison between 2f88c8e802c8b and 68cfe9439a1ba in a summary way; if you want detailed data for more commits, or more comparison data, please let me know. Thanks!

on test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark

2f88c8e802c8b128 68cfe9439a1baa642e05883fa64
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    271.01           -26.4%     199.49 ± 3%  autonuma-benchmark.numa01.seconds
     76.28           -46.9%      40.49 ± 5%  autonuma-benchmark.numa01_THREAD_ALLOC.seconds
      8.11            -0.1%       8.10       autonuma-benchmark.numa02.seconds
      1425           -30.1%     996.02 ± 2%  autonuma-benchmark.time.elapsed_time
      1425           -30.1%     996.02 ± 2%  autonuma-benchmark.time.elapsed_time.max

on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark

2f88c8e802c8b128 68cfe9439a1baa642e05883fa64
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    361.53 ± 6%      -10.4%     323.83 ± 3%  autonuma-benchmark.numa01.seconds
    255.31           -60.1%     101.90 ± 2%  autonuma-benchmark.numa01_THREAD_ALLOC.seconds
     14.95            -4.6%      14.26       autonuma-benchmark.numa02.seconds
      2530 ± 3%      -30.3%       1763 ± 2%  autonuma-benchmark.time.elapsed_time
      2530 ± 3%      -30.3%       1763 ± 2%  autonuma-benchmark.time.elapsed_time.max
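For readers skimming these tables: each row shows the mean over the repeated runs on the base commit, the relative change, and the mean on the patched commit, with "± N%" marking the relative standard deviation where it is not negligible. A minimal sketch of that arithmetic in Python (my own illustration with made-up sample values, not code from lkp-tests):

# Hypothetical illustration of how the %change and "± N%" (%stddev)
# columns in these comparison tables are derived; not the actual
# lkp-tests implementation.
from statistics import mean, stdev

def summarize(base_runs, patched_runs):
    base_mean, patched_mean = mean(base_runs), mean(patched_runs)
    # percent change of the patched mean relative to the base mean
    change = (patched_mean - base_mean) / base_mean * 100.0
    # relative standard deviation of the patched runs, shown as "± N%"
    rsd = stdev(patched_runs) / patched_mean * 100.0
    return base_mean, change, patched_mean, rsd

# e.g. four iterations of autonuma-benchmark.numa01.seconds (made-up values)
base = [270.1, 272.0, 270.5, 271.4]
patched = [195.0, 205.2, 198.3, 199.5]
b, chg, p, rsd = summarize(base, patched)
print(f"{b:10.2f} {chg:+10.1f}% {p:10.2f} ± {rsd:.0f}%")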
This gives me fair confidence that we are able to get a decent improvement overall.
below is the auto-generated report part, FYI.

Hello,

kernel test robot noticed a -17.6% improvement of autonuma-benchmark.numa01.seconds on:

commit: 167773d1ddb5ffdd944f851f2cbdd4e65425a358 ("[RFC PATCH V1 4/6] sched/numa: Increase tasks' access history")
url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805
patch link: https://lore.kernel.org/all/cf200aaf594caae68350219fa1f781d64136fa2c.1693287931.git.raghavendra.kt@xxxxxxx/
patch subject: [RFC PATCH V1 4/6] sched/numa: Increase tasks' access history

testcase: autonuma-benchmark
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	iterations: 4x
	test: numa01_THREAD_ALLOC
	cpufreq_governor: performance

In addition to that, the commit also has significant impact on the following tests:

+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | autonuma-benchmark: autonuma-benchmark.numa01.seconds -15.4% improvement                           |
| test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory  |
| test parameters  | cpufreq_governor=performance                                                                        |
|                  | iterations=4x                                                                                       |
|                  | test=numa01_THREAD_ALLOC                                                                            |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | autonuma-benchmark: autonuma-benchmark.numa01.seconds -14.8% improvement                            |
| test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory  |
| test parameters  | cpufreq_governor=performance                                                                        |
|                  | iterations=4x                                                                                       |
|                  | test=_INVERSE_BIND                                                                                  |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | autonuma-benchmark: autonuma-benchmark.numa01_THREAD_ALLOC.seconds -10.7% improvement               |
| test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory      |
| test parameters  | cpufreq_governor=performance                                                                        |
|                  | iterations=4x                                                                                       |
|                  | test=numa01_THREAD_ALLOC                                                                            |
+------------------+----------------------------------------------------------------------------------------------------+
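The per-metric sections in the detailed report below are long; each data row has the shape "<base> [± x%] <change> <new> [± y%] <metric>", so it is easy to pull them into a structured form for diffing or sorting. A rough, hypothetical parser (the regex and field names are mine, not part of lkp-tests):

# Rough, hypothetical parser for comparison rows such as:
#   "271.39 ± 2%   -17.6%   223.53 ± 5%   autonuma-benchmark.numa01.seconds"
# Illustrative only; not code shipped with lkp-tests.
import re

ROW = re.compile(
    r"^\s*(?P<base>-?[\d.]+(?:e[+-]\d+)?)(?:\s*±\s*(?P<base_sd>\d+)%)?"
    r"\s+(?P<change>[+-][\d.]+%?)"
    r"\s+(?P<new>-?[\d.]+(?:e[+-]\d+)?)(?:\s*±\s*(?P<new_sd>\d+)%)?"
    r"\s+(?P<metric>\S+)\s*$"
)

def parse(line):
    m = ROW.match(line)
    return m.groupdict() if m else None

print(parse("271.39 ± 2%   -17.6%   223.53 ± 5%   autonuma-benchmark.numa01.seconds"))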
Will go through this too.
Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230912/202309122114.b9e08a43-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
  fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq")
  167773d1dd ("sched/numa: Increase tasks' access history")

fc769221b23064c0 167773d1ddb5ffdd944f851f2cb
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
105.67 ± 8%   -20.3%   84.17 ± 10%   perf-c2c.HITM.remote
1.856e+10 ± 7%   -18.8%   1.508e+10 ± 8%   cpuidle..time
19025348 ± 7%   -18.6%   15481744 ± 8%   cpuidle..usage
0.00 ± 28%   +0.0   0.01 ± 10%   mpstat.cpu.all.iowait%
0.10 ± 2%   -0.0   0.09 ± 4%   mpstat.cpu.all.soft%
1443 ± 2%   -14.2%   1238 ± 4%   uptime.boot
26312 ± 5%   -12.8%   22935 ± 5%   uptime.idle
8774783 ± 7%   -19.0%   7104495 ± 8%   turbostat.C1E
10147966 ± 7%   -18.4%   8280745 ± 8%   turbostat.C6
3.225e+08 ± 2%   -14.1%   2.77e+08 ± 4%   turbostat.IRQ
2.81 ± 24%   +3.5   6.35 ± 24%   turbostat.PKG_%
638.24   +2.0%   650.74   turbostat.PkgWatt
57.57   +10.9%   63.85 ± 2%   turbostat.RAMWatt
271.39 ± 2%   -17.6%   223.53 ± 5%   autonuma-benchmark.numa01.seconds
1401 ± 2%   -14.6%   1197 ± 4%   autonuma-benchmark.time.elapsed_time
1401 ± 2%   -14.6%   1197 ± 4%   autonuma-benchmark.time.elapsed_time.max
1088153 ± 2%   -14.1%   934904 ± 6%   autonuma-benchmark.time.involuntary_context_switches
3953   -2.6%   3852 ± 2%   autonuma-benchmark.time.system_time
287110   -14.5%   245511 ± 4%   autonuma-benchmark.time.user_time
22704 ± 7%   +15.9%   26303 ± 8%   autonuma-benchmark.time.voluntary_context_switches
191.10 ± 64%   +94.9%   372.49 ± 7%   perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
4.09 ± 49%   +85.6%   7.59 ± 14%   perf-sched.wait_and_delay.max.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
1.99 ± 40%   +99.8%   3.97 ± 30%   perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.vmstat_start.seq_read_iter
14.18 ±158%   -82.6%   2.47 ± 22%   perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
189.39 ± 65%   +96.5%   372.20 ± 7%   perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
2.18 ± 21%   -33.3%   1.46 ± 41%   perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
3.22 ± 32%   -73.0%   0.87 ± 81%   perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.single_open.do_dentry_open
4.73 ± 20%   +60.6%   7.59 ± 14%   perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
9.61 ± 30%   -32.8%   6.46 ± 16%   perf-sched.wait_time.max.ms.__cond_resched.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
13.57 ± 65%   -60.2%   5.40 ± 24%   perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
6040567   -6.2%   5667640   proc-vmstat.numa_hit
32278 ± 7%   +51.7%   48955 ± 18%   proc-vmstat.numa_huge_pte_updates
4822780   -7.5%   4459553   proc-vmstat.numa_local
3187796 ± 9%   +73.2%   5521800 ± 16%   proc-vmstat.numa_pages_migrated
16792299 ± 7%   +50.8%   25319315 ± 18%   proc-vmstat.numa_pte_updates
6242814   -8.5%   5711173 ± 2%   proc-vmstat.pgfault
3187796 ± 9%   +73.2%   5521800 ± 16%   proc-vmstat.pgmigrate_success
254872 ± 2%   -12.3%   223591 ± 5%   proc-vmstat.pgreuse
6151 ± 9%   +74.2%   10717 ± 16%   proc-vmstat.thp_migration_success
4201550   -13.7%   3627350 ± 3%   proc-vmstat.unevictable_pgs_scanned
1.823e+08 ± 2%   -15.2%   1.547e+08 ± 5%   sched_debug.cfs_rq:/.avg_vruntime.avg
1.872e+08 ± 2%   -15.3%   1.585e+08 ± 5%   sched_debug.cfs_rq:/.avg_vruntime.max
1.423e+08 ± 4%   -14.0%   1.224e+08 ± 3%   sched_debug.cfs_rq:/.avg_vruntime.min
4320209 ± 8%   -18.1%   3537344 ± 8%   sched_debug.cfs_rq:/.avg_vruntime.stddev
3349 ± 40%   +58.3%   5300 ± 27%   sched_debug.cfs_rq:/.load_avg.max
1.823e+08 ± 2%   -15.2%   1.547e+08 ± 5%   sched_debug.cfs_rq:/.min_vruntime.avg
1.872e+08 ± 2%   -15.3%   1.585e+08 ± 5%   sched_debug.cfs_rq:/.min_vruntime.max
1.423e+08 ± 4%   -14.0%   1.224e+08 ± 3%   sched_debug.cfs_rq:/.min_vruntime.min
4320208 ± 8%   -18.1%   3537344 ± 8%   sched_debug.cfs_rq:/.min_vruntime.stddev
1852009 ± 3%   -13.2%   1607461 ± 2%   sched_debug.cpu.avg_idle.avg
751880 ± 2%   -15.1%   638555 ± 4%   sched_debug.cpu.avg_idle.stddev
725827 ± 2%   -14.1%   623617 ± 4%   sched_debug.cpu.clock.avg
726857 ± 2%   -14.1%   624498 ± 4%   sched_debug.cpu.clock.max
724740 ± 2%   -14.1%   622692 ± 4%   sched_debug.cpu.clock.min
717315 ± 2%   -14.1%   616349 ± 4%   sched_debug.cpu.clock_task.avg
719648 ± 2%   -14.1%   618089 ± 4%   sched_debug.cpu.clock_task.max
698681 ± 2%   -14.2%   599424 ± 4%   sched_debug.cpu.clock_task.min
1839 ± 8%   -18.1%   1506 ± 7%   sched_debug.cpu.clock_task.stddev
27352   -9.6%   24731 ± 2%   sched_debug.cpu.curr->pid.max
293258 ± 5%   -16.4%   245303 ± 7%   sched_debug.cpu.max_idle_balance_cost.stddev
-14.96   +73.6%   -25.98   sched_debug.cpu.nr_uninterruptible.min
6.27 ± 4%   +18.7%   7.44 ± 6%   sched_debug.cpu.nr_uninterruptible.stddev
724723 ± 2%   -14.1%   622678 ± 4%   sched_debug.cpu_clk
723514 ± 2%   -14.1%   621468 ± 4%   sched_debug.ktime
725604 ± 2%   -14.1%   623550 ± 4%   sched_debug.sched_clk
29.50 ± 3%   +24.9%   36.83 ± 9%   perf-stat.i.MPKI
3.592e+08   +5.7%   3.797e+08 ± 2%   perf-stat.i.branch-instructions
1823514   +3.7%   1891464   perf-stat.i.branch-misses
28542234 ± 3%   +22.0%   34809605 ± 10%   perf-stat.i.cache-misses
72486859 ± 3%   +19.6%   86713561 ± 7%   perf-stat.i.cache-references
224.48   +3.2%   231.63   perf-stat.i.cpu-migrations
145250 ± 2%   -10.8%   129549 ± 4%   perf-stat.i.cycles-between-cache-misses
0.08 ± 5%   -0.0   0.07 ± 10%   perf-stat.i.dTLB-load-miss-rate%
272123 ± 6%   -15.0%   231302 ± 10%   perf-stat.i.dTLB-load-misses
4.515e+08   +4.7%   4.729e+08 ± 2%   perf-stat.i.dTLB-loads
995784   +1.9%   1014848   perf-stat.i.dTLB-store-misses
1.844e+08   +1.5%   1.871e+08   perf-stat.i.dTLB-stores
1.711e+09   +5.0%   1.797e+09 ± 2%   perf-stat.i.instructions
3.25   +8.3%   3.52 ± 3%   perf-stat.i.metric.M/sec
4603   +6.7%   4912 ± 3%   perf-stat.i.minor-faults
488266 ± 2%   +25.0%   610436 ± 6%   perf-stat.i.node-load-misses
618022 ± 4%   +13.4%   701130 ± 5%   perf-stat.i.node-loads
4603   +6.7%   4912 ± 3%   perf-stat.i.page-faults
39.67 ± 2%   +16.0%   46.04 ± 6%   perf-stat.overall.MPKI
375.84   -4.9%   357.36 ± 2%   perf-stat.overall.cpi
24383 ± 3%   -19.0%   19742 ± 12%   perf-stat.overall.cycles-between-cache-misses
0.06 ± 7%   -0.0   0.05 ± 10%   perf-stat.overall.dTLB-load-miss-rate%
0.00   +5.2%   0.00 ± 2%   perf-stat.overall.ipc
41.99 ± 2%   +2.8   44.83 ± 4%   perf-stat.overall.node-load-miss-rate%
3.355e+08   +6.3%   3.567e+08 ± 2%   perf-stat.ps.branch-instructions
1758832   +4.4%   1835699   perf-stat.ps.branch-misses
24888631 ± 3%   +25.6%   31268733 ± 12%   perf-stat.ps.cache-misses
64007362 ± 3%   +22.5%   78424799 ± 8%   perf-stat.ps.cache-references
221.69   +3.0%   228.32   perf-stat.ps.cpu-migrations
4.273e+08   +5.2%   4.495e+08 ± 2%   perf-stat.ps.dTLB-loads
992569   +1.8%   1010389   perf-stat.ps.dTLB-store-misses
1.818e+08   +1.6%   1.847e+08   perf-stat.ps.dTLB-stores
1.613e+09   +5.5%   1.701e+09 ± 2%   perf-stat.ps.instructions
4331   +7.2%   4644 ± 3%   perf-stat.ps.minor-faults
477740 ± 2%   +26.3%   603330 ± 7%   perf-stat.ps.node-load-misses
660610 ± 5%   +12.3%   741896 ± 6%   perf-stat.ps.node-loads
4331   +7.2%   4644 ± 3%   perf-stat.ps.page-faults
2.264e+12   -10.0%   2.038e+12 ± 3%   perf-stat.total.instructions
1.16 ± 20%   -0.6   0.59 ± 47%   perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1.07 ± 20%   -0.5   0.54 ± 47%   perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1.96 ± 25%   -0.7   1.27 ± 23%   perf-profile.children.cycles-pp.task_mm_cid_work
1.16 ± 20%   -0.5   0.67 ± 19%   perf-profile.children.cycles-pp.worker_thread
1.07 ± 20%   -0.5   0.61 ± 21%   perf-profile.children.cycles-pp.process_one_work
0.84 ± 44%   -0.4   0.43 ± 25%   perf-profile.children.cycles-pp.evlist__id2evsel
0.58 ± 34%   -0.2   0.33 ± 21%   perf-profile.children.cycles-pp.do_mprotect_pkey
0.54 ± 26%   -0.2   0.30 ± 23%   perf-profile.children.cycles-pp.drm_fb_helper_damage_work
0.54 ± 26%   -0.2   0.30 ± 23%   perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty
0.58 ± 34%   -0.2   0.34 ± 22%   perf-profile.children.cycles-pp.__x64_sys_mprotect
0.34 ± 23%   -0.2   0.12 ± 64%   perf-profile.children.cycles-pp.drm_gem_vmap_unlocked
0.34 ± 23%   -0.2   0.12 ± 64%   perf-profile.children.cycles-pp.drm_gem_vmap
0.34 ± 23%   -0.2   0.12 ± 64%   perf-profile.children.cycles-pp.drm_gem_shmem_object_vmap
0.34 ± 23%   -0.2   0.12 ± 64%   perf-profile.children.cycles-pp.drm_gem_shmem_vmap_locked
0.55 ± 32%   -0.2   0.33 ± 18%   perf-profile.children.cycles-pp.__wp_page_copy_user
0.50 ± 35%   -0.2   0.28 ± 21%   perf-profile.children.cycles-pp.mprotect_fixup
0.28 ± 25%   -0.2   0.08 ±101%   perf-profile.children.cycles-pp.drm_gem_shmem_get_pages_locked
0.28 ± 25%   -0.2   0.08 ±101%   perf-profile.children.cycles-pp.drm_gem_get_pages
0.28 ± 25%   -0.2   0.08 ±102%   perf-profile.children.cycles-pp.shmem_read_folio_gfp
0.28 ± 25%   -0.2   0.08 ±102%   perf-profile.children.cycles-pp.drm_gem_shmem_get_pages
0.62 ± 15%   -0.2   0.43 ± 16%   perf-profile.children.cycles-pp.try_to_wake_up
0.25 ± 19%   -0.2   0.08 ± 84%   perf-profile.children.cycles-pp.drm_client_buffer_vmap
0.44 ± 19%   -0.2   0.28 ± 31%   perf-profile.children.cycles-pp.filemap_get_entry
0.39 ± 14%   -0.1   0.26 ± 22%   perf-profile.children.cycles-pp.perf_event_mmap
0.38 ± 13%   -0.1   0.25 ± 23%   perf-profile.children.cycles-pp.perf_event_mmap_event
0.22 ± 22%   -0.1   0.11 ± 25%   perf-profile.children.cycles-pp.lru_add_drain_cpu
0.24 ± 21%   -0.1   0.14 ± 36%   perf-profile.children.cycles-pp.do_open_execat
0.24 ± 13%   -0.1   0.14 ± 42%   perf-profile.children.cycles-pp.arch_do_signal_or_restart
0.22 ± 30%   -0.1   0.13 ± 10%   perf-profile.children.cycles-pp.wake_up_q
0.14 ± 17%   -0.1   0.05 ±101%   perf-profile.children.cycles-pp.open_exec
0.16 ± 21%   -0.1   0.07 ± 51%   perf-profile.children.cycles-pp.path_init
0.23 ± 30%   -0.1   0.15 ± 22%   perf-profile.children.cycles-pp.ttwu_do_activate
0.26 ± 11%   -0.1   0.18 ± 20%   perf-profile.children.cycles-pp.perf_iterate_sb
0.14 ± 50%   -0.1   0.07 ± 12%   perf-profile.children.cycles-pp.security_inode_getattr
0.18 ± 27%   -0.1   0.11 ± 20%   perf-profile.children.cycles-pp.select_task_rq
0.14 ± 21%   -0.1   0.08 ± 29%   perf-profile.children.cycles-pp.get_unmapped_area
0.10 ± 19%   -0.1   0.04 ± 73%   perf-profile.children.cycles-pp.expand_downwards
0.18 ± 16%   -0.1   0.13 ± 26%   perf-profile.children.cycles-pp.__d_alloc
0.09 ± 15%   -0.1   0.04 ± 71%   perf-profile.children.cycles-pp.anon_vma_clone
0.13 ± 36%   -0.1   0.08 ± 19%   perf-profile.children.cycles-pp.file_free_rcu
0.08 ± 23%   -0.0   0.03 ±101%   perf-profile.children.cycles-pp.__legitimize_mnt
0.09 ± 15%   -0.0   0.04 ± 45%   perf-profile.children.cycles-pp.__pipe
1.92 ± 26%   -0.7   1.24 ± 23%   perf-profile.self.cycles-pp.task_mm_cid_work
0.82 ± 43%   -0.4   0.42 ± 24%   perf-profile.self.cycles-pp.evlist__id2evsel
0.42 ± 39%   -0.2   0.22 ± 19%   perf-profile.self.cycles-pp.evsel__read_counter
0.27 ± 24%   -0.2   0.10 ± 56%   perf-profile.self.cycles-pp.filemap_get_entry
0.15 ± 48%   -0.1   0.06 ± 11%   perf-profile.self.cycles-pp.ksys_read
0.10 ± 34%   -0.1   0.03 ±101%   perf-profile.self.cycles-pp.enqueue_task_fair
0.13 ± 36%   -0.1   0.08 ± 19%   perf-profile.self.cycles-pp.file_free_rcu

***************************************************************************************************
lkp-csl-2sp3: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp3/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
  fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq")
  167773d1dd ("sched/numa: Increase tasks' access history")

fc769221b23064c0 167773d1ddb5ffdd944f851f2cb
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
2.309e+10 ± 6%   -27.8%   1.668e+10 ± 5%   cpuidle..time
23855797 ± 6%   -27.9%   17210884 ± 5%   cpuidle..usage
2514   -11.9%   2215   uptime.boot
27543 ± 5%   -23.1%   21189 ± 5%   uptime.idle
9.80 ± 5%   -1.8   8.05 ± 6%   mpstat.cpu.all.idle%
0.01 ± 6%   +0.0   0.01 ± 17%   mpstat.cpu.all.iowait%
0.08   -0.0   0.07 ± 2%   mpstat.cpu.all.soft%
845597 ± 12%   -26.1%   624549 ± 19%   numa-numastat.node0.other_node
2990301 ± 6%   -13.1%   2598273 ± 4%   numa-numastat.node1.local_node
471614 ± 21%   +45.0%   684016 ± 18%   numa-numastat.node1.other_node
845597 ± 12%   -26.1%   624549 ± 19%   numa-vmstat.node0.numa_other
4073 ±106%   -82.5%   711.67 ± 23%   numa-vmstat.node1.nr_mapped
2989568 ± 6%   -13.1%   2597798 ± 4%   numa-vmstat.node1.numa_local
471614 ± 21%   +45.0%   684016 ± 18%   numa-vmstat.node1.numa_other
375.07 ± 4%   -15.4%   317.31 ± 2%   autonuma-benchmark.numa01.seconds
2462   -12.2%   2162   autonuma-benchmark.time.elapsed_time
2462   -12.2%   2162   autonuma-benchmark.time.elapsed_time.max
1354545   -12.9%   1179617   autonuma-benchmark.time.involuntary_context_switches
3212023   -6.5%   3001966   autonuma-benchmark.time.minor_page_faults
8377   +2.3%   8572   autonuma-benchmark.time.percent_of_cpu_this_job_got
199714   -10.4%   179020   autonuma-benchmark.time.user_time
50675 ± 8%   -19.0%   41038 ± 12%   turbostat.C1
183835 ± 7%   -17.6%   151526 ± 6%   turbostat.C1E
23556011 ± 6%   -28.0%   16965247 ± 5%   turbostat.C6
9.72 ± 5%   -1.7   7.99 ± 6%   turbostat.C6%
9.54 ± 6%   -18.1%   7.81 ± 6%   turbostat.CPU%c1
2.404e+08   -12.0%   2.116e+08   turbostat.IRQ
280.51   +1.2%   283.99   turbostat.PkgWatt
63.94   +6.7%   68.23   turbostat.RAMWatt
282375 ± 3%   -9.8%   254565 ± 7%   proc-vmstat.numa_hint_faults
217705 ± 6%   -12.6%   190234 ± 8%   proc-vmstat.numa_hint_faults_local
7081835   -7.9%   6524239   proc-vmstat.numa_hit
107927 ± 10%   +16.6%   125887   proc-vmstat.numa_huge_pte_updates
5764595   -9.5%   5215673   proc-vmstat.numa_local
7379523 ± 15%   +25.7%   9272505 ± 4%   proc-vmstat.numa_pages_migrated
55530575 ± 10%   +16.5%   64669707   proc-vmstat.numa_pte_updates
8852860   -9.3%   8028738   proc-vmstat.pgfault
7379523 ± 15%   +25.7%   9272505 ± 4%   proc-vmstat.pgmigrate_success
393902   -9.6%   356099   proc-vmstat.pgreuse
14358 ± 15%   +25.8%   18064 ± 5%   proc-vmstat.thp_migration_success
18273792   -11.5%   16166144   proc-vmstat.unevictable_pgs_scanned
1.45e+08   -8.7%   1.325e+08   sched_debug.cfs_rq:/.avg_vruntime.max
3995873   -14.0%   3437625 ± 2%   sched_debug.cfs_rq:/.avg_vruntime.stddev
0.23 ± 3%   -8.6%   0.21 ± 6%   sched_debug.cfs_rq:/.h_nr_running.stddev
1.45e+08   -8.7%   1.325e+08   sched_debug.cfs_rq:/.min_vruntime.max
3995873   -14.0%   3437625 ± 2%   sched_debug.cfs_rq:/.min_vruntime.stddev
0.53 ± 71%   +195.0%   1.56 ± 37%   sched_debug.cfs_rq:/.removed.load_avg.avg
25.54 ± 2%   +13.0%   28.87   sched_debug.cfs_rq:/.removed.load_avg.max
3.40 ± 35%   +85.6%   6.32 ± 17%   sched_debug.cfs_rq:/.removed.load_avg.stddev
0.16 ± 74%   +275.6%   0.59 ± 39%   sched_debug.cfs_rq:/.removed.runnable_avg.avg
8.03 ± 31%   +84.9%   14.84   sched_debug.cfs_rq:/.removed.runnable_avg.max
1.02 ± 44%   +154.3%   2.59 ± 16%   sched_debug.cfs_rq:/.removed.runnable_avg.stddev
0.16 ± 74%   +275.6%   0.59 ± 39%   sched_debug.cfs_rq:/.removed.util_avg.avg
8.03 ± 31%   +84.9%   14.84   sched_debug.cfs_rq:/.removed.util_avg.max
1.02 ± 44%   +154.3%   2.59 ± 16%   sched_debug.cfs_rq:/.removed.util_avg.stddev
146.33 ± 4%   -12.0%   128.80 ± 8%   sched_debug.cfs_rq:/.util_avg.stddev
361281 ± 5%   -13.6%   312127 ± 3%   sched_debug.cpu.avg_idle.stddev
1229022   -9.9%   1107544   sched_debug.cpu.clock.avg
1229436   -9.9%   1107919   sched_debug.cpu.clock.max
1228579   -9.9%   1107137   sched_debug.cpu.clock.min
248.12 ± 6%   -8.9%   226.15 ± 2%   sched_debug.cpu.clock.stddev
1201071   -9.7%   1084858   sched_debug.cpu.clock_task.avg
1205361   -9.7%   1088445   sched_debug.cpu.clock_task.max
1190139   -9.7%   1074355   sched_debug.cpu.clock_task.min
156325 ± 4%   -21.3%   123055 ± 3%   sched_debug.cpu.max_idle_balance_cost.stddev
0.00 ± 5%   -8.8%   0.00 ± 2%   sched_debug.cpu.next_balance.stddev
0.23 ± 3%   -6.9%   0.21 ± 4%   sched_debug.cpu.nr_running.stddev
22855   -11.9%   20146 ± 2%   sched_debug.cpu.nr_switches.avg
0.00 ± 74%   +301.6%   0.00 ± 41%   sched_debug.cpu.nr_uninterruptible.avg
-20.99   +50.9%   -31.67   sched_debug.cpu.nr_uninterruptible.min
1228564   -9.9%   1107124   sched_debug.cpu_clk
1227997   -9.9%   1106556   sched_debug.ktime
0.00 ± 70%   +66.1%   0.00   sched_debug.rt_rq:.rt_nr_migratory.avg
0.02 ± 70%   +66.1%   0.03   sched_debug.rt_rq:.rt_nr_migratory.max
0.00 ± 70%   +66.1%   0.00   sched_debug.rt_rq:.rt_nr_migratory.stddev
0.00 ± 70%   +66.1%   0.00   sched_debug.rt_rq:.rt_nr_running.avg
0.02 ± 70%   +66.1%   0.03   sched_debug.rt_rq:.rt_nr_running.max
0.00 ± 70%   +66.1%   0.00   sched_debug.rt_rq:.rt_nr_running.stddev
1229125   -9.9%   1107673   sched_debug.sched_clk
36.73   +9.2%   40.12   perf-stat.i.MPKI
1.156e+08   +0.9%   1.166e+08   perf-stat.i.branch-instructions
1.41   +0.1   1.49   perf-stat.i.branch-miss-rate%
1755317   +6.4%   1868497   perf-stat.i.branch-misses
65.90   +2.6   68.53   perf-stat.i.cache-miss-rate%
13292768   +13.0%   15016556   perf-stat.i.cache-misses
20180664   +9.2%   22041180   perf-stat.i.cache-references
1620   -2.0%   1588   perf-stat.i.context-switches
492.61   +2.2%   503.60   perf-stat.i.cpi
2.624e+11   +2.3%   2.685e+11   perf-stat.i.cpu-cycles
20261   -9.6%   18315   perf-stat.i.cycles-between-cache-misses
0.08 ± 5%   -0.0   0.07   perf-stat.i.dTLB-load-miss-rate%
114641 ± 5%   -6.6%   107104   perf-stat.i.dTLB-load-misses
0.24   +0.0   0.25   perf-stat.i.dTLB-store-miss-rate%
202887   +3.4%   209829   perf-stat.i.dTLB-store-misses
479259 ± 2%   -9.8%   432243 ± 6%   perf-stat.i.iTLB-load-misses
272948 ± 5%   -16.4%   228065 ± 3%   perf-stat.i.iTLB-loads
5.888e+08   +0.8%   5.938e+08   perf-stat.i.instructions
1349   +15.8%   1561 ± 2%   perf-stat.i.instructions-per-iTLB-miss
2.73   +2.3%   2.80   perf-stat.i.metric.GHz
3510   +2.9%   3612   perf-stat.i.minor-faults
302696 ± 4%   +8.0%   327055   perf-stat.i.node-load-misses
5025469 ± 3%   +16.0%   5831348 ± 2%   perf-stat.i.node-store-misses
6419781   +11.7%   7171575   perf-stat.i.node-stores
3510   +2.9%   3613   perf-stat.i.page-faults
34.43   +8.1%   37.21   perf-stat.overall.MPKI
1.51   +0.1   1.59   perf-stat.overall.branch-miss-rate%
66.31   +2.2   68.53   perf-stat.overall.cache-miss-rate%
19793   -9.3%   17950   perf-stat.overall.cycles-between-cache-misses
0.07 ± 5%   -0.0   0.07   perf-stat.overall.dTLB-load-miss-rate%
0.23   +0.0   0.24   perf-stat.overall.dTLB-store-miss-rate%
1227 ± 2%   +12.1%   1376 ± 6%   perf-stat.overall.instructions-per-iTLB-miss
1729818   +6.4%   1840962   perf-stat.ps.branch-misses
13346402   +12.6%   15031113   perf-stat.ps.cache-misses
20127330   +9.0%   21934543   perf-stat.ps.cache-references
1624   -2.1%   1590   perf-stat.ps.context-switches
2.641e+11   +2.1%   2.698e+11   perf-stat.ps.cpu-cycles
113287 ± 5%   -6.8%   105635   perf-stat.ps.dTLB-load-misses
203569   +3.2%   210036   perf-stat.ps.dTLB-store-misses
476376 ± 2%   -9.8%   429901 ± 6%   perf-stat.ps.iTLB-load-misses
259293 ± 5%   -16.3%   217088 ± 3%   perf-stat.ps.iTLB-loads
3465   +3.1%   3571   perf-stat.ps.minor-faults
299695 ± 4%   +8.3%   324433   perf-stat.ps.node-load-misses
5044747 ± 3%   +15.7%   5834322 ± 2%   perf-stat.ps.node-store-misses
6459846   +11.3%   7189821   perf-stat.ps.node-stores
3465   +3.1%   3571   perf-stat.ps.page-faults
1.44e+12   -11.4%   1.275e+12   perf-stat.total.instructions
0.47 ± 58%   +593.5%   3.27 ± 81%   perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
0.37 ±124%   +352.3%   1.67 ± 58%   perf-sched.sch_delay.avg.ms.__cond_resched.copy_strings.isra.0.do_execveat_common
0.96 ± 74%   -99.0%   0.01 ±141%   perf-sched.sch_delay.avg.ms.__cond_resched.dput.step_into.link_path_walk.part
2.01 ± 79%   -100.0%   0.00   perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.__install_special_mapping.map_vdso
1.35 ± 72%   -69.8%   0.41 ± 80%   perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.dup_mmap.dup_mm
0.17 ± 18%   -26.5%   0.13 ± 5%   perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
0.26 ± 16%   -39.0%   0.16 ± 7%   perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
2.57 ± 65%   +1027.2%   28.92 ±120%   perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
0.38 ±119%   +669.3%   2.92 ± 19%   perf-sched.sch_delay.max.ms.__cond_resched.copy_strings.isra.0.do_execveat_common
0.51 ±141%   +234.9%   1.71 ± 69%   perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.elf_map.load_elf_binary
1.63 ± 74%   -98.9%   0.02 ±141%   perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.part
3.38 ± 12%   -55.7%   1.50 ± 78%   perf-sched.sch_delay.max.ms.__cond_resched.filemap_read.__kernel_read.search_binary_handler.exec_binprm
2.37 ± 68%   -100.0%   0.00   perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.__install_special_mapping.map_vdso
2.05 ± 62%   -68.1%   0.65 ± 93%   perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.dup_mmap.dup_mm
9.09 ±119%   -96.0%   0.36 ± 42%   perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
3.86 ± 40%   -50.1%   1.93 ± 30%   perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
2.77 ± 78%   -88.0%   0.33 ± 29%   perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
2.48 ± 60%   -86.1%   0.34 ± 7%   perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
85.92 ± 73%   +97.7%   169.86 ± 31%   perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
95.98 ± 6%   -9.5%   86.82 ± 4%   perf-sched.total_wait_and_delay.average.ms
95.30 ± 6%   -9.6%   86.19 ± 4%   perf-sched.total_wait_time.average.ms
725.88 ± 28%   -73.5%   192.63 ±141%   perf-sched.wait_and_delay.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
2.22 ± 42%   -76.2%   0.53 ±141%   perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
4.02 ± 5%   -31.9%   2.74 ± 19%   perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
653.51 ± 9%   -13.3%   566.43 ± 7%   perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
775.33 ± 4%   -19.8%   621.67 ± 13%   perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
88.33 ± 14%   -16.6%   73.67 ± 11%   perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
6.28 ± 19%   -73.5%   1.67 ±141%   perf-sched.wait_and_delay.max.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
1286 ± 3%   -65.6%   442.66 ± 91%   perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
222.90 ± 16%   +53.8%   342.84 ± 30%   perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.91 ± 70%   +7745.7%   71.06 ±129%   perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.vmstat_start.seq_read_iter
21.65 ± 34%   +42.0%   30.75 ± 12%   perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
2.67 ± 26%   -96.6%   0.09 ±141%   perf-sched.wait_time.avg.ms.__cond_resched.change_pmd_range.change_p4d_range.change_protection_range.mprotect_fixup
725.14 ± 28%   -73.5%   192.24 ±141%   perf-sched.wait_time.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
2.87 ± 28%   -96.7%   0.09 ± 77%   perf-sched.wait_time.avg.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
2.10 ± 73%   +4020.9%   86.55 ±135%   perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.open_last_lookups.path_openat
1.96 ± 73%   -94.8%   0.10 ±141%   perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
3.24 ± 21%   -65.0%   1.13 ± 69%   perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.mmap_region
338.18 ±140%   -100.0%   0.07 ±141%   perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.prepare_creds.copy_creds.copy_process
21.80 ±122%   -94.7%   1.16 ±130%   perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.do_vmi_align_munmap
4.29 ± 11%   -66.2%   1.45 ±118%   perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.mprotect_fixup
0.94 ±126%   -100.0%   0.00   perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.pipe_write.vfs_write.ksys_write
3.69 ± 29%   -72.9%   1.00 ±141%   perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.do_exit.do_group_exit.__x64_sys_exit_group
0.04 ±141%   +6192.3%   2.73 ± 63%   perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
32.86 ±128%   -95.2%   1.57 ± 12%   perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
3.96 ± 5%   -33.0%   2.66 ± 19%   perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
7.38 ± 57%   -89.8%   0.75 ± 88%   perf-sched.wait_time.avg.ms.schedule_timeout.khugepaged_wait_work.khugepaged.kthread
643.25 ± 9%   -12.8%   560.82 ± 8%   perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2.22 ± 74%   +15121.1%   338.52 ±138%   perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.vmstat_start.seq_read_iter
4.97 ± 39%   -98.2%   0.09 ±141%   perf-sched.wait_time.max.ms.__cond_resched.change_pmd_range.change_p4d_range.change_protection_range.mprotect_fixup
3.98   -96.1%   0.16 ± 94%   perf-sched.wait_time.max.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
4.28 ± 3%   -66.5%   1.44 ±126%   perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
3.95 ± 14%   +109.8%   8.28 ± 45%   perf-sched.wait_time.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
2.04 ± 74%   -95.0%   0.10 ±141%   perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
340.63 ±140%   -100.0%   0.12 ±141%   perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.prepare_creds.copy_creds.copy_process
4.74 ± 22%   -68.4%   1.50 ±117%   perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.mprotect_fixup
1.30 ±141%   +205.8%   3.99   perf-sched.wait_time.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
1.42 ±131%   -100.0%   0.00   perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.pipe_write.vfs_write.ksys_write
337.62 ±140%   -99.6%   1.33 ±141%   perf-sched.wait_time.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
4.91 ± 27%   +4797.8%   240.69 ± 69%   perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
4.29 ± 7%   -76.7%   1.00 ±141%   perf-sched.wait_time.max.ms.__cond_resched.task_work_run.do_exit.do_group_exit.__x64_sys_exit_group
0.05 ±141%   +5358.6%   2.77 ± 61%   perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
338.90 ±138%   -98.8%   3.95   perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
1284 ± 3%   -68.7%   401.56 ±106%   perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
7.38 ± 57%   -89.8%   0.75 ± 88%   perf-sched.wait_time.max.ms.schedule_timeout.khugepaged_wait_work.khugepaged.kthread
20.80 ± 72%   -20.8   0.00   perf-profile.calltrace.cycles-pp.__cmd_record
20.80 ± 72%   -20.8   0.00   perf-profile.calltrace.cycles-pp.record__finish_output.__cmd_record
20.78 ± 72%   -20.8   0.00   perf-profile.calltrace.cycles-pp.perf_session__process_events.record__finish_output.__cmd_record
20.74 ± 72%   -20.7   0.00   perf-profile.calltrace.cycles-pp.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
20.43 ± 72%   -20.4   0.00   perf-profile.calltrace.cycles-pp.process_simple.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
20.03 ± 72%   -20.0   0.00   perf-profile.calltrace.cycles-pp.ordered_events__queue.process_simple.reader__read_event.perf_session__process_events.record__finish_output
19.84 ± 72%   -19.8   0.00   perf-profile.calltrace.cycles-pp.queue_event.ordered_events__queue.process_simple.reader__read_event.perf_session__process_events
0.77 ± 26%   +0.2   1.00 ± 13%   perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
0.73 ± 26%   +0.3   1.00 ± 21%   perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.74 ± 18%   +0.3   1.07 ± 19%   perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
0.73 ± 18%   +0.3   1.07 ± 19%   perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.78 ± 36%   +0.3   1.11 ± 19%   perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64
0.44 ± 73%   +0.3   0.77 ± 14%   perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
0.78 ± 36%   +0.3   1.12 ± 19%   perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__fxstat64
0.76 ± 17%   +0.3   1.10 ± 19%   perf-profile.calltrace.cycles-pp.write
0.81 ± 34%   +0.4   1.16 ± 16%   perf-profile.calltrace.cycles-pp.__fxstat64
0.96 ± 33%   +0.4   1.35 ± 15%   perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.96 ± 33%   +0.4   1.35 ± 15%   perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.18 ±141%   +0.4   0.60 ± 13%   perf-profile.calltrace.cycles-pp.walk_component.link_path_walk.path_openat.do_filp_open.do_sys_openat2
1.00 ± 28%   +0.4   1.43 ± 6%   perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close_nocancel
0.22 ±141%   +0.4   0.65 ± 18%   perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.47 ± 76%   +0.5   0.93 ± 10%   perf-profile.calltrace.cycles-pp.mm_init.alloc_bprm.do_execveat_common.__x64_sys_execve.do_syscall_64
0.42 ± 73%   +0.5   0.90 ± 23%   perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
1.14 ± 29%   +0.5   1.62 ± 7%   perf-profile.calltrace.cycles-pp.__close_nocancel
0.41 ± 73%   +0.5   0.90 ± 23%   perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
1.10 ± 28%   +0.5   1.59 ± 8%   perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close_nocancel
1.10 ± 28%   +0.5   1.59 ± 8%   perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close_nocancel
1.13 ± 19%   +0.5   1.66 ± 17%   perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
0.58 ± 77%   +0.5   1.12 ± 8%   perf-profile.calltrace.cycles-pp.alloc_bprm.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.22 ±141%   +0.5   0.77 ± 18%   perf-profile.calltrace.cycles-pp.lookup_fast.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
0.27 ±141%   +0.5   0.82 ± 20%   perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64
0.00   +0.6   0.56 ± 9%   perf-profile.calltrace.cycles-pp.lookup_fast.walk_component.link_path_walk.path_openat.do_filp_open
0.22 ±141%   +0.6   0.85 ± 18%   perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
1.03 ± 71%   +5.3   6.34 ± 64%   perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
1.04 ± 71%   +5.3   6.37 ± 64%   perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
1.07 ± 71%   +5.4   6.47 ± 63%   perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
1.07 ± 71%   +5.4   6.47 ± 63%   perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
1.07 ± 71%   +5.4   6.47 ± 63%   perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
1.00 ± 71%   +5.5   6.50 ± 57%   perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
1.03 ± 71%   +5.6   6.61 ± 58%   perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
1.07 ± 71%   +5.7   6.74 ± 57%   perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
1.38 ± 78%   +6.2   7.53 ± 41%   perf-profile.calltrace.cycles-pp.copy_page.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch
1.44 ± 80%   +6.2   7.63 ± 41%   perf-profile.calltrace.cycles-pp.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages
1.44 ± 80%   +6.2   7.67 ± 41%   perf-profile.calltrace.cycles-pp.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page
1.44 ± 80%   +6.2   7.67 ± 41%   perf-profile.calltrace.cycles-pp.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page
1.52 ± 78%   +6.5   8.07 ± 41%   perf-profile.calltrace.cycles-pp.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault
1.52 ± 78%   +6.5   8.07 ± 41%   perf-profile.calltrace.cycles-pp.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault
1.52 ± 78%   +6.6   8.08 ± 41%   perf-profile.calltrace.cycles-pp.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
1.53 ± 78%   +6.6   8.14 ± 41%   perf-profile.calltrace.cycles-pp.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
5.22 ± 49%   +7.3   12.52 ± 23%   perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
5.49 ± 48%   +7.5   12.98 ± 22%   perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
6.00 ± 47%   +7.6   13.57 ± 20%   perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
5.97 ± 48%   +7.6   13.55 ± 20%   perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
6.99 ± 45%   +7.8   14.80 ± 19%   perf-profile.calltrace.cycles-pp.asm_exc_page_fault
20.83 ± 73%   -20.8   0.00   perf-profile.children.cycles-pp.queue_event
20.80 ± 72%   -20.8   0.00   perf-profile.children.cycles-pp.record__finish_output
20.78 ± 72%   -20.8   0.00   perf-profile.children.cycles-pp.perf_session__process_events
20.75 ± 72%   -20.8   0.00   perf-profile.children.cycles-pp.reader__read_event
20.43 ± 72%   -20.4   0.00   perf-profile.children.cycles-pp.process_simple
20.03 ± 72%   -20.0   0.00   perf-profile.children.cycles-pp.ordered_events__queue
0.37 ± 14%   -0.1   0.26 ± 15%   perf-profile.children.cycles-pp.rebalance_domains
0.11 ± 8%   -0.1   0.06 ± 75%   perf-profile.children.cycles-pp.wake_up_q
0.13 ± 7%   +0.0   0.15 ± 13%   perf-profile.children.cycles-pp.get_unmapped_area
0.05   +0.0   0.08 ± 22%   perf-profile.children.cycles-pp.complete_signal
0.07 ± 23%   +0.0   0.10 ± 19%   perf-profile.children.cycles-pp.lru_add_fn
0.08 ± 24%   +0.0   0.12 ± 10%   perf-profile.children.cycles-pp.__do_sys_brk
0.08 ± 11%   +0.0   0.13 ± 19%   perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
0.08 ± 12%   +0.0   0.12 ± 27%   perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
0.02 ±141%   +0.0   0.06 ± 19%   perf-profile.children.cycles-pp.workingset_age_nonresident
0.02 ±141%   +0.0   0.06 ± 19%   perf-profile.children.cycles-pp.workingset_activation
0.04 ± 71%   +0.1   0.09 ± 5%   perf-profile.children.cycles-pp.page_add_file_rmap
0.09 ± 18%   +0.1   0.14 ± 23%   perf-profile.children.cycles-pp.terminate_walk
0.08 ± 12%   +0.1   0.13 ± 19%   perf-profile.children.cycles-pp.__send_signal_locked
0.00   +0.1   0.06 ± 8%   perf-profile.children.cycles-pp.proc_pident_lookup
0.11 ± 15%   +0.1   0.17 ± 15%   perf-profile.children.cycles-pp.exit_notify
0.15 ± 31%   +0.1   0.21 ± 15%   perf-profile.children.cycles-pp.try_charge_memcg
0.04 ± 71%   +0.1   0.10 ± 27%   perf-profile.children.cycles-pp.__mod_lruvec_state
0.04 ± 73%   +0.1   0.10 ± 24%   perf-profile.children.cycles-pp.__mod_node_page_state
0.11 ± 25%   +0.1   0.17 ± 22%   perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.08 ± 12%   +0.1   0.14 ± 26%   perf-profile.children.cycles-pp.get_slabinfo
0.02 ±141%   +0.1   0.08 ± 27%   perf-profile.children.cycles-pp.fput
0.12 ± 6%   +0.1   0.18 ± 20%   perf-profile.children.cycles-pp.xas_find
0.08 ± 17%   +0.1   0.15 ± 39%   perf-profile.children.cycles-pp.task_numa_fault
0.07 ± 44%   +0.1   0.14 ± 18%   perf-profile.children.cycles-pp.___slab_alloc
0.02 ±141%   +0.1   0.09 ± 35%   perf-profile.children.cycles-pp.copy_creds
0.08 ± 12%   +0.1   0.15 ± 18%   perf-profile.children.cycles-pp._exit
0.07 ± 78%   +0.1   0.15 ± 27%   perf-profile.children.cycles-pp.file_free_rcu
0.02 ±141%   +0.1   0.09 ± 25%   perf-profile.children.cycles-pp.do_task_dead
0.19 ± 22%   +0.1   0.27 ± 10%   perf-profile.children.cycles-pp.dequeue_entity
0.18 ± 29%   +0.1   0.26 ± 16%   perf-profile.children.cycles-pp.lru_add_drain
0.03 ± 70%   +0.1   0.11 ± 25%   perf-profile.children.cycles-pp.node_read_numastat
0.07 ± 25%   +0.1   0.15 ± 51%   perf-profile.children.cycles-pp.__kernel_read
0.20 ± 4%   +0.1   0.28 ± 24%   perf-profile.children.cycles-pp.__do_fault
0.23 ± 17%   +0.1   0.31 ± 9%   perf-profile.children.cycles-pp.native_irq_return_iret
0.11 ± 27%   +0.1   0.20 ± 17%   perf-profile.children.cycles-pp.__pte_alloc
0.06 ± 86%   +0.1   0.14 ± 44%   perf-profile.children.cycles-pp.cgroup_rstat_flush
0.06 ± 86%   +0.1   0.14 ± 44%   perf-profile.children.cycles-pp.cgroup_rstat_flush_locked
0.06 ± 86%   +0.1   0.14 ± 44%   perf-profile.children.cycles-pp.do_flush_stats
0.06 ± 86%   +0.1   0.14 ± 44%   perf-profile.children.cycles-pp.flush_memcg_stats_dwork
0.12 ± 28%   +0.1   0.20 ± 18%   perf-profile.children.cycles-pp.d_path
0.08 ± 36%   +0.1   0.16 ± 17%   perf-profile.children.cycles-pp.lookup_open
0.11 ± 7%   +0.1   0.20 ± 33%   perf-profile.children.cycles-pp.copy_pte_range
0.13 ± 16%   +0.1   0.22 ± 18%   perf-profile.children.cycles-pp.dev_attr_show
0.04 ± 73%   +0.1   0.13 ± 49%   perf-profile.children.cycles-pp.task_numa_migrate
0.19 ± 17%   +0.1   0.28 ± 7%   perf-profile.children.cycles-pp.__count_memcg_events
0.15 ± 17%   +0.1   0.24 ± 10%   perf-profile.children.cycles-pp.__pmd_alloc
0.00   +0.1   0.09 ± 31%   perf-profile.children.cycles-pp.remove_vma
0.13 ± 16%   +0.1   0.22 ± 22%   perf-profile.children.cycles-pp.sysfs_kf_seq_show
0.12 ± 26%   +0.1   0.21 ± 26%   perf-profile.children.cycles-pp.__do_set_cpus_allowed
0.08 ± 78%   +0.1   0.18 ± 20%   perf-profile.children.cycles-pp.free_unref_page
0.02 ±141%   +0.1   0.11 ± 32%   perf-profile.children.cycles-pp.nd_jump_root
0.05 ± 74%   +0.1   0.14 ± 23%   perf-profile.children.cycles-pp._find_next_bit
0.12 ± 22%   +0.1   0.21 ± 21%   perf-profile.children.cycles-pp.clock_gettime
0.02 ±141%   +0.1   0.11 ± 29%   perf-profile.children.cycles-pp.free_percpu
0.00   +0.1   0.10 ± 25%   perf-profile.children.cycles-pp.lockref_get
0.25 ± 40%   +0.1   0.35 ± 24%   perf-profile.children.cycles-pp.shift_arg_pages
0.26 ± 29%   +0.1   0.36 ± 14%   perf-profile.children.cycles-pp.rmqueue
0.13 ± 35%   +0.1   0.23 ± 24%   perf-profile.children.cycles-pp.single_open
0.05 ± 78%   +0.1   0.15 ± 29%   perf-profile.children.cycles-pp.vma_expand
0.09 ± 5%   +0.1   0.21 ± 41%   perf-profile.children.cycles-pp.prepare_task_switch
0.08 ± 12%   +0.1   0.19 ± 37%   perf-profile.children.cycles-pp.copy_page_to_iter
0.22 ± 40%   +0.1   0.34 ± 33%   perf-profile.children.cycles-pp.mas_wr_node_store
0.16 ± 41%   +0.1   0.27 ± 13%   perf-profile.children.cycles-pp.__set_cpus_allowed_ptr_locked
0.16 ± 10%   +0.1   0.28 ± 26%   perf-profile.children.cycles-pp.free_pages_and_swap_cache
0.11 ± 28%   +0.1   0.23 ± 27%   perf-profile.children.cycles-pp.single_release
0.00   +0.1   0.12 ± 37%   perf-profile.children.cycles-pp.find_busiest_queue
0.23 ± 28%   +0.1   0.35 ± 23%   perf-profile.children.cycles-pp.pte_alloc_one
0.23 ± 32%   +0.1   0.35 ± 16%   perf-profile.children.cycles-pp.strncpy_from_user
0.20 ± 55%   +0.1   0.33 ± 25%   perf-profile.children.cycles-pp.gather_stats
0.16 ± 30%   +0.1   0.30 ± 12%   perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.29 ± 31%   +0.1   0.43 ± 14%   perf-profile.children.cycles-pp.setup_arg_pages
0.13 ± 18%   +0.1   0.27 ± 28%   perf-profile.children.cycles-pp.aa_file_perm
0.03 ± 70%   +0.1   0.18 ± 73%   perf-profile.children.cycles-pp.set_pmd_migration_entry
0.09 ±103%   +0.1   0.23 ± 39%   perf-profile.children.cycles-pp.__wait_for_common
0.19 ± 16%   +0.1   0.33 ± 27%   perf-profile.children.cycles-pp.obj_cgroup_charge
0.03 ± 70%   +0.1   0.18 ± 74%   perf-profile.children.cycles-pp.try_to_migrate_one
0.14 ± 41%   +0.2   0.29 ± 34%   perf-profile.children.cycles-pp.select_task_rq
0.28 ± 35%   +0.2   0.44 ± 28%   perf-profile.children.cycles-pp.vm_area_alloc
0.04 ± 71%   +0.2   0.20 ± 73%   perf-profile.children.cycles-pp.try_to_migrate
0.04 ± 71%   +0.2   0.22 ± 70%   perf-profile.children.cycles-pp.rmap_walk_anon
0.37 ± 28%   +0.2   0.55 ± 23%   perf-profile.children.cycles-pp.pick_next_task_fair
0.04 ± 71%   +0.2   0.22 ± 57%   perf-profile.children.cycles-pp.migrate_folio_unmap
0.11 ± 51%   +0.2   0.31 ± 30%   perf-profile.children.cycles-pp.on_each_cpu_cond_mask
0.30 ± 30%   +0.2   0.50 ± 16%   perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.30 ± 19%   +0.2   0.50 ± 23%   perf-profile.children.cycles-pp.__perf_sw_event
0.21 ± 30%   +0.2   0.41 ± 19%   perf-profile.children.cycles-pp.apparmor_file_permission
0.25 ± 29%   +0.2   0.45 ± 15%   perf-profile.children.cycles-pp.security_file_permission
0.13 ± 55%   +0.2   0.34 ± 24%   perf-profile.children.cycles-pp.smp_call_function_many_cond
0.31 ± 34%   +0.2   0.52 ± 30%   perf-profile.children.cycles-pp.pipe_read
0.32 ± 16%   +0.2   0.55 ± 8%   perf-profile.children.cycles-pp.getname_flags
0.33 ± 11%   +0.2   0.55 ± 21%   perf-profile.children.cycles-pp.___perf_sw_event
0.17 ± 44%   +0.2   0.40 ± 38%   perf-profile.children.cycles-pp.newidle_balance
0.38 ± 38%   +0.2   0.60 ± 12%   perf-profile.children.cycles-pp.__percpu_counter_init
0.38 ± 37%   +0.2   0.61 ± 18%   perf-profile.children.cycles-pp.readlink
0.27 ± 40%   +0.2   0.51 ± 21%   perf-profile.children.cycles-pp.mod_objcg_state
0.76 ± 17%   +0.3   1.10 ± 19%   perf-profile.children.cycles-pp.write
0.48 ± 42%   +0.4   0.83 ± 13%   perf-profile.children.cycles-pp.pid_revalidate
0.61 ± 34%   +0.4   0.98 ± 17%   perf-profile.children.cycles-pp.__d_lookup_rcu
0.73 ± 35%   +0.4   1.12 ± 8%   perf-profile.children.cycles-pp.alloc_bprm
0.59 ± 42%   +0.4   0.98 ± 11%   perf-profile.children.cycles-pp.pcpu_alloc
0.77 ± 31%   +0.4   1.21 ± 4%   perf-profile.children.cycles-pp.mm_init
0.92 ± 31%   +0.5   1.38 ± 12%   perf-profile.children.cycles-pp.__fxstat64
0.74 ± 32%   +0.5   1.27 ± 20%   perf-profile.children.cycles-pp.open_last_lookups
1.37 ± 29%   +0.6   1.94 ± 19%   perf-profile.children.cycles-pp.kmem_cache_alloc
1.35 ± 38%   +0.7   2.09 ± 15%   perf-profile.children.cycles-pp.lookup_fast
1.13 ± 59%   +5.3   6.47 ± 63%   perf-profile.children.cycles-pp.start_secondary
1.06 ± 60%   +5.4   6.50 ± 57%   perf-profile.children.cycles-pp.intel_idle
1.09 ± 59%   +5.5   6.62 ± 58%   perf-profile.children.cycles-pp.cpuidle_enter
1.09 ± 59%   +5.5   6.62 ± 58%   perf-profile.children.cycles-pp.cpuidle_enter_state
1.10 ± 59%   +5.5   6.65 ± 58%   perf-profile.children.cycles-pp.cpuidle_idle_call
1.13 ± 59%   +5.6   6.74 ± 57%   perf-profile.children.cycles-pp.secondary_startup_64_no_verify
1.13 ± 59%   +5.6   6.74 ± 57%   perf-profile.children.cycles-pp.cpu_startup_entry
1.13 ± 59%   +5.6   6.74 ± 57%   perf-profile.children.cycles-pp.do_idle
1.51 ± 69%   +6.1   7.65 ± 41%   perf-profile.children.cycles-pp.folio_copy
1.52 ± 69%   +6.2   7.68 ± 41%   perf-profile.children.cycles-pp.move_to_new_folio
1.52 ± 69%   +6.2   7.68 ± 41%   perf-profile.children.cycles-pp.migrate_folio_extra
1.74 ± 63%   +6.2   7.96 ± 39%   perf-profile.children.cycles-pp.copy_page
1.61 ± 68%   +6.5   8.08 ± 41%   perf-profile.children.cycles-pp.migrate_pages_batch
1.61 ± 68%   +6.5   8.09 ± 41%   perf-profile.children.cycles-pp.migrate_pages
1.61 ± 68%   +6.5   8.10 ± 41%   perf-profile.children.cycles-pp.migrate_misplaced_page
1.62 ± 67%   +6.5   8.14 ± 41%   perf-profile.children.cycles-pp.do_huge_pmd_numa_page
7.23 ± 41%   +7.5   14.76 ± 19%   perf-profile.children.cycles-pp.__handle_mm_fault
8.24 ± 38%   +7.6   15.86 ± 17%   perf-profile.children.cycles-pp.exc_page_fault
8.20 ± 38%   +7.6   15.84 ± 17%   perf-profile.children.cycles-pp.do_user_addr_fault
9.84 ± 35%   +7.7   17.51 ± 15%   perf-profile.children.cycles-pp.asm_exc_page_fault
7.71 ± 40%   +7.7   15.41 ± 18%   perf-profile.children.cycles-pp.handle_mm_fault
20.00 ± 72%   -20.0   0.00   perf-profile.self.cycles-pp.queue_event
0.18 ± 22%   -0.1   0.10 ± 24%   perf-profile.self.cycles-pp.__d_lookup
0.07 ± 25%   +0.0   0.10 ± 9%   perf-profile.self.cycles-pp.__perf_read_group_add
0.08 ± 16%   +0.0   0.12 ± 26%   perf-profile.self.cycles-pp.check_heap_object
0.05 ± 8%   +0.0   0.09 ± 30%   perf-profile.self.cycles-pp.__memcg_kmem_charge_page
0.02 ±141%   +0.0   0.06 ± 13%   perf-profile.self.cycles-pp.try_to_wake_up
0.08 ± 31%   +0.1   0.14 ± 30%   perf-profile.self.cycles-pp.task_dump_owner
0.05 ± 74%   +0.1   0.10 ± 24%   perf-profile.self.cycles-pp.rmqueue
0.14 ± 26%   +0.1   0.20 ± 6%   perf-profile.self.cycles-pp.init_file
0.05 ± 78%   +0.1   0.10 ± 4%   perf-profile.self.cycles-pp.enqueue_task_fair
0.05 ± 78%   +0.1   0.10 ± 27%   perf-profile.self.cycles-pp.___slab_alloc
0.02 ±141%   +0.1   0.08 ± 24%   perf-profile.self.cycles-pp.pick_link
0.04 ± 73%   +0.1   0.10 ± 24%   perf-profile.self.cycles-pp.__mod_node_page_state
0.07 ± 17%   +0.1   0.14 ± 26%   perf-profile.self.cycles-pp.get_slabinfo
0.00   +0.1   0.07 ± 18%   perf-profile.self.cycles-pp.select_task_rq
0.07 ± 78%   +0.1   0.15 ± 27%   perf-profile.self.cycles-pp.file_free_rcu
0.09 ± 44%   +0.1   0.16 ± 15%   perf-profile.self.cycles-pp.apparmor_file_permission
0.08 ± 27%   +0.1   0.15 ± 35%   perf-profile.self.cycles-pp.malloc
0.02 ±141%   +0.1   0.10 ± 29%   perf-profile.self.cycles-pp.memcg_account_kmem
0.23 ± 17%   +0.1   0.31 ± 9%   perf-profile.self.cycles-pp.native_irq_return_iret
0.13 ± 32%   +0.1   0.21 ± 32%   perf-profile.self.cycles-pp.obj_cgroup_charge
0.10 ± 43%   +0.1   0.19 ± 11%   perf-profile.self.cycles-pp.perf_read
0.14 ± 12%   +0.1   0.23 ± 25%   perf-profile.self.cycles-pp.cgroup_rstat_updated
0.13 ± 43%   +0.1   0.23 ± 27%   perf-profile.self.cycles-pp.mod_objcg_state
0.00   +0.1   0.10 ± 25%   perf-profile.self.cycles-pp.lockref_get
0.07 ± 78%   +0.1   0.18 ± 34%   perf-profile.self.cycles-pp.update_rq_clock_task
0.00   +0.1   0.10 ± 27%   perf-profile.self.cycles-pp.find_busiest_queue
0.09 ± 59%   +0.1   0.21 ± 29%   perf-profile.self.cycles-pp.smp_call_function_many_cond
0.15 ± 31%   +0.1   0.27 ± 16%   perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.19 ± 39%   +0.1   0.32 ± 19%   perf-profile.self.cycles-pp.zap_pte_range
0.13 ± 18%   +0.1   0.26 ± 23%   perf-profile.self.cycles-pp.aa_file_perm
0.19 ± 50%   +0.1   0.32 ± 24%   perf-profile.self.cycles-pp.gather_stats
0.24 ± 16%   +0.2   0.40 ± 17%   perf-profile.self.cycles-pp.___perf_sw_event
0.25 ± 31%   +0.2   0.41 ± 16%   perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.08 ± 71%   +0.2   0.25 ± 24%   perf-profile.self.cycles-pp.pcpu_alloc
0.16 ± 38%   +0.2   0.34 ± 21%   perf-profile.self.cycles-pp.filemap_map_pages
0.32 ± 41%   +0.2   0.54 ± 17%   perf-profile.self.cycles-pp.pid_revalidate
0.47 ± 19%   +0.3   0.73 ± 21%   perf-profile.self.cycles-pp.kmem_cache_alloc
0.60 ± 34%   +0.4   0.96 ± 18%   perf-profile.self.cycles-pp.__d_lookup_rcu
1.06 ± 60%   +5.4   6.50 ± 57%   perf-profile.self.cycles-pp.intel_idle
1.74 ± 63%   +6.2   7.92 ± 39%   perf-profile.self.cycles-pp.copy_page

***************************************************************************************************
lkp-csl-2sp3: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp3/_INVERSE_BIND/autonuma-benchmark

commit:
  fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq")
  167773d1dd ("sched/numa: Increase tasks' access history")

fc769221b23064c0 167773d1ddb5ffdd944f851f2cb
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
0.01 ± 20%   +0.0   0.01 ± 15%   mpstat.cpu.all.iowait%
25370 ± 3%   -13.5%   21946 ± 6%   uptime.idle
2.098e+10 ± 4%   -15.8%   1.767e+10 ± 7%   cpuidle..time
21696014 ± 4%   -15.8%   18274389 ± 7%   cpuidle..usage
3567832 ± 2%   -12.9%   3106532 ± 5%   numa-numastat.node1.local_node
4472555 ± 2%   -10.8%   3989658 ± 6%   numa-numastat.node1.numa_hit
21420616 ± 4%   -15.9%   18019892 ± 7%   turbostat.C6
62.12   +3.8%   64.46   turbostat.RAMWatt
185236 ± 6%   -17.4%   152981 ± 15%   numa-meminfo.node1.Active
184892 ± 6%   -17.5%   152523 ± 15%   numa-meminfo.node1.Active(anon)
190876 ± 6%   -17.4%   157580 ± 15%   numa-meminfo.node1.Shmem
373.94 ± 4%   -14.8%   318.67 ± 6%   autonuma-benchmark.numa01.seconds
3066 ± 2%   -7.6%   2833 ± 3%   autonuma-benchmark.time.elapsed_time
3066 ± 2%   -7.6%   2833 ± 3%   autonuma-benchmark.time.elapsed_time.max
1770652 ± 3%   -7.7%   1634112 ± 3%   autonuma-benchmark.time.involuntary_context_switches
258701 ± 2%   -6.9%   240826 ± 3%   autonuma-benchmark.time.user_time
46235 ± 6%   -17.5%   38150 ± 15%   numa-vmstat.node1.nr_active_anon
47723 ± 6%   -17.4%   39411 ± 15%   numa-vmstat.node1.nr_shmem
46235 ± 6%   -17.5%   38150 ± 15%   numa-vmstat.node1.nr_zone_active_anon
4471422 ± 2%   -10.8%   3989129 ± 6%   numa-vmstat.node1.numa_hit
3566699 ± 2%   -12.9%   3106004 ± 5%   numa-vmstat.node1.numa_local
2.37 ± 23%   +45.3%   3.44 ± 16%   sched_debug.cfs_rq:/.removed.runnable_avg.stddev
2.26 ± 28%   +45.0%   3.28 ± 20%   sched_debug.cfs_rq:/.removed.util_avg.stddev
203.53 ± 4%   -12.8%   177.48 ± 3%   sched_debug.cfs_rq:/.util_est_enqueued.stddev
128836 ± 7%   -16.9%   107001 ± 8%   sched_debug.cpu.max_idle_balance_cost.stddev
12639 ± 6%   -12.1%   11108 ± 8%   sched_debug.cpu.nr_switches.min
0.06 ± 41%   -44.9%   0.04 ± 20%   perf-sched.sch_delay.avg.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
1.84 ± 23%   -56.4%   0.80 ± 33%   perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
0.08 ± 38%   -55.2%   0.04 ± 22%   perf-sched.sch_delay.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
7.55 ± 60%   -77.2%   1.72 ±152%   perf-sched.wait_time.avg.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
10.72 ± 60%   -73.8%   2.81 ±171%   perf-sched.wait_time.max.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
0.28 ± 12%   -16.4%   0.23 ± 5%   perf-sched.wait_time.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
8802 ± 3%   -4.3%   8427   proc-vmstat.nr_mapped
54506 ± 5%   -5.2%   51656   proc-vmstat.nr_shmem
8510048   -4.5%   8124296   proc-vmstat.numa_hit
43091 ± 8%   +15.9%   49938 ± 6%   proc-vmstat.numa_huge_pte_updates
7242046   -5.3%   6860532 ± 2%   proc-vmstat.numa_local
3762770 ± 5%   +34.7%   5068087 ± 3%   proc-vmstat.numa_pages_migrated
22235827 ± 8%   +15.8%   25759214 ± 6%   proc-vmstat.numa_pte_updates
10591821   -5.4%   10024519 ± 2%   proc-vmstat.pgfault
3762770 ± 5%   +34.7%   5068087 ± 3%   proc-vmstat.pgmigrate_success
489883 ± 2%   -6.8%   456801 ± 3%   proc-vmstat.pgreuse
7297 ± 5%   +34.8%   9838 ± 3%   proc-vmstat.thp_migration_success
22825216   -7.4%   21132800 ± 3%   proc-vmstat.unevictable_pgs_scanned
40.10   +4.2%   41.80   perf-stat.i.MPKI
1.64   +0.1   1.74   perf-stat.i.branch-miss-rate%
1920111   +6.9%   2051982   perf-stat.i.branch-misses
60.50   +1.2   61.72   perf-stat.i.cache-miss-rate%
12369678   +6.9%   13223477   perf-stat.i.cache-misses
21918348   +4.6%   22934958   perf-stat.i.cache-references
22544   -4.0%   21634   perf-stat.i.cycles-between-cache-misses
1458   +12.1%   1635 ± 5%   perf-stat.i.instructions-per-iTLB-miss
2.51   +2.4%   2.57   perf-stat.i.metric.M/sec
3383   +2.3%   3460   perf-stat.i.minor-faults
244016   +5.0%   256219   perf-stat.i.node-load-misses
4544736   +9.5%   4977101 ± 3%   perf-stat.i.node-store-misses
6126744   +5.5%   6463826 ± 2%   perf-stat.i.node-stores
3383   +2.3%   3460   perf-stat.i.page-faults
37.34   +3.4%   38.60   perf-stat.overall.MPKI
1.64   +0.1   1.74   perf-stat.overall.branch-miss-rate%
21951   -5.4%   20763   perf-stat.overall.cycles-between-cache-misses
1866870   +7.1%   2000069   perf-stat.ps.branch-misses
12385090   +6.6%   13198317   perf-stat.ps.cache-misses
21609219   +4.6%   22595642   perf-stat.ps.cache-references
3340   +2.3%   3418   perf-stat.ps.minor-faults
243774   +4.9%   255759   perf-stat.ps.node-load-misses
4560352   +9.0%   4973035 ± 3%   perf-stat.ps.node-store-misses
6135666   +5.2%   6452858 ± 2%   perf-stat.ps.node-stores
3340   +2.3%   3418   perf-stat.ps.page-faults
1.775e+12   -6.5%   1.659e+12 ± 2%   perf-stat.total.instructions
32.90 ± 14%   -14.9   17.99 ± 40%   perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt
0.60 ± 14%   +0.3   0.88 ± 23%   perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
0.57 ± 49%   +0.4   0.93 ± 14%   perf-profile.calltrace.cycles-pp.update_sg_wakeup_stats.find_idlest_group.find_idlest_cpu.select_task_rq_fair.sched_exec
0.78 ± 12%   +0.4   1.15 ± 34%   perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read.readn.perf_evsel__read
0.80 ± 14%   +0.4   1.17 ± 26%   perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
0.82 ± 15%   +0.4   1.19 ± 33%   perf-profile.calltrace.cycles-pp.__libc_read.readn.perf_evsel__read.read_counters.process_interval
0.80 ± 14%   +0.4   1.19 ± 33%   perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read.readn.perf_evsel__read.read_counters
0.50 ± 46%   +0.4   0.89 ± 25%   perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
0.59 ± 49%   +0.4   0.98 ± 19%   perf-profile.calltrace.cycles-pp.find_idlest_group.find_idlest_cpu.select_task_rq_fair.sched_exec.bprm_execve
0.59 ± 48%   +0.4   1.00 ± 25%   perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64
0.67 ± 47%   +0.4   1.10 ± 22%   perf-profile.calltrace.cycles-pp.sched_exec.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
0.90 ± 18%   +0.4   1.33 ± 24%   perf-profile.calltrace.cycles-pp.show_numa_map.seq_read_iter.seq_read.vfs_read.ksys_read
0.66 ± 46%   +0.4   1.09 ± 27%   perf-profile.calltrace.cycles-pp.gather_pte_stats.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range
0.68 ± 46%   +0.5   1.13 ± 27%   perf-profile.calltrace.cycles-pp.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_vma.show_numa_map
0.68 ± 46%   +0.5   1.13 ± 27%   perf-profile.calltrace.cycles-pp.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_vma
0.68 ± 46%   +0.5   1.14 ± 27%   perf-profile.calltrace.cycles-pp.walk_page_vma.show_numa_map.seq_read_iter.seq_read.vfs_read
0.68 ± 46%   +0.5   1.14 ± 27%   perf-profile.calltrace.cycles-pp.__walk_page_range.walk_page_vma.show_numa_map.seq_read_iter.seq_read
0.68 ± 46%   +0.5   1.14 ± 27%   perf-profile.calltrace.cycles-pp.walk_pgd_range.__walk_page_range.walk_page_vma.show_numa_map.seq_read_iter
0.40 ± 71%   +0.5   0.88 ± 20%   perf-profile.calltrace.cycles-pp._dl_addr
0.93 ± 18%   +0.5   1.45 ± 28%   perf-profile.calltrace.cycles-pp.__fxstat64
0.88 ± 18%   +0.5   1.41 ± 27%   perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64
0.88 ± 18%   +0.5   1.42 ± 28%   perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__fxstat64
0.60 ± 73%   +0.6   1.24 ± 18%   perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.23 ±142%   +0.7   0.88 ± 26%   perf-profile.calltrace.cycles-pp.show_stat.seq_read_iter.vfs_read.ksys_read.do_syscall_64
2.87 ± 14%   +1.3   4.21 ± 23%   perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
2.88 ± 14%   +1.4   4.23 ± 23%   perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
34.28 ± 13%   -14.6   19.70 ± 36%   perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.13 ± 29%   -0.1   0.05 ± 76%   perf-profile.children.cycles-pp.schedule_tail
0.12 ± 20%   -0.1   0.05 ± 78%   perf-profile.children.cycles-pp.__put_user_4
0.18 ± 16%   +0.1   0.23 ± 13%   perf-profile.children.cycles-pp.__x64_sys_munmap
0.09 ± 17%   +0.1   0.16 ± 27%   perf-profile.children.cycles-pp.__do_sys_brk
0.01 ±223%   +0.1   0.08 ± 27%   perf-profile.children.cycles-pp.acpi_ex_insert_into_field
0.01 ±223%   +0.1   0.08 ± 27%   perf-profile.children.cycles-pp.acpi_ex_opcode_1A_1T_1R
0.01 ±223%   +0.1   0.08 ± 27%   perf-profile.children.cycles-pp.acpi_ex_store_object_to_node
0.01 ±223%   +0.1   0.08 ± 27%   perf-profile.children.cycles-pp.acpi_ex_write_data_to_field
0.02 ±142%   +0.1   0.09 ± 50%   perf-profile.children.cycles-pp.common_perm_cond
0.06 ± 58%   +0.1   0.14 ± 24%   perf-profile.children.cycles-pp.___slab_alloc
0.02 ±144%   +0.1   0.10 ± 63%   perf-profile.children.cycles-pp.__alloc_pages_bulk
0.06 ± 18%   +0.1   0.14 ± 58%   perf-profile.children.cycles-pp.security_inode_getattr
0.12 ± 40%   +0.1   0.21 ± 28%   perf-profile.children.cycles-pp.__ptrace_may_access
0.07 ± 33%   +0.1   0.18 ± 40%   perf-profile.children.cycles-pp.brk
0.15 ± 14%   +0.1   0.26 ± 23%   perf-profile.children.cycles-pp.wq_worker_comm
0.09 ± 87%   +0.1   0.21 ± 30%   perf-profile.children.cycles-pp.irq_get_next_irq
0.93 ± 12%   +0.2   1.17 ± 19%   perf-profile.children.cycles-pp.do_dentry_open
0.15 ± 30%   +0.3   0.43 ± 56%   perf-profile.children.cycles-pp.run_ksoftirqd
0.54 ± 17%
+0.4 0.89 ± 20% perf-profile.children.cycles-pp._dl_addr 0.74 ± 19% +0.4 1.09 ± 27% perf-profile.children.cycles-pp.gather_pte_stats 0.74 ± 25% +0.4 1.10 ± 21% perf-profile.children.cycles-pp.sched_exec 0.76 ± 19% +0.4 1.13 ± 27% perf-profile.children.cycles-pp.walk_p4d_range 0.76 ± 19% +0.4 1.13 ± 27% perf-profile.children.cycles-pp.walk_pud_range 0.76 ± 19% +0.4 1.14 ± 27% perf-profile.children.cycles-pp.walk_page_vma 0.76 ± 19% +0.4 1.14 ± 27% perf-profile.children.cycles-pp.__walk_page_range 0.76 ± 19% +0.4 1.14 ± 27% perf-profile.children.cycles-pp.walk_pgd_range 0.92 ± 13% +0.4 1.33 ± 20% perf-profile.children.cycles-pp.open_last_lookups 0.90 ± 17% +0.4 1.33 ± 24% perf-profile.children.cycles-pp.show_numa_map 0.43 ± 51% +0.5 0.88 ± 26% perf-profile.children.cycles-pp.show_stat 1.49 ± 11% +0.5 1.94 ± 15% perf-profile.children.cycles-pp.__do_softirq 1.22 ± 18% +0.6 1.78 ± 16% perf-profile.children.cycles-pp.update_sg_wakeup_stats 1.28 ± 20% +0.6 1.88 ± 18% perf-profile.children.cycles-pp.find_idlest_group 1.07 ± 16% +0.6 1.67 ± 30% perf-profile.children.cycles-pp.__fxstat64 1.36 ± 20% +0.6 1.98 ± 21% perf-profile.children.cycles-pp.find_idlest_cpu 30.64 ± 15% -14.9 15.70 ± 46% perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt 0.01 ±223% +0.1 0.07 ± 36% perf-profile.self.cycles-pp.pick_next_task_fair 0.10 ± 28% +0.1 0.17 ± 28% perf-profile.self.cycles-pp.__get_obj_cgroup_from_memcg 0.00 +0.1 0.07 ± 32% perf-profile.self.cycles-pp.touch_atime 0.04 ±106% +0.1 0.11 ± 18% perf-profile.self.cycles-pp.___slab_alloc 0.12 ± 37% +0.1 0.20 ± 27% perf-profile.self.cycles-pp.__ptrace_may_access 0.05 ± 52% +0.1 0.13 ± 75% perf-profile.self.cycles-pp.pick_link 0.14 ± 28% +0.1 0.24 ± 34% perf-profile.self.cycles-pp.__slab_free 0.47 ± 19% +0.3 0.79 ± 16% perf-profile.self.cycles-pp._dl_addr 1.00 ± 19% +0.4 1.44 ± 18% perf-profile.self.cycles-pp.update_sg_wakeup_stats 6.04 ± 14% +1.9 7.99 ± 18% perf-profile.self.cycles-pp.syscall_exit_to_user_mode *************************************************************************************************** lkp-icl-2sp6: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory ========================================================================================= compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase: gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark commit: fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq") 167773d1dd ("sched/numa: Increase tasks' access history") fc769221b23064c0 167773d1ddb5ffdd944f851f2cb ---------------- --------------------------- %stddev %change %stddev \ | \ 36796 ± 6% -19.0% 29811 ± 8% uptime.idle 3.231e+10 ± 7% -21.6% 2.534e+10 ± 10% cpuidle..time 33785162 ± 7% -21.8% 26431366 ± 10% cpuidle..usage 10.56 ± 7% -1.5 9.02 ± 9% mpstat.cpu.all.idle% 0.01 ± 22% +0.0 0.01 ± 11% mpstat.cpu.all.iowait% 0.17 ± 2% -0.0 0.15 ± 4% mpstat.cpu.all.soft% 388157 ± 31% +60.9% 624661 ± 36% numa-numastat.node0.other_node 4511165 ± 4% -13.5% 3901276 ± 7% numa-numastat.node1.numa_hit 951382 ± 12% -30.4% 661932 ± 31% numa-numastat.node1.other_node 388157 ± 31% +60.9% 624658 ± 36% numa-vmstat.node0.numa_other 4510646 ± 4% -13.5% 3900948 ± 7% numa-vmstat.node1.numa_hit 951382 ± 12% -30.4% 661932 ± 31% numa-vmstat.node1.numa_other 305.08 ± 5% +19.6% 364.96 ± 6% sched_debug.cfs_rq:/.util_est_enqueued.avg 989.11 ± 4% +13.0% 1117 ± 6% sched_debug.cfs_rq:/.util_est_enqueued.max 
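a note on reading the derived perf-stat rows above: MPKI, cache-miss-rate% and cycles-between-cache-misses are ratios computed from the raw counters rather than counters themselves. a minimal Python sketch of the (assumed, textbook) definitions; the function name and sample values are placeholders, not lkp-tests code or numbers from this run:

    # illustrative only: how derived perf-stat rows typically relate to
    # the raw counters (assumed standard definitions, not lkp-tests code)
    def derived_metrics(instructions, cycles, cache_misses, cache_references):
        return {
            # misses per 1000 instructions
            "MPKI": cache_misses / instructions * 1000,
            "cache-miss-rate%": cache_misses / cache_references * 100,
            "cycles-between-cache-misses": cycles / cache_misses,
        }

    # made-up counter values, just to show the shape of the output
    print(derived_metrics(1.0e12, 3.0e14, 4.0e10, 5.0e10))

this is also why MPKI can move more than either raw counter alone: in the table above cache-misses went up while total instructions went down, and both shifts push the ratio in the same direction.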
***************************************************************************************************
lkp-icl-2sp6: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
  fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq")
  167773d1dd ("sched/numa: Increase tasks' access history")

fc769221b23064c0 167773d1ddb5ffdd944f851f2cb
---------------- ---------------------------
       %stddev     %change         %stddev
           \          |                \

     36796 ± 6%  -19.0%  29811 ± 8%  uptime.idle

 3.231e+10 ± 7%  -21.6%  2.534e+10 ± 10%  cpuidle..time
  33785162 ± 7%  -21.8%  26431366 ± 10%  cpuidle..usage

     10.56 ± 7%  -1.5  9.02 ± 9%  mpstat.cpu.all.idle%
      0.01 ± 22%  +0.0  0.01 ± 11%  mpstat.cpu.all.iowait%
      0.17 ± 2%  -0.0  0.15 ± 4%  mpstat.cpu.all.soft%

    388157 ± 31%  +60.9%  624661 ± 36%  numa-numastat.node0.other_node
   4511165 ± 4%  -13.5%  3901276 ± 7%  numa-numastat.node1.numa_hit
    951382 ± 12%  -30.4%  661932 ± 31%  numa-numastat.node1.other_node

    388157 ± 31%  +60.9%  624658 ± 36%  numa-vmstat.node0.numa_other
   4510646 ± 4%  -13.5%  3900948 ± 7%  numa-vmstat.node1.numa_hit
    951382 ± 12%  -30.4%  661932 ± 31%  numa-vmstat.node1.numa_other

    305.08 ± 5%  +19.6%  364.96 ± 6%  sched_debug.cfs_rq:/.util_est_enqueued.avg
    989.11 ± 4%  +13.0%  1117 ± 6%  sched_debug.cfs_rq:/.util_est_enqueued.max
      5082 ± 6%  -19.0%  4114 ± 12%  sched_debug.cpu.curr->pid.stddev
     85229  -13.2%  74019 ± 9%  sched_debug.cpu.max_idle_balance_cost.stddev
      7575 ± 5%  -8.3%  6946 ± 3%  sched_debug.cpu.nr_switches.min

    394498 ± 5%  -21.0%  311653 ± 10%  turbostat.C1E
  33233046 ± 8%  -21.7%  26018024 ± 10%  turbostat.C6
     10.39 ± 7%  -1.5  8.90 ± 9%  turbostat.C6%
      7.77 ± 6%  -17.5%  6.41 ± 9%  turbostat.CPU%c1
    206.88  +2.9%  212.86  turbostat.RAMWatt

    372.30  -8.3%  341.49  autonuma-benchmark.numa01.seconds
    209.06  -10.7%  186.67 ± 6%  autonuma-benchmark.numa01_THREAD_ALLOC.seconds
      2408  -8.6%  2200 ± 2%  autonuma-benchmark.time.elapsed_time
      2408  -8.6%  2200 ± 2%  autonuma-benchmark.time.elapsed_time.max
   1221333 ± 2%  -5.1%  1159380 ± 2%  autonuma-benchmark.time.involuntary_context_switches
   3508627  -4.1%  3363550  autonuma-benchmark.time.minor_page_faults
     11174  +1.9%  11388  autonuma-benchmark.time.percent_of_cpu_this_job_got
    261419  -7.0%  243046 ± 2%  autonuma-benchmark.time.user_time

    220972 ± 7%  +22.1%  269753 ± 3%  proc-vmstat.numa_hint_faults
    164886 ± 11%  +19.4%  196883 ± 5%  proc-vmstat.numa_hint_faults_local
   7964964  -5.9%  7494239  proc-vmstat.numa_hit
     82885 ± 6%  +43.4%  118829 ± 6%  proc-vmstat.numa_huge_pte_updates
   6625289  -6.3%  6207618  proc-vmstat.numa_local
   6636312 ± 4%  +33.1%  8834573 ± 3%  proc-vmstat.numa_pages_migrated
  42671823 ± 6%  +43.2%  61094857 ± 6%  proc-vmstat.numa_pte_updates
   9173569  -6.2%  8602789  proc-vmstat.pgfault
   6636312 ± 4%  +33.1%  8834573 ± 3%  proc-vmstat.pgmigrate_success
    397595  -6.5%  371818  proc-vmstat.pgreuse
     12917 ± 4%  +33.2%  17200 ± 3%  proc-vmstat.thp_migration_success
  17964288  -8.7%  16401792 ± 2%  proc-vmstat.unevictable_pgs_scanned

      0.63 ± 12%  -0.3  0.28 ±100%  perf-profile.calltrace.cycles-pp.__libc_read.readn.evsel__read_counter.read_counters.process_interval

      1.17 ± 4%  -0.2  0.96 ± 14%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.65 ± 19%  -0.2  0.46 ± 13%  perf-profile.children.cycles-pp.task_mm_cid_work
      0.23 ± 16%  -0.2  0.08 ± 61%  perf-profile.children.cycles-pp.rcu_gp_kthread
      0.30 ± 5%  -0.1  0.16 ± 43%  perf-profile.children.cycles-pp.rebalance_domains
      0.13 ± 21%  -0.1  0.03 ±100%  perf-profile.children.cycles-pp.rcu_gp_fqs_loop
      0.25 ± 16%  -0.1  0.18 ± 14%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      0.17 ± 9%  -0.1  0.11 ± 23%  perf-profile.children.cycles-pp.__perf_read_group_add
      0.09 ± 21%  -0.0  0.04 ± 72%  perf-profile.children.cycles-pp.__evlist__disable
      0.11 ± 19%  -0.0  0.07 ± 53%  perf-profile.children.cycles-pp.vma_link
      0.13 ± 6%  -0.0  0.09 ± 27%  perf-profile.children.cycles-pp.ptep_clear_flush
      0.07 ± 7%  -0.0  0.03 ±100%  perf-profile.children.cycles-pp.__kernel_read
      0.07 ± 7%  -0.0  0.03 ±100%  perf-profile.children.cycles-pp.simple_lookup
      0.09 ± 9%  +0.0  0.11 ± 10%  perf-profile.children.cycles-pp.exit_notify
      0.12 ± 14%  +0.0  0.16 ± 17%  perf-profile.children.cycles-pp.__do_set_cpus_allowed
      0.02 ±141%  +0.1  0.09 ± 40%  perf-profile.children.cycles-pp.__sysvec_call_function
      0.05 ± 78%  +0.1  0.13 ± 42%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.03 ±141%  +0.1  0.12 ± 41%  perf-profile.children.cycles-pp.sysvec_call_function

      0.64 ± 19%  -0.2  0.45 ± 12%  perf-profile.self.cycles-pp.task_mm_cid_work
      0.07 ± 7%  -0.0  0.03 ±100%  perf-profile.self.cycles-pp.dequeue_task_fair
      0.05 ± 8%  +0.0  0.08 ± 14%  perf-profile.self.cycles-pp.file_free_rcu

      1057  +9.9%  1162 ± 2%  perf-stat.i.MPKI
     76.36 ± 2%  +4.6  80.91 ± 2%  perf-stat.i.cache-miss-rate%
 5.353e+08 ± 4%  +18.2%  6.327e+08 ± 3%  perf-stat.i.cache-misses
 7.576e+08  +9.3%  8.282e+08 ± 2%  perf-stat.i.cache-references
 3.727e+11  +1.7%  3.792e+11  perf-stat.i.cpu-cycles
    154.73  +1.5%  157.11  perf-stat.i.cpu-migrations
    722.61 ± 2%  -8.9%  658.12 ± 3%  perf-stat.i.cycles-between-cache-misses
      2.91  +1.7%  2.96  perf-stat.i.metric.GHz
      1242 ± 3%  +5.7%  1312 ± 2%  perf-stat.i.metric.K/sec
     12.73  +9.8%  13.98 ± 2%  perf-stat.i.metric.M/sec
    245601  +5.4%  258749  perf-stat.i.node-load-misses
     43.38  -2.5  40.91 ± 3%  perf-stat.i.node-store-miss-rate%
 2.267e+08 ± 3%  +8.8%  2.467e+08 ± 4%  perf-stat.i.node-store-misses
 3.067e+08 ± 5%  +25.2%  3.841e+08 ± 6%  perf-stat.i.node-stores
    915.00  +9.1%  998.24 ± 2%  perf-stat.overall.MPKI
     71.29 ± 3%  +5.7  77.00 ± 3%  perf-stat.overall.cache-miss-rate%
    702.58 ± 3%  -14.0%  604.23 ± 3%  perf-stat.overall.cycles-between-cache-misses
     42.48 ± 2%  -3.3  39.20 ± 5%  perf-stat.overall.node-store-miss-rate%
  5.33e+08 ± 4%  +18.1%  6.296e+08 ± 3%  perf-stat.ps.cache-misses
 7.475e+08  +9.4%  8.178e+08 ± 2%  perf-stat.ps.cache-references
 3.739e+11  +1.6%  3.8e+11  perf-stat.ps.cpu-cycles
    154.22  +1.6%  156.62  perf-stat.ps.cpu-migrations
      3655  +2.5%  3744  perf-stat.ps.minor-faults
    242759  +5.4%  255974  perf-stat.ps.node-load-misses
 2.255e+08 ± 3%  +8.9%  2.457e+08 ± 3%  perf-stat.ps.node-store-misses
 3.057e+08 ± 5%  +24.9%  3.82e+08 ± 6%  perf-stat.ps.node-stores
      3655  +2.5%  3744  perf-stat.ps.page-faults
 1.968e+12  -8.3%  1.805e+12 ± 2%  perf-stat.total.instructions

      0.03 ±141%  +283.8%  0.13 ± 85%  perf-sched.sch_delay.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
      0.06 ± 77%  +254.1%  0.20 ± 54%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
      0.08 ± 28%  -89.5%  0.01 ±223%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm
      0.92 ± 10%  -33.4%  0.62 ± 20%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.10 ± 22%  -27.2%  0.07 ± 8%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.35 ±141%  +186.8%  1.02 ± 69%  perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
      1.47 ± 81%  +262.6%  5.32 ± 79%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
      2.42 ± 42%  +185.9%  6.91 ± 52%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
      0.26 ± 9%  +1470.7%  4.16 ±115%  perf-sched.sch_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      3.61 ± 7%  -25.3%  2.70 ± 18%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
      0.08 ± 28%  -89.5%  0.01 ±223%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm
     17.44 ± 4%  -19.0%  14.12 ± 13%  perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
     23.36 ± 21%  -37.2%  14.67 ± 22%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    107.00  +11.5%  119.33 ± 4%  perf-sched.wait_and_delay.count.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
     75.00  +9.6%  82.17 ± 2%  perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     79.99 ± 97%  -86.8%  10.52 ± 41%  perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
    145.98 ± 14%  -41.5%  85.46 ± 22%  perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      1.20 ± 94%  +152.3%  3.03 ± 31%  perf-sched.wait_time.avg.ms.__cond_resched.change_pmd_range.change_p4d_range.change_protection_range.mprotect_fixup
      2.30 ± 57%  -90.9%  0.21 ±205%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.part
      0.06 ± 8%  -100.0%  0.00  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.58 ± 81%  -76.6%  0.14 ± 50%  perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_lookupat.filename_lookup
      2.63 ± 21%  -59.4%  1.07 ± 68%  perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
      2.68 ± 40%  -79.5%  0.55 ±174%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
      3.59 ± 17%  -52.9%  1.69 ± 98%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.mmap_region
      4.05 ± 2%  -80.6%  0.79 ±133%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.mprotect_fixup
      3.75 ± 19%  -81.9%  0.68 ±135%  perf-sched.wait_time.avg.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
      1527 ± 70%  -84.5%  236.84 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
     16.13 ± 4%  -21.4%  12.69 ± 15%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      1.16 ±117%  -99.1%  0.01 ±223%  perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
      0.26 ± 25%  -93.2%  0.02 ±223%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.__access_remote_vm
     22.43 ± 21%  -37.4%  14.05 ± 22%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      4.41 ± 8%  -94.9%  0.22 ±191%  perf-sched.wait_time.max.ms.__cond_resched.down_read.walk_component.link_path_walk.part
      0.08 ± 29%  -100.0%  0.00  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      6.20 ± 8%  -21.6%  4.87 ± 13%  perf-sched.wait_time.max.ms.__cond_resched.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      4.23 ± 5%  -68.3%  1.34 ±136%  perf-sched.wait_time.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
      3053 ± 70%  -92.2%  236.84 ±223%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      4.78 ± 33%  +10431.5%  502.95 ± 99%  perf-sched.wait_time.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     79.99 ± 97%  -86.9%  10.51 ± 41%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
      2.13 ±128%  -99.5%  0.01 ±223%  perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
      0.26 ± 25%  -92.4%  0.02 ±223%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.__access_remote_vm
    142.79 ± 13%  -40.9%  84.32 ± 22%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
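if you want to watch the same NUMA-balancing activity on another machine, the proc-vmstat rows above are deltas of counters exposed in /proc/vmstat. a minimal sampling sketch, assuming a kernel built with CONFIG_NUMA_BALANCING=y (the field list and the 10-second window are just illustrative, not what our harness does):

    #!/usr/bin/env python3
    # sample the NUMA balancing counters behind the proc-vmstat rows above
    import time

    FIELDS = ("numa_pte_updates", "numa_hint_faults",
              "numa_hint_faults_local", "pgmigrate_success")

    def snapshot():
        # each /proc/vmstat line is "name value"
        with open("/proc/vmstat") as f:
            kv = dict(line.split() for line in f)
        return {k: int(kv[k]) for k in FIELDS if k in kv}

    before = snapshot()
    time.sleep(10)          # arbitrary sample window
    after = snapshot()
    for k in before:
        print(f"{k}: +{after[k] - before[k]}")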
I hope I can add your Tested-by if I need to rebase the patchset for the -mm tree with any minor changes, depending on any further feedback I get.