Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression

On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@xxxxxxxxx> wrote:
>
>
>
> Hello,
>
> for this commit, we reported
> "[mm]  96db82a66d:  will-it-scale.per_process_ops -95.3% regression"
> in Aug 2022, when it was in linux-next/master
> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
>
> later, we reported
> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
> in Oct 2022, when it was in linus/master
> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@xxxxxxxxx/
>
> and the commit was finally reverted by
> commit 0ba09b1733878afe838fe35c310715fda3d46428
> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Date:   Sun Dec 4 12:51:59 2022 -0800
>
> now we noticed it has gone into linux-next/master again.
>
> we are not sure if there is agreement that the benefit of this commit
> already outweighs the performance drop in some micro benchmarks.
>
> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@xxxxxxxxxxxxxxxxxxxxxx/
> that
> "This patch was applied to v6.1, but was reverted due to a regression
> report.  However it turned out the regression was not due to this patch.
> I ping'ed Andrew to reapply this patch, Andrew may forget it.  This
> patch helps promote THP, so I rebased it onto the latest mm-unstable."

IIRC, Huang Ying's analysis showed the regression in the will-it-scale
micro benchmark was acceptable; the commit was actually reverted due to
a kernel build regression with LLVM reported by Nathan Chancellor. That
build regression was later resolved by commit
81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out
if page in deferred queue already"). And if I remember correctly, this
patch did improve the kernel build with GCC by ~3%.
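
For reference, my recollection of that fix (a paraphrase from memory,
not the upstream diff; see the commit itself for the exact code,
locking, and memcg handling) is an early bail-out in
deferred_split_huge_page() so that pages already on the queue no
longer serialize on the split-queue lock:

/* Paraphrased sketch of commit 81e506bec9be ("mm/thp: check and
 * bail out if page in deferred queue already"), simplified. */
void deferred_split_huge_page(struct page *page)
{
	struct deferred_split *ds_queue = get_deferred_split_queue(page);
	unsigned long flags;

	/* The fix: if the page is already queued, return without
	 * taking the lock; previously every caller grabbed
	 * split_queue_lock just to discover that. */
	if (!list_empty(page_deferred_list(page)))
		return;

	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	if (list_empty(page_deferred_list(page))) {
		list_add_tail(page_deferred_list(page),
			      &ds_queue->split_queue);
		ds_queue->split_queue_len++;
	}
	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
}

Judging from the thp_deferred_split_page counter jumping ~40x in the
numbers below, that path looks relevant to this workload as well.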

>
> however, unfortunately, in our latest tests we still observed the below
> regression with this commit. just FYI.
>
>
>
> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:

Interesting, wasn't the same regression seen last time? And I'm a
little bit confused about how pthread regressed. I don't see the
pthread benchmark doing any intensive memory alloc/free operations. Do
the pthread APIs do any intensive memory operations? The benchmark does
allocate memory for the thread stacks, but that should be just 8K per
thread, so it should not trigger what this patch does. With 1024
threads, the thread stacks may get merged into one single VMA (8M
total), but that can happen even when the patch is not applied.
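
For context, here is my understanding of what the patch does, as a
paraphrased sketch (modeled on thp_get_unmapped_area() /
__thp_get_unmapped_area(), not the literal diff), which is why an 8K
stack mapping alone should not take the aligned path:

/* Sketch: anonymous mappings get routed through a THP-aligned
 * search, but only requests large enough to contain a full
 * PMD-sized (2M on x86_64) aligned region are affected. */
static unsigned long thp_align_sketch(unsigned long addr,
		unsigned long len, loff_t off, unsigned long flags)
{
	loff_t off_end = off + len;
	loff_t off_align = round_up(off, PMD_SIZE);
	unsigned long ret;

	/* Too small to hold an aligned 2M region (e.g. an 8K thread
	 * stack): return 0 so the caller falls back to the normal,
	 * unaligned search. */
	if (off_end <= off_align || (off_end - off_align) < PMD_SIZE)
		return 0;

	/* Pad the search window by one PMD so the result can be
	 * rounded up to a THP boundary. */
	ret = current->mm->get_unmapped_area(NULL, addr, len + PMD_SIZE,
					     off >> PAGE_SHIFT, flags);
	if (IS_ERR_VALUE(ret))
		return 0;

	/* Round up to the next PMD boundary (off is 0 for anon). */
	ret += (off - ret) & (PMD_SIZE - 1);
	return ret;
}

So each individual 8K stack mmap() should still get an unaligned
address, which is why the regression here is puzzling.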

>
>
> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> testcase: stress-ng
> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
> parameters:
>
>         nr_threads: 1
>         disk: 1HDD
>         testtime: 60s
>         fs: ext4
>         class: os
>         test: pthread
>         cpufreq_governor: performance
>
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+-----------------------------------------------------------------------------------------------+
> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression                                         |
> | test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory   |
> | test parameters  | array_size=50000000                                                                           |
> |                  | cpufreq_governor=performance                                                                  |
> |                  | iterations=10x                                                                                |
> |                  | loop=100                                                                                      |
> |                  | nr_threads=25%                                                                                |
> |                  | omp=true                                                                                      |
> +------------------+-----------------------------------------------------------------------------------------------+
> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression       |
> | test machine     | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory    |
> | test parameters  | cpufreq_governor=performance                                                                  |
> |                  | option_a=Average                                                                              |
> |                  | option_b=Integer                                                                              |
> |                  | test=ramspeed-1.4.3                                                                           |
> +------------------+-----------------------------------------------------------------------------------------------+
> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
> | test machine     | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory    |
> | test parameters  | cpufreq_governor=performance                                                                  |
> |                  | option_a=Average                                                                              |
> |                  | option_b=Floating Point                                                                       |
> |                  | test=ramspeed-1.4.3                                                                           |
> +------------------+-----------------------------------------------------------------------------------------------+
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@xxxxxxxxx
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@xxxxxxxxx
>
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
>
> commit:
>   30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
>   1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>   13405796           -65.5%    4620124        cpuidle..usage
>       8.00            +8.2%       8.66 ą  2%  iostat.cpu.system
>       1.61           -60.6%       0.63        iostat.cpu.user
>     597.50 ą 14%     -64.3%     213.50 ą 14%  perf-c2c.DRAM.local
>       1882 ą 14%     -74.7%     476.83 ą  7%  perf-c2c.HITM.local
>    3768436           -12.9%    3283395        vmstat.memory.cache
>     355105           -75.7%      86344 ą  3%  vmstat.system.cs
>     385435           -20.7%     305714 ą  3%  vmstat.system.in
>       1.13            -0.2        0.88        mpstat.cpu.all.irq%
>       0.29            -0.2        0.10 ą  2%  mpstat.cpu.all.soft%
>       6.76 ą  2%      +1.1        7.88 ą  2%  mpstat.cpu.all.sys%
>       1.62            -1.0        0.62 ą  2%  mpstat.cpu.all.usr%
>    2234397           -84.3%     350161 ą  5%  stress-ng.pthread.ops
>      37237           -84.3%       5834 ą  5%  stress-ng.pthread.ops_per_sec
>     294706 ą  2%     -68.0%      94191 ą  6%  stress-ng.time.involuntary_context_switches
>      41442 ą  2%   +5023.4%    2123284        stress-ng.time.maximum_resident_set_size
>    4466457           -83.9%     717053 ą  5%  stress-ng.time.minor_page_faults

The larger RSS and fewer page faults are expected.
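
(With THP a single fault can populate a 2M huge page, i.e. 512 4K
pages at once: minor faults dropping from ~4.5M to ~0.7M while
thp_fault_alloc grows ~40x and the max RSS grows ~50x all point to
most of the anonymous memory now being faulted in as huge pages.)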

>     243.33           +13.5%     276.17 ą  3%  stress-ng.time.percent_of_cpu_this_job_got
>     131.64           +27.7%     168.11 ą  3%  stress-ng.time.system_time
>      19.73           -82.1%       3.53 ą  4%  stress-ng.time.user_time

Much less user time, and it seems to match the drop in the pthread
metric (user_time is down 82.1% vs. -84.3% for ops_per_sec).

>    7715609           -80.2%    1530125 ą  4%  stress-ng.time.voluntary_context_switches
>     494566           -59.5%     200338 ą  3%  meminfo.Active
>     478287           -61.5%     184050 ą  3%  meminfo.Active(anon)
>      58549 ą 17%   +1532.8%     956006 ą 14%  meminfo.AnonHugePages
>     424631          +194.9%    1252445 ą 10%  meminfo.AnonPages
>    3677263           -13.0%    3197755        meminfo.Cached
>    5829485 ą  4%     -19.0%    4724784 ą 10%  meminfo.Committed_AS
>     692486          +108.6%    1444669 ą  8%  meminfo.Inactive
>     662179          +113.6%    1414338 ą  9%  meminfo.Inactive(anon)
>     182416           -50.2%      90759        meminfo.Mapped
>    4614466           +10.0%    5076604 ą  2%  meminfo.Memused
>       6985           +47.6%      10307 ą  4%  meminfo.PageTables
>     718445           -66.7%     238913 ą  3%  meminfo.Shmem
>      35906           -20.7%      28471 ą  3%  meminfo.VmallocUsed
>    4838522           +25.6%    6075302        meminfo.max_used_kB
>     488.83           -20.9%     386.67 ą  2%  turbostat.Avg_MHz
>      12.95            -2.7       10.26 ą  2%  turbostat.Busy%
>    7156734           -87.2%     919149 ą  4%  turbostat.C1
>      10.59            -8.9        1.65 ą  5%  turbostat.C1%
>    3702647           -55.1%    1663518 ą  2%  turbostat.C1E
>      32.99           -20.6       12.36 ą  3%  turbostat.C1E%
>    1161078           +64.5%    1909611        turbostat.C6
>      44.25           +31.8       76.10        turbostat.C6%
>       0.18           -33.3%       0.12        turbostat.IPC
>   74338573 ą  2%     -33.9%   49159610 ą  4%  turbostat.IRQ
>    1381661           -91.0%     124075 ą  6%  turbostat.POLL
>       0.26            -0.2        0.04 ą 12%  turbostat.POLL%
>      96.15            -5.4%      90.95        turbostat.PkgWatt
>      12.12           +19.3%      14.46        turbostat.RAMWatt
>     119573           -61.5%      46012 ą  3%  proc-vmstat.nr_active_anon
>     106168          +195.8%     314047 ą 10%  proc-vmstat.nr_anon_pages
>      28.60 ą 17%   +1538.5%     468.68 ą 14%  proc-vmstat.nr_anon_transparent_hugepages
>     923365           -13.0%     803489        proc-vmstat.nr_file_pages
>     165571          +113.5%     353493 ą  9%  proc-vmstat.nr_inactive_anon
>      45605           -50.2%      22690        proc-vmstat.nr_mapped
>       1752           +47.1%       2578 ą  4%  proc-vmstat.nr_page_table_pages
>     179613           -66.7%      59728 ą  3%  proc-vmstat.nr_shmem
>      21490            -2.4%      20981        proc-vmstat.nr_slab_reclaimable
>      28260            -7.3%      26208        proc-vmstat.nr_slab_unreclaimable
>     119573           -61.5%      46012 ą  3%  proc-vmstat.nr_zone_active_anon
>     165570          +113.5%     353492 ą  9%  proc-vmstat.nr_zone_inactive_anon
>   17343640           -76.3%    4116748 ą  4%  proc-vmstat.numa_hit
>   17364975           -76.3%    4118098 ą  4%  proc-vmstat.numa_local
>     249252           -66.2%      84187 ą  2%  proc-vmstat.pgactivate
>   27528916          +567.1%  1.836e+08 ą  5%  proc-vmstat.pgalloc_normal
>    4912427           -79.2%    1019949 ą  3%  proc-vmstat.pgfault
>   27227124          +574.1%  1.835e+08 ą  5%  proc-vmstat.pgfree
>       8728         +3896.4%     348802 ą  5%  proc-vmstat.thp_deferred_split_page
>       8730         +3895.3%     348814 ą  5%  proc-vmstat.thp_fault_alloc
>       8728         +3896.4%     348802 ą  5%  proc-vmstat.thp_split_pmd
>     316745           -21.5%     248756 ą  4%  sched_debug.cfs_rq:/.avg_vruntime.avg
>     112735 ą  4%     -34.3%      74061 ą  6%  sched_debug.cfs_rq:/.avg_vruntime.min
>       0.49 ą  6%     -17.2%       0.41 ą  8%  sched_debug.cfs_rq:/.h_nr_running.stddev
>      12143 ą120%     -99.9%      15.70 ą116%  sched_debug.cfs_rq:/.left_vruntime.avg
>     414017 ą126%     -99.9%     428.50 ą102%  sched_debug.cfs_rq:/.left_vruntime.max
>      68492 ą125%     -99.9%      78.15 ą106%  sched_debug.cfs_rq:/.left_vruntime.stddev
>      41917 ą 24%     -48.3%      21690 ą 57%  sched_debug.cfs_rq:/.load.avg
>     176151 ą 30%     -56.9%      75963 ą 57%  sched_debug.cfs_rq:/.load.stddev
>       6489 ą 17%     -29.0%       4608 ą 12%  sched_debug.cfs_rq:/.load_avg.max
>       4.42 ą 45%     -81.1%       0.83 ą 74%  sched_debug.cfs_rq:/.load_avg.min
>       1112 ą 17%     -31.0%     767.62 ą 11%  sched_debug.cfs_rq:/.load_avg.stddev
>     316745           -21.5%     248756 ą  4%  sched_debug.cfs_rq:/.min_vruntime.avg
>     112735 ą  4%     -34.3%      74061 ą  6%  sched_debug.cfs_rq:/.min_vruntime.min
>       0.49 ą  6%     -17.2%       0.41 ą  8%  sched_debug.cfs_rq:/.nr_running.stddev
>      12144 ą120%     -99.9%      15.70 ą116%  sched_debug.cfs_rq:/.right_vruntime.avg
>     414017 ą126%     -99.9%     428.50 ą102%  sched_debug.cfs_rq:/.right_vruntime.max
>      68492 ą125%     -99.9%      78.15 ą106%  sched_debug.cfs_rq:/.right_vruntime.stddev
>      14.25 ą 44%     -76.6%       3.33 ą 58%  sched_debug.cfs_rq:/.runnable_avg.min
>      11.58 ą 49%     -77.7%       2.58 ą 58%  sched_debug.cfs_rq:/.util_avg.min
>     423972 ą 23%     +59.3%     675379 ą  3%  sched_debug.cpu.avg_idle.avg
>       5720 ą 43%    +439.5%      30864        sched_debug.cpu.avg_idle.min
>      99.79 ą  2%     -23.7%      76.11 ą  2%  sched_debug.cpu.clock_task.stddev
>     162475 ą 49%     -95.8%       6813 ą 26%  sched_debug.cpu.curr->pid.avg
>    1061268           -84.0%     170212 ą  4%  sched_debug.cpu.curr->pid.max
>     365404 ą 20%     -91.3%      31839 ą 10%  sched_debug.cpu.curr->pid.stddev
>       0.51 ą  3%     -20.1%       0.41 ą  9%  sched_debug.cpu.nr_running.stddev
>     311923           -74.2%      80615 ą  2%  sched_debug.cpu.nr_switches.avg
>     565973 ą  4%     -77.8%     125597 ą 10%  sched_debug.cpu.nr_switches.max
>     192666 ą  4%     -70.6%      56695 ą  6%  sched_debug.cpu.nr_switches.min
>      67485 ą  8%     -79.9%      13558 ą 10%  sched_debug.cpu.nr_switches.stddev
>       2.62          +102.1%       5.30        perf-stat.i.MPKI
>   2.09e+09           -47.6%  1.095e+09 ą  4%  perf-stat.i.branch-instructions
>       1.56            -0.5        1.01        perf-stat.i.branch-miss-rate%
>   31951200           -60.9%   12481432 ą  2%  perf-stat.i.branch-misses
>      19.38           +23.7       43.08        perf-stat.i.cache-miss-rate%
>   26413597            -5.7%   24899132 ą  4%  perf-stat.i.cache-misses
>  1.363e+08           -58.3%   56906133 ą  4%  perf-stat.i.cache-references
>     370628           -75.8%      89743 ą  3%  perf-stat.i.context-switches
>       1.77           +65.1%       2.92 ą  2%  perf-stat.i.cpi
>  1.748e+10           -21.8%  1.367e+10 ą  2%  perf-stat.i.cpu-cycles
>      61611           -79.1%      12901 ą  6%  perf-stat.i.cpu-migrations
>     716.97 ą  2%     -17.2%     593.35 ą  2%  perf-stat.i.cycles-between-cache-misses
>       0.12 ą  4%      -0.1        0.05        perf-stat.i.dTLB-load-miss-rate%
>    3066100 ą  3%     -81.3%     573066 ą  5%  perf-stat.i.dTLB-load-misses
>  2.652e+09           -50.1%  1.324e+09 ą  4%  perf-stat.i.dTLB-loads
>       0.08 ą  2%      -0.0        0.03        perf-stat.i.dTLB-store-miss-rate%
>    1168195 ą  2%     -82.9%     199438 ą  5%  perf-stat.i.dTLB-store-misses
>  1.478e+09           -56.8%  6.384e+08 ą  3%  perf-stat.i.dTLB-stores
>    8080423           -73.2%    2169371 ą  3%  perf-stat.i.iTLB-load-misses
>    5601321           -74.3%    1440571 ą  2%  perf-stat.i.iTLB-loads
>  1.028e+10           -49.7%  5.173e+09 ą  4%  perf-stat.i.instructions
>       1450           +73.1%       2511 ą  2%  perf-stat.i.instructions-per-iTLB-miss
>       0.61           -35.9%       0.39        perf-stat.i.ipc
>       0.48           -21.4%       0.38 ą  2%  perf-stat.i.metric.GHz
>     616.28           -17.6%     507.69 ą  4%  perf-stat.i.metric.K/sec
>     175.16           -50.8%      86.18 ą  4%  perf-stat.i.metric.M/sec
>      76728           -80.8%      14724 ą  4%  perf-stat.i.minor-faults
>    5600408           -61.4%    2160997 ą  5%  perf-stat.i.node-loads
>    8873996           +52.1%   13499744 ą  5%  perf-stat.i.node-stores
>     112409           -81.9%      20305 ą  4%  perf-stat.i.page-faults
>       2.55           +89.6%       4.83        perf-stat.overall.MPKI

Many more cache misses per instruction (MPKI nearly doubled).

>       1.51            -0.4        1.13        perf-stat.overall.branch-miss-rate%
>      19.26           +24.5       43.71        perf-stat.overall.cache-miss-rate%
>       1.70           +56.4%       2.65        perf-stat.overall.cpi
>     665.84           -17.5%     549.51 ą  2%  perf-stat.overall.cycles-between-cache-misses
>       0.12 ą  4%      -0.1        0.04        perf-stat.overall.dTLB-load-miss-rate%
>       0.08 ą  2%      -0.0        0.03        perf-stat.overall.dTLB-store-miss-rate%
>      59.16            +0.9       60.04        perf-stat.overall.iTLB-load-miss-rate%
>       1278           +86.1%       2379 ą  2%  perf-stat.overall.instructions-per-iTLB-miss
>       0.59           -36.1%       0.38        perf-stat.overall.ipc

Worse IPC and CPI.
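
(CPI is just the reciprocal of IPC: 1/1.70 ~= 0.59 and 1/2.65 ~= 0.38,
so the two rows tell the same story.)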

>  2.078e+09           -48.3%  1.074e+09 ą  4%  perf-stat.ps.branch-instructions
>   31292687           -61.2%   12133349 ą  2%  perf-stat.ps.branch-misses
>   26057291            -5.9%   24512034 ą  4%  perf-stat.ps.cache-misses
>  1.353e+08           -58.6%   56072195 ą  4%  perf-stat.ps.cache-references
>     365254           -75.8%      88464 ą  3%  perf-stat.ps.context-switches
>  1.735e+10           -22.4%  1.346e+10 ą  2%  perf-stat.ps.cpu-cycles
>      60838           -79.1%      12727 ą  6%  perf-stat.ps.cpu-migrations
>    3056601 ą  4%     -81.5%     565354 ą  4%  perf-stat.ps.dTLB-load-misses
>  2.636e+09           -50.7%    1.3e+09 ą  4%  perf-stat.ps.dTLB-loads
>    1155253 ą  2%     -83.0%     196581 ą  5%  perf-stat.ps.dTLB-store-misses
>  1.473e+09           -57.4%  6.268e+08 ą  3%  perf-stat.ps.dTLB-stores
>    7997726           -73.3%    2131477 ą  3%  perf-stat.ps.iTLB-load-misses
>    5521346           -74.3%    1418623 ą  2%  perf-stat.ps.iTLB-loads
>  1.023e+10           -50.4%  5.073e+09 ą  4%  perf-stat.ps.instructions
>      75671           -80.9%      14479 ą  4%  perf-stat.ps.minor-faults
>    5549722           -61.4%    2141750 ą  4%  perf-stat.ps.node-loads
>    8769156           +51.6%   13296579 ą  5%  perf-stat.ps.node-stores
>     110795           -82.0%      19977 ą  4%  perf-stat.ps.page-faults
>  6.482e+11           -50.7%  3.197e+11 ą  4%  perf-stat.total.instructions
>       0.00 ą 37%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
>       0.01 ą 18%   +8373.1%       0.73 ą 49%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
>       0.01 ą 16%   +4600.0%       0.38 ą 24%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit

More time spent in madvise and munmap, but I'm not sure whether this is
caused by tearing down the address space when the test exits. If so, it
should not count toward the regression.

>       0.01 ą204%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       0.01 ą  8%   +3678.9%       0.36 ą 79%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
>       0.01 ą 14%     -38.5%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>       0.01 ą  5%   +2946.2%       0.26 ą 43%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
>       0.00 ą 14%    +125.0%       0.01 ą 12%  perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       0.02 ą170%     -83.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00 ą 69%   +6578.6%       0.31 ą  4%  perf-sched.sch_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
>       0.00          +100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
>       0.02 ą 86%   +4234.4%       0.65 ą  4%  perf-sched.sch_delay.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
>       0.01 ą  6%   +6054.3%       0.47        perf-sched.sch_delay.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
>       0.00 ą 14%    +195.2%       0.01 ą 89%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       0.00 ą102%    +340.0%       0.01 ą 85%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>       0.00          +100.0%       0.00        perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>       0.00 ą 11%     +66.7%       0.01 ą 21%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
>       0.01 ą 89%   +1096.1%       0.15 ą 30%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
>       0.00          +141.7%       0.01 ą 61%  perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>       0.00 ą223%   +9975.0%       0.07 ą203%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
>       0.00 ą 10%    +789.3%       0.04 ą 69%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       0.00 ą 31%   +6691.3%       0.26 ą  5%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
>       0.00 ą 28%  +14612.5%       0.59 ą  4%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm
>       0.00 ą 24%   +4904.2%       0.20 ą  4%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
>       0.00 ą 28%    +450.0%       0.01 ą 74%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
>       0.00 ą 17%    +984.6%       0.02 ą 79%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>       0.00 ą 20%    +231.8%       0.01 ą 89%  perf-sched.sch_delay.avg.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.submit_bio_wait
>       0.00          +350.0%       0.01 ą 16%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>       0.02 ą 16%    +320.2%       0.07 ą  2%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       0.02 ą  2%    +282.1%       0.09 ą  5%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       0.00 ą 14%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
>       0.05 ą 35%   +3784.5%       1.92 ą 16%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
>       0.29 ą128%    +563.3%       1.92 ą  7%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
>       0.14 ą217%     -99.7%       0.00 ą223%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       0.03 ą 49%     -74.0%       0.01 ą 51%  perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>       0.01 ą 54%     -57.4%       0.00 ą 75%  perf-sched.sch_delay.max.ms.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link
>       0.12 ą 21%    +873.0%       1.19 ą 60%  perf-sched.sch_delay.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
>       2.27 ą220%     -99.7%       0.01 ą 19%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
>       0.02 ą 36%     -54.4%       0.01 ą 55%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
>       0.04 ą 36%     -77.1%       0.01 ą 31%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>       0.12 ą 32%   +1235.8%       1.58 ą 31%  perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
>       2.25 ą218%     -99.3%       0.02 ą 52%  perf-sched.sch_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.01 ą 85%  +19836.4%       2.56 ą  7%  perf-sched.sch_delay.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
>       0.03 ą 70%     -93.6%       0.00 ą223%  perf-sched.sch_delay.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
>       0.10 ą 16%   +2984.2%       3.21 ą  6%  perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
>       0.01 ą 20%    +883.9%       0.05 ą177%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       0.01 ą 15%    +694.7%       0.08 ą123%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
>       0.00 ą223%   +6966.7%       0.07 ą199%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
>       0.01 ą 38%   +8384.6%       0.55 ą 72%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>       0.01 ą 13%  +12995.7%       1.51 ą103%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>     117.80 ą 56%     -96.4%       4.26 ą 36%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       0.01 ą 68%    +331.9%       0.03        perf-sched.total_sch_delay.average.ms
>       4.14          +242.6%      14.20 ą  4%  perf-sched.total_wait_and_delay.average.ms
>     700841           -69.6%     212977 ą  3%  perf-sched.total_wait_and_delay.count.ms
>       4.14          +242.4%      14.16 ą  4%  perf-sched.total_wait_time.average.ms
>      11.68 ą  8%    +213.3%      36.59 ą 28%  perf-sched.wait_and_delay.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
>      10.00 ą  2%    +226.1%      32.62 ą 20%  perf-sched.wait_and_delay.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
>      10.55 ą  3%    +259.8%      37.96 ą  7%  perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
>       9.80 ą 12%    +196.5%      29.07 ą 32%  perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
>       9.80 ą  4%    +234.9%      32.83 ą 14%  perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
>      10.32 ą  2%    +223.8%      33.42 ą  6%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
>       8.15 ą 14%    +271.3%      30.25 ą 35%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
>       9.60 ą  4%    +240.8%      32.73 ą 16%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
>      10.37 ą  4%    +232.0%      34.41 ą 10%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>       7.32 ą 46%    +269.7%      27.07 ą 49%  perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>       9.88          +236.2%      33.23 ą  4%  perf-sched.wait_and_delay.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
>       4.44 ą  4%    +379.0%      21.27 ą 18%  perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      10.05 ą  2%    +235.6%      33.73 ą 11%  perf-sched.wait_and_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.03          +462.6%       0.15 ą  6%  perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.78 ą  4%    +482.1%      39.46 ą  3%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>       3.17          +683.3%      24.85 ą  8%  perf-sched.wait_and_delay.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
>      36.64 ą 13%    +244.7%     126.32 ą  6%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
>       9.81          +302.4%      39.47 ą  4%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
>       1.05           +48.2%       1.56        perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
>       0.93           +14.2%       1.06 ą  2%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
>       9.93          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
>      12.02 ą  3%    +139.8%      28.83 ą  6%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       6.09 ą  2%    +403.0%      30.64 ą  5%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      23.17 ą 19%     -83.5%       3.83 ą143%  perf-sched.wait_and_delay.count.__cond_resched.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio
>      79.83 ą  9%     -55.1%      35.83 ą 16%  perf-sched.wait_and_delay.count.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
>      14.83 ą 14%     -59.6%       6.00 ą 56%  perf-sched.wait_and_delay.count.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>       8.50 ą 17%     -80.4%       1.67 ą 89%  perf-sched.wait_and_delay.count.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link
>     114.00 ą 14%     -62.4%      42.83 ą 11%  perf-sched.wait_and_delay.count.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
>      94.67 ą  7%     -48.1%      49.17 ą 13%  perf-sched.wait_and_delay.count.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
>      59.83 ą 13%     -76.0%      14.33 ą 48%  perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>     103.00 ą 12%     -48.1%      53.50 ą 20%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
>      19.33 ą 16%     -56.0%       8.50 ą 29%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
>      68.17 ą 11%     -39.1%      41.50 ą 19%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>      36.67 ą 22%     -79.1%       7.67 ą 46%  perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>     465.50 ą  9%     -47.4%     244.83 ą 11%  perf-sched.wait_and_delay.count.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
>      14492 ą  3%     -96.3%     533.67 ą 10%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>     128.67 ą  7%     -53.5%      59.83 ą 10%  perf-sched.wait_and_delay.count.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       7.67 ą 34%     -80.4%       1.50 ą107%  perf-sched.wait_and_delay.count.__cond_resched.vunmap_p4d_range.__vunmap_range_noflush.remove_vm_area.vfree
>     147533           -81.0%      28023 ą  5%  perf-sched.wait_and_delay.count.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       4394 ą  4%     -78.5%     942.83 ą  7%  perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>     228791           -79.3%      47383 ą  4%  perf-sched.wait_and_delay.count.futex_wait_queue.__futex_wait.futex_wait.do_futex
>     368.50 ą  2%     -67.1%     121.33 ą  3%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
>     147506           -81.0%      28010 ą  5%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
>       5387 ą  6%     -16.7%       4488 ą  5%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
>       8303 ą  2%     -56.9%       3579 ą  5%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
>      14.67 ą  7%    -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
>     370.50 ą141%    +221.9%       1192 ą  5%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>      24395 ą  2%     -51.2%      11914 ą  6%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      31053 ą  2%     -80.5%       6047 ą  5%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      16.41 ą  2%    +342.7%      72.65 ą 29%  perf-sched.wait_and_delay.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
>      16.49 ą  3%    +463.3%      92.90 ą 27%  perf-sched.wait_and_delay.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
>      17.32 ą  5%    +520.9%     107.52 ą 14%  perf-sched.wait_and_delay.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
>      15.38 ą  6%    +325.2%      65.41 ą 22%  perf-sched.wait_and_delay.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
>      16.73 ą  4%    +456.2%      93.04 ą 11%  perf-sched.wait_and_delay.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
>      17.14 ą  3%    +510.6%     104.68 ą 14%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
>      15.70 ą  4%    +379.4%      75.25 ą 28%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
>      15.70 ą  3%    +422.1%      81.97 ą 19%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
>      16.38          +528.4%     102.91 ą 21%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>      45.20 ą 48%    +166.0%     120.23 ą 27%  perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>      17.25          +495.5%     102.71 ą  2%  perf-sched.wait_and_delay.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
>     402.57 ą 15%     -52.8%     189.90 ą 14%  perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      16.96 ą  4%    +521.3%     105.40 ą 15%  perf-sched.wait_and_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      28.45          +517.3%     175.65 ą 14%  perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      22.49          +628.5%     163.83 ą 16%  perf-sched.wait_and_delay.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
>      26.53 ą 30%    +326.9%     113.25 ą 16%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
>      15.54          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
>       1.67 ą141%    +284.6%       6.44 ą  4%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>       0.07 ą 34%     -93.6%       0.00 ą105%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
>      10.21 ą 15%    +295.8%      40.43 ą 50%  perf-sched.wait_time.avg.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       3.89 ą 40%     -99.8%       0.01 ą113%  perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
>      11.67 ą  8%    +213.5%      36.58 ą 28%  perf-sched.wait_time.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
>       9.98 ą  2%    +226.8%      32.61 ą 20%  perf-sched.wait_time.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
>       1.03           +71.2%       1.77 ą 20%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
>       0.06 ą 79%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup
>       0.05 ą 22%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap
>       0.08 ą 82%     -98.2%       0.00 ą223%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      10.72 ą 10%    +166.9%      28.61 ą 29%  perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>      10.53 ą  3%    +260.5%      37.95 ą  7%  perf-sched.wait_time.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
>       9.80 ą 12%    +196.6%      29.06 ą 32%  perf-sched.wait_time.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
>       9.80 ą  4%    +235.1%      32.82 ą 14%  perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
>       9.50 ą 12%    +281.9%      36.27 ą 70%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>      10.31 ą  2%    +223.9%      33.40 ą  6%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
>       8.04 ą 15%    +276.1%      30.25 ą 35%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
>       9.60 ą  4%    +240.9%      32.72 ą 16%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
>       0.06 ą 66%     -98.3%       0.00 ą223%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma
>      10.36 ą  4%    +232.1%      34.41 ą 10%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>       0.08 ą 50%     -95.7%       0.00 ą100%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify
>       0.01 ą 49%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range
>       0.03 ą 73%     -87.4%       0.00 ą145%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone
>       8.01 ą 25%    +238.0%      27.07 ą 49%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>       9.86          +237.0%      33.23 ą  4%  perf-sched.wait_time.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
>       4.44 ą  4%    +379.2%      21.26 ą 18%  perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      10.03          +236.3%      33.73 ą 11%  perf-sched.wait_time.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.97 ą  8%     -87.8%       0.12 ą221%  perf-sched.wait_time.avg.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
>       0.02 ą 13%   +1846.8%       0.45 ą 11%  perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
>       1.01           +64.7%       1.66        perf-sched.wait_time.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
>       0.75 ą  4%    +852.1%       7.10 ą  5%  perf-sched.wait_time.avg.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       0.03          +462.6%       0.15 ą  6%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.24 ą  4%     +25.3%       0.30 ą  8%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
>       1.98 ą 15%    +595.7%      13.80 ą 90%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
>       2.78 ą 14%    +444.7%      15.12 ą 16%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function
>       6.77 ą  4%    +483.0%      39.44 ą  3%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>       3.17          +684.7%      24.85 ą  8%  perf-sched.wait_time.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
>      36.64 ą 13%    +244.7%     126.32 ą  6%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
>       9.79          +303.0%      39.45 ą  4%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
>       1.05           +23.8%       1.30        perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
>       0.86          +101.2%       1.73 ą  3%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm
>       0.11 ą 21%    +438.9%       0.61 ą 15%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
>       0.32 ą  4%     +28.5%       0.41 ą 13%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>      12.00 ą  3%    +139.6%      28.76 ą  6%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       6.07 ą  2%    +403.5%      30.56 ą  5%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       0.38 ą 41%     -98.8%       0.00 ą105%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
>       0.36 ą 34%     -84.3%       0.06 ą200%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.vma_alloc_folio.do_anonymous_page
>       0.36 ą 51%     -92.9%       0.03 ą114%  perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
>      15.98 ą  5%    +361.7%      73.80 ą 23%  perf-sched.wait_time.max.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.51 ą 14%     -92.8%       0.04 ą196%  perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.__vmalloc_area_node.__vmalloc_node_range
>       8.56 ą 11%     -99.9%       0.01 ą126%  perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
>       0.43 ą 32%     -68.2%       0.14 ą119%  perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_node_trace.__get_vm_area_node.__vmalloc_node_range
>       0.46 ą 20%     -89.3%       0.05 ą184%  perf-sched.wait_time.max.ms.__cond_resched.__vmalloc_area_node.__vmalloc_node_range.alloc_thread_stack_node.dup_task_struct
>      16.40 ą  2%    +342.9%      72.65 ą 29%  perf-sched.wait_time.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
>       0.31 ą 63%     -76.2%       0.07 ą169%  perf-sched.wait_time.max.ms.__cond_resched.cgroup_css_set_fork.cgroup_can_fork.copy_process.kernel_clone
>       0.14 ą 93%    +258.7%       0.49 ą 14%  perf-sched.wait_time.max.ms.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
>      16.49 ą  3%    +463.5%      92.89 ą 27%  perf-sched.wait_time.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
>       1.09          +171.0%       2.96 ą 10%  perf-sched.wait_time.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
>       1.16 ą  7%    +155.1%       2.97 ą  4%  perf-sched.wait_time.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
>       0.19 ą 78%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup
>       0.33 ą 35%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap
>       0.20 ą101%     -99.3%       0.00 ą223%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      17.31 ą  5%    +521.0%     107.51 ą 14%  perf-sched.wait_time.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
>      15.38 ą  6%    +325.3%      65.40 ą 22%  perf-sched.wait_time.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
>      16.72 ą  4%    +456.6%      93.04 ą 11%  perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
>       1.16 ą  2%     +88.7%       2.20 ą 33%  perf-sched.wait_time.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
>      53.96 ą 32%    +444.0%     293.53 ą109%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>      17.13 ą  2%    +511.2%     104.68 ą 14%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
>      15.69 ą  4%    +379.5%      75.25 ą 28%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
>      15.70 ą  3%    +422.2%      81.97 ą 19%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
>       0.27 ą 80%     -99.6%       0.00 ą223%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma
>      16.37          +528.6%     102.90 ą 21%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>       0.44 ą 33%     -99.1%       0.00 ą104%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify
>       0.02 ą 83%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range
>       0.08 ą 83%     -95.4%       0.00 ą147%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone
>       1.16 ą  2%    +134.7%       2.72 ą 19%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
>      49.88 ą 25%    +141.0%     120.23 ą 27%  perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>      17.24          +495.7%     102.70 ą  2%  perf-sched.wait_time.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
>     402.56 ą 15%     -52.8%     189.89 ą 14%  perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      16.96 ą  4%    +521.4%     105.39 ą 15%  perf-sched.wait_time.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.06          +241.7%       3.61 ą  4%  perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
>       1.07           -88.9%       0.12 ą221%  perf-sched.wait_time.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
>       0.28 ą 27%    +499.0%       1.67 ą 18%  perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
>       1.21 ą  2%    +207.2%       3.71 ą  3%  perf-sched.wait_time.max.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
>      13.43 ą 26%     +38.8%      18.64        perf-sched.wait_time.max.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      28.45          +517.3%     175.65 ą 14%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.79 ą 10%     +62.2%       1.28 ą 25%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
>      13.22 ą  2%    +317.2%      55.16 ą 35%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function
>     834.29 ą 28%     -48.5%     429.53 ą 94%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
>      22.48          +628.6%     163.83 ą 16%  perf-sched.wait_time.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
>      22.74 ą 18%    +398.0%     113.25 ą 16%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
>       7.72 ą  7%     +80.6%      13.95 ą  2%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
>       0.74 ą  4%     +77.2%       1.31 ą 32%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>       5.01           +14.1%       5.72 ą  2%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>      44.98           -19.7       25.32 ą  2%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
>      43.21           -19.6       23.65 ą  3%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
>      43.21           -19.6       23.65 ą  3%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>      43.18           -19.5       23.63 ą  3%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>      40.30           -17.5       22.75 ą  3%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>      41.10           -17.4       23.66 ą  2%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
>      39.55           -17.3       22.24 ą  3%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
>      24.76 ą  2%      -8.5       16.23 ą  3%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>       8.68 ą  4%      -6.5        2.22 ą  6%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
>       7.23 ą  4%      -5.8        1.46 ą  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>       7.23 ą  4%      -5.8        1.46 ą  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       7.11 ±  4%      -5.7        1.39 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       7.09 ±  4%      -5.7        1.39 ±  7%  perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.59 ±  3%      -5.1        1.47 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
>       6.59 ±  3%      -5.1        1.47 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
>       6.59 ±  3%      -5.1        1.47 ±  7%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
>       5.76 ±  2%      -5.0        0.80 ±  9%  perf-profile.calltrace.cycles-pp.start_thread
>       7.43 ±  2%      -4.9        2.52 ±  7%  perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>       5.51 ±  3%      -4.8        0.70 ±  7%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.start_thread
>       5.50 ±  3%      -4.8        0.70 ±  7%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
>       5.48 ±  3%      -4.8        0.69 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
>       5.42 ±  3%      -4.7        0.69 ±  7%  perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
>       5.90 ±  5%      -3.9        2.01 ±  4%  perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
>       4.18 ±  5%      -3.8        0.37 ± 71%  perf-profile.calltrace.cycles-pp.exit_notify.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       5.76 ±  5%      -3.8        1.98 ±  4%  perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
>       5.04 ±  7%      -3.7        1.32 ±  9%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__clone
>       5.03 ±  7%      -3.7        1.32 ±  9%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
>       5.02 ±  7%      -3.7        1.32 ±  9%  perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
>       5.02 ±  7%      -3.7        1.32 ±  9%  perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
>       5.62 ±  5%      -3.7        1.96 ±  3%  perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
>       4.03 ±  4%      -3.1        0.92 ±  7%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       6.03 ±  5%      -3.1        2.94 ±  3%  perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
>       3.43 ±  5%      -2.8        0.67 ± 13%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       3.43 ±  5%      -2.8        0.67 ± 13%  perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
>       3.41 ±  5%      -2.7        0.66 ± 13%  perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread
>       3.40 ±  5%      -2.7        0.66 ± 13%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn
>       3.67 ±  7%      -2.7        0.94 ± 10%  perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       2.92 ±  7%      -2.4        0.50 ± 46%  perf-profile.calltrace.cycles-pp.stress_pthread
>       2.54 ±  6%      -2.2        0.38 ± 70%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       2.46 ±  6%      -1.8        0.63 ± 10%  perf-profile.calltrace.cycles-pp.dup_task_struct.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
>       3.00 ±  6%      -1.6        1.43 ±  7%  perf-profile.calltrace.cycles-pp.__munmap
>       2.96 ±  6%      -1.5        1.42 ±  7%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
>       2.96 ±  6%      -1.5        1.42 ±  7%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>       2.95 ±  6%      -1.5        1.41 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>       2.95 ±  6%      -1.5        1.41 ±  7%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>       2.02 ±  4%      -1.5        0.52 ± 46%  perf-profile.calltrace.cycles-pp.__lll_lock_wait
>       1.78 ±  3%      -1.5        0.30 ±100%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__lll_lock_wait
>       1.77 ±  3%      -1.5        0.30 ±100%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__lll_lock_wait
>       1.54 ±  6%      -1.3        0.26 ±100%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>       2.54 ±  6%      -1.2        1.38 ±  6%  perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       2.51 ±  6%      -1.1        1.37 ±  7%  perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
>       1.13            -0.7        0.40 ± 70%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.15 ±  5%      -0.7        0.46 ± 45%  perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
>       1.58 ±  5%      -0.6        0.94 ±  7%  perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
>       0.99 ±  5%      -0.5        0.51 ± 45%  perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
>       1.01 ±  5%      -0.5        0.54 ± 45%  perf-profile.calltrace.cycles-pp.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
>       0.82 ±  4%      -0.2        0.59 ±  5%  perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
>       0.00            +0.5        0.54 ±  5%  perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
>       0.00            +0.6        0.60 ±  5%  perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
>       0.00            +0.6        0.61 ±  6%  perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
>       0.00            +0.6        0.62 ±  6%  perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
>       0.53 ±  5%      +0.6        1.17 ± 13%  perf-profile.calltrace.cycles-pp.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
>       1.94 ±  2%      +0.7        2.64 ±  9%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
>       0.00            +0.7        0.73 ±  5%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range
>       0.00            +0.8        0.75 ± 20%  perf-profile.calltrace.cycles-pp.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
>       2.02 ±  2%      +0.8        2.85 ±  9%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>       0.74 ±  5%      +0.8        1.57 ± 11%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
>       0.00            +0.9        0.90 ±  4%  perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
>       0.00            +0.9        0.92 ± 13%  perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues
>       0.86 ±  4%      +1.0        1.82 ± 10%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
>       0.86 ±  4%      +1.0        1.83 ± 10%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
>       0.00            +1.0        0.98 ±  7%  perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked
>       0.09 ±223%      +1.0        1.07 ± 11%  perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt
>       0.00            +1.0        0.99 ±  6%  perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd
>       0.00            +1.0        1.00 ±  7%  perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range
>       0.09 ±223%      +1.0        1.10 ± 12%  perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
>       0.00            +1.0        1.01 ±  6%  perf-profile.calltrace.cycles-pp.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range
>       0.00            +1.1        1.10 ±  5%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath
>       0.00            +1.1        1.12 ±  5%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock
>       0.00            +1.2        1.23 ±  4%  perf-profile.calltrace.cycles-pp.page_add_anon_rmap.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range
>       0.00            +1.3        1.32 ±  4%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd
>       0.00            +1.4        1.38 ±  5%  perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range
>       0.00            +2.4        2.44 ± 10%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range
>       0.00            +3.1        3.10 ±  5%  perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single
>       0.00            +3.5        3.52 ±  5%  perf-profile.calltrace.cycles-pp.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single
>       0.88 ±  4%      +3.8        4.69 ±  4%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
>       6.30 ±  6%     +13.5       19.85 ±  7%  perf-profile.calltrace.cycles-pp.__clone
>       0.00           +16.7       16.69 ±  7%  perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
>       1.19 ± 29%     +17.1       18.32 ±  7%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>       0.00           +17.6       17.56 ±  7%  perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
>       0.63 ±  7%     +17.7       18.35 ±  7%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.__clone
>       0.59 ±  5%     +17.8       18.34 ±  7%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.__clone
>       0.59 ±  5%     +17.8       18.34 ±  7%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone
>       0.00           +17.9       17.90 ±  7%  perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>       0.36 ± 71%     +18.0       18.33 ±  7%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone
>       0.00           +32.0       32.03 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range
>       0.00           +32.6       32.62 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single
>       0.00           +36.2       36.19 ±  2%  perf-profile.calltrace.cycles-pp.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
>       7.97 ±  4%     +36.6       44.52 ±  2%  perf-profile.calltrace.cycles-pp.__madvise
>       7.91 ±  4%     +36.6       44.46 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
>       7.90 ±  4%     +36.6       44.46 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>       7.87 ±  4%     +36.6       44.44 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>       7.86 ±  4%     +36.6       44.44 ±  2%  perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>       7.32 ±  4%     +36.8       44.07 ±  2%  perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       7.25 ±  4%     +36.8       44.06 ±  2%  perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
>       1.04 ±  4%     +40.0       41.08 ±  2%  perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
>       1.00 ±  3%     +40.1       41.06 ±  2%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
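
The shape of this profile looks like a THP fault-then-split storm to me: the page faults under __clone now take the huge-PMD path (clear_huge_page goes from ~0 to ~17% of cycles), and the madvise() zap path then spends ~32% of cycles in native_queued_spin_lock_slowpath splitting those PMDs (__split_huge_pmd). So some anonymous mapping in the pthread setup/teardown path is now PMD-aligned, gets faulted in as huge pages, and is then zapped in sub-PMD chunks, forcing a split each time. A minimal sketch of that pattern (a hypothetical reproducer just to illustrate the call chains above, not the benchmark's actual code; the sizes are made up):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define LEN (4UL << 20)	/* anon mapping large enough to be THP-aligned */

int main(void)
{
	char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* first touch faults in 2MB pages:
	 * __do_huge_pmd_anonymous_page -> clear_huge_page */
	memset(p, 0x5a, LEN);
	/* zapping a sub-PMD range forces the PMD to be split first:
	 * zap_pmd_range -> __split_huge_pmd under _raw_spin_lock */
	if (madvise(p, 4096, MADV_DONTNEED)) {
		perror("madvise");
		return 1;
	}
	munmap(p, LEN);
	return 0;
}

If glibc or stress-ng does the equivalent of that madvise() on regions that are now THP-backed, that would explain both the clear_huge_page cost on the fault side and the split/lock contention on the madvise side.
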
>      44.98           -19.7       25.32 ±  2%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
>      44.98           -19.7       25.32 ±  2%  perf-profile.children.cycles-pp.cpu_startup_entry
>      44.96           -19.6       25.31 ±  2%  perf-profile.children.cycles-pp.do_idle
>      43.21           -19.6       23.65 ±  3%  perf-profile.children.cycles-pp.start_secondary
>      41.98           -17.6       24.40 ±  2%  perf-profile.children.cycles-pp.cpuidle_idle_call
>      41.21           -17.3       23.86 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter
>      41.20           -17.3       23.86 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter_state
>      12.69 ±  3%     -10.6        2.12 ±  6%  perf-profile.children.cycles-pp.do_exit
>      12.60 ±  3%     -10.5        2.08 ±  7%  perf-profile.children.cycles-pp.__x64_sys_exit
>      24.76 ±  2%      -8.5       16.31 ±  2%  perf-profile.children.cycles-pp.intel_idle
>      12.34 ±  2%      -8.4        3.90 ±  5%  perf-profile.children.cycles-pp.intel_idle_irq
>       6.96 ±  4%      -5.4        1.58 ±  7%  perf-profile.children.cycles-pp.ret_from_fork_asm
>       6.69 ±  4%      -5.2        1.51 ±  7%  perf-profile.children.cycles-pp.ret_from_fork
>       6.59 ±  3%      -5.1        1.47 ±  7%  perf-profile.children.cycles-pp.kthread
>       5.78 ±  2%      -5.0        0.80 ±  8%  perf-profile.children.cycles-pp.start_thread
>       4.68 ±  4%      -4.5        0.22 ± 10%  perf-profile.children.cycles-pp._raw_spin_lock_irq
>       5.03 ±  7%      -3.7        1.32 ±  9%  perf-profile.children.cycles-pp.__do_sys_clone
>       5.02 ±  7%      -3.7        1.32 ±  9%  perf-profile.children.cycles-pp.kernel_clone
>       4.20 ±  5%      -3.7        0.53 ±  9%  perf-profile.children.cycles-pp.exit_notify
>       4.67 ±  5%      -3.6        1.10 ±  9%  perf-profile.children.cycles-pp.rcu_core
>       4.60 ±  4%      -3.5        1.06 ± 10%  perf-profile.children.cycles-pp.rcu_do_batch
>       4.89 ±  5%      -3.4        1.44 ± 11%  perf-profile.children.cycles-pp.__do_softirq
>       5.64 ±  3%      -3.2        2.39 ±  6%  perf-profile.children.cycles-pp.__schedule
>       6.27 ±  5%      -3.2        3.03 ±  4%  perf-profile.children.cycles-pp.flush_tlb_mm_range
>       4.03 ±  4%      -3.1        0.92 ±  7%  perf-profile.children.cycles-pp.smpboot_thread_fn
>       6.68 ±  4%      -3.1        3.61 ±  3%  perf-profile.children.cycles-pp.tlb_finish_mmu
>       6.04 ±  5%      -3.1        2.99 ±  4%  perf-profile.children.cycles-pp.on_each_cpu_cond_mask
>       6.04 ±  5%      -3.0        2.99 ±  4%  perf-profile.children.cycles-pp.smp_call_function_many_cond
>       3.77 ±  2%      -3.0        0.73 ± 16%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>       7.78            -3.0        4.77 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>       3.43 ±  5%      -2.8        0.67 ± 13%  perf-profile.children.cycles-pp.run_ksoftirqd
>       3.67 ±  7%      -2.7        0.94 ± 10%  perf-profile.children.cycles-pp.copy_process
>       2.80 ±  6%      -2.5        0.34 ± 15%  perf-profile.children.cycles-pp.queued_write_lock_slowpath
>       3.41 ±  2%      -2.5        0.96 ± 16%  perf-profile.children.cycles-pp.do_futex
>       3.06 ±  5%      -2.4        0.68 ± 16%  perf-profile.children.cycles-pp.free_unref_page_commit
>       3.02 ±  5%      -2.4        0.67 ± 16%  perf-profile.children.cycles-pp.free_pcppages_bulk
>       2.92 ±  7%      -2.3        0.58 ± 14%  perf-profile.children.cycles-pp.stress_pthread
>       3.22 ±  3%      -2.3        0.90 ± 18%  perf-profile.children.cycles-pp.__x64_sys_futex
>       2.52 ±  5%      -2.2        0.35 ±  7%  perf-profile.children.cycles-pp.release_task
>       2.54 ±  6%      -2.0        0.53 ± 10%  perf-profile.children.cycles-pp.worker_thread
>       3.12 ±  5%      -1.9        1.17 ± 11%  perf-profile.children.cycles-pp.free_unref_page
>       2.31 ±  6%      -1.9        0.45 ± 11%  perf-profile.children.cycles-pp.process_one_work
>       2.47 ±  6%      -1.8        0.63 ± 10%  perf-profile.children.cycles-pp.dup_task_struct
>       2.19 ±  5%      -1.8        0.41 ± 12%  perf-profile.children.cycles-pp.delayed_vfree_work
>       2.14 ±  5%      -1.7        0.40 ± 11%  perf-profile.children.cycles-pp.vfree
>       3.19 ±  2%      -1.6        1.58 ±  8%  perf-profile.children.cycles-pp.schedule
>       2.06 ±  3%      -1.6        0.46 ±  7%  perf-profile.children.cycles-pp.__sigtimedwait
>       3.02 ±  6%      -1.6        1.44 ±  7%  perf-profile.children.cycles-pp.__munmap
>       1.94 ±  4%      -1.6        0.39 ± 14%  perf-profile.children.cycles-pp.__unfreeze_partials
>       2.95 ±  6%      -1.5        1.41 ±  7%  perf-profile.children.cycles-pp.__x64_sys_munmap
>       2.95 ±  6%      -1.5        1.41 ±  7%  perf-profile.children.cycles-pp.__vm_munmap
>       2.14 ±  3%      -1.5        0.60 ± 21%  perf-profile.children.cycles-pp.futex_wait
>       2.08 ±  4%      -1.5        0.60 ± 19%  perf-profile.children.cycles-pp.__lll_lock_wait
>       2.04 ±  3%      -1.5        0.56 ± 20%  perf-profile.children.cycles-pp.__futex_wait
>       1.77 ±  5%      -1.5        0.32 ± 10%  perf-profile.children.cycles-pp.remove_vm_area
>       1.86 ±  5%      -1.4        0.46 ± 10%  perf-profile.children.cycles-pp.open64
>       1.74 ±  4%      -1.4        0.37 ±  7%  perf-profile.children.cycles-pp.__x64_sys_rt_sigtimedwait
>       1.71 ±  4%      -1.4        0.36 ±  8%  perf-profile.children.cycles-pp.do_sigtimedwait
>       1.79 ±  5%      -1.3        0.46 ±  9%  perf-profile.children.cycles-pp.__x64_sys_openat
>       1.78 ±  5%      -1.3        0.46 ±  8%  perf-profile.children.cycles-pp.do_sys_openat2
>       1.61 ±  4%      -1.3        0.32 ± 12%  perf-profile.children.cycles-pp.poll_idle
>       1.65 ±  9%      -1.3        0.37 ± 14%  perf-profile.children.cycles-pp.pthread_create@@GLIBC_2.2.5
>       1.56 ±  8%      -1.2        0.35 ±  7%  perf-profile.children.cycles-pp.alloc_thread_stack_node
>       2.32 ±  3%      -1.2        1.13 ±  8%  perf-profile.children.cycles-pp.pick_next_task_fair
>       2.59 ±  6%      -1.2        1.40 ±  7%  perf-profile.children.cycles-pp.do_vmi_munmap
>       1.55 ±  4%      -1.2        0.40 ± 19%  perf-profile.children.cycles-pp.futex_wait_queue
>       1.37 ±  5%      -1.1        0.22 ± 12%  perf-profile.children.cycles-pp.find_unlink_vmap_area
>       2.52 ±  6%      -1.1        1.38 ±  6%  perf-profile.children.cycles-pp.do_vmi_align_munmap
>       1.53 ±  5%      -1.1        0.39 ±  8%  perf-profile.children.cycles-pp.do_filp_open
>       1.52 ±  5%      -1.1        0.39 ±  7%  perf-profile.children.cycles-pp.path_openat
>       1.25 ±  3%      -1.1        0.14 ± 12%  perf-profile.children.cycles-pp.sigpending
>       1.58 ±  5%      -1.1        0.50 ±  6%  perf-profile.children.cycles-pp.schedule_idle
>       1.29 ±  5%      -1.1        0.21 ± 21%  perf-profile.children.cycles-pp.__mprotect
>       1.40 ±  8%      -1.1        0.32 ±  4%  perf-profile.children.cycles-pp.__vmalloc_node_range
>       2.06 ±  3%      -1.0        1.02 ±  9%  perf-profile.children.cycles-pp.newidle_balance
>       1.04 ±  3%      -1.0        0.08 ± 23%  perf-profile.children.cycles-pp.__x64_sys_rt_sigpending
>       1.14 ±  6%      -1.0        0.18 ± 18%  perf-profile.children.cycles-pp.__x64_sys_mprotect
>       1.13 ±  6%      -1.0        0.18 ± 17%  perf-profile.children.cycles-pp.do_mprotect_pkey
>       1.30 ±  7%      -0.9        0.36 ± 10%  perf-profile.children.cycles-pp.wake_up_new_task
>       1.14 ±  9%      -0.9        0.22 ± 16%  perf-profile.children.cycles-pp.do_anonymous_page
>       0.95 ±  3%      -0.9        0.04 ± 71%  perf-profile.children.cycles-pp.do_sigpending
>       1.24 ±  3%      -0.9        0.34 ±  9%  perf-profile.children.cycles-pp.futex_wake
>       1.02 ±  6%      -0.9        0.14 ± 15%  perf-profile.children.cycles-pp.mprotect_fixup
>       1.91 ±  2%      -0.9        1.06 ±  9%  perf-profile.children.cycles-pp.load_balance
>       1.38 ±  5%      -0.8        0.53 ±  6%  perf-profile.children.cycles-pp.select_task_rq_fair
>       1.14 ±  4%      -0.8        0.31 ± 12%  perf-profile.children.cycles-pp.__pthread_mutex_unlock_usercnt
>       2.68 ±  3%      -0.8        1.91 ±  6%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
>       1.00 ±  4%      -0.7        0.26 ± 10%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
>       1.44 ±  3%      -0.7        0.73 ± 10%  perf-profile.children.cycles-pp.find_busiest_group
>       0.81 ±  6%      -0.7        0.10 ± 18%  perf-profile.children.cycles-pp.vma_modify
>       1.29 ±  3%      -0.7        0.60 ±  8%  perf-profile.children.cycles-pp.exit_mm
>       1.40 ±  3%      -0.7        0.71 ± 10%  perf-profile.children.cycles-pp.update_sd_lb_stats
>       0.78 ±  7%      -0.7        0.10 ± 19%  perf-profile.children.cycles-pp.__split_vma
>       0.90 ±  8%      -0.7        0.22 ± 10%  perf-profile.children.cycles-pp.__vmalloc_area_node
>       0.75 ±  4%      -0.7        0.10 ±  5%  perf-profile.children.cycles-pp.__exit_signal
>       1.49 ±  2%      -0.7        0.84 ±  7%  perf-profile.children.cycles-pp.try_to_wake_up
>       0.89 ±  7%      -0.6        0.24 ± 10%  perf-profile.children.cycles-pp.find_idlest_cpu
>       1.59 ±  5%      -0.6        0.95 ±  7%  perf-profile.children.cycles-pp.unmap_region
>       0.86 ±  3%      -0.6        0.22 ± 26%  perf-profile.children.cycles-pp.pthread_cond_timedwait@@GLIBC_2.3.2
>       1.59 ±  3%      -0.6        0.95 ±  9%  perf-profile.children.cycles-pp.irq_exit_rcu
>       1.24 ±  3%      -0.6        0.61 ± 10%  perf-profile.children.cycles-pp.update_sg_lb_stats
>       0.94 ±  5%      -0.6        0.32 ± 11%  perf-profile.children.cycles-pp.do_task_dead
>       0.87 ±  3%      -0.6        0.25 ± 19%  perf-profile.children.cycles-pp.perf_iterate_sb
>       0.82 ±  4%      -0.6        0.22 ± 10%  perf-profile.children.cycles-pp.sched_ttwu_pending
>       1.14 ±  3%      -0.6        0.54 ± 10%  perf-profile.children.cycles-pp.activate_task
>       0.84            -0.6        0.25 ± 10%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>       0.81 ±  6%      -0.6        0.22 ± 11%  perf-profile.children.cycles-pp.find_idlest_group
>       0.75 ±  5%      -0.6        0.18 ± 14%  perf-profile.children.cycles-pp.step_into
>       0.74 ±  8%      -0.6        0.18 ± 14%  perf-profile.children.cycles-pp.__alloc_pages_bulk
>       0.74 ±  6%      -0.5        0.19 ± 11%  perf-profile.children.cycles-pp.update_sg_wakeup_stats
>       0.72 ±  5%      -0.5        0.18 ± 15%  perf-profile.children.cycles-pp.pick_link
>       1.06 ±  2%      -0.5        0.52 ±  9%  perf-profile.children.cycles-pp.enqueue_task_fair
>       0.77 ±  6%      -0.5        0.23 ± 12%  perf-profile.children.cycles-pp.unmap_vmas
>       0.76 ±  2%      -0.5        0.22 ±  8%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>       0.94 ±  2%      -0.5        0.42 ± 10%  perf-profile.children.cycles-pp.dequeue_task_fair
>       0.65 ±  5%      -0.5        0.15 ± 18%  perf-profile.children.cycles-pp.open_last_lookups
>       1.37 ±  3%      -0.5        0.87 ±  4%  perf-profile.children.cycles-pp.llist_add_batch
>       0.70 ±  4%      -0.5        0.22 ± 19%  perf-profile.children.cycles-pp.memcpy_orig
>       0.91 ±  4%      -0.5        0.44 ±  7%  perf-profile.children.cycles-pp.update_load_avg
>       0.67            -0.5        0.20 ±  8%  perf-profile.children.cycles-pp.switch_fpu_return
>       0.88 ±  3%      -0.5        0.42 ±  8%  perf-profile.children.cycles-pp.enqueue_entity
>       0.91 ±  4%      -0.5        0.45 ± 12%  perf-profile.children.cycles-pp.ttwu_do_activate
>       0.77 ±  4%      -0.5        0.32 ± 10%  perf-profile.children.cycles-pp.schedule_hrtimeout_range_clock
>       0.63 ±  5%      -0.4        0.20 ± 21%  perf-profile.children.cycles-pp.arch_dup_task_struct
>       0.74 ±  3%      -0.4        0.32 ± 15%  perf-profile.children.cycles-pp.dequeue_entity
>       0.62 ±  5%      -0.4        0.21 ±  5%  perf-profile.children.cycles-pp.finish_task_switch
>       0.56            -0.4        0.16 ±  7%  perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
>       0.53 ±  4%      -0.4        0.13 ±  9%  perf-profile.children.cycles-pp.syscall
>       0.50 ±  9%      -0.4        0.11 ± 18%  perf-profile.children.cycles-pp.__get_vm_area_node
>       0.51 ±  3%      -0.4        0.12 ± 12%  perf-profile.children.cycles-pp.__slab_free
>       0.52 ±  2%      -0.4        0.14 ± 10%  perf-profile.children.cycles-pp.kmem_cache_free
>       0.75 ±  3%      -0.4        0.37 ±  9%  perf-profile.children.cycles-pp.exit_mm_release
>       0.50 ±  6%      -0.4        0.12 ± 21%  perf-profile.children.cycles-pp.do_send_specific
>       0.74 ±  3%      -0.4        0.37 ±  8%  perf-profile.children.cycles-pp.futex_exit_release
>       0.45 ± 10%      -0.4        0.09 ± 17%  perf-profile.children.cycles-pp.alloc_vmap_area
>       0.47 ±  3%      -0.4        0.11 ± 20%  perf-profile.children.cycles-pp.tgkill
>       0.68 ± 11%      -0.4        0.32 ± 12%  perf-profile.children.cycles-pp.__mmap
>       0.48 ±  3%      -0.4        0.13 ±  6%  perf-profile.children.cycles-pp.entry_SYSCALL_64
>       0.76 ±  5%      -0.3        0.41 ± 10%  perf-profile.children.cycles-pp.wake_up_q
>       0.42 ±  7%      -0.3        0.08 ± 22%  perf-profile.children.cycles-pp.__close
>       0.49 ±  7%      -0.3        0.14 ± 25%  perf-profile.children.cycles-pp.kmem_cache_alloc
>       0.49 ±  9%      -0.3        0.15 ± 14%  perf-profile.children.cycles-pp.mas_store_gfp
>       0.46 ±  4%      -0.3        0.12 ± 23%  perf-profile.children.cycles-pp.perf_event_task_output
>       0.44 ± 10%      -0.3        0.10 ± 28%  perf-profile.children.cycles-pp.pthread_sigqueue
>       0.46 ±  4%      -0.3        0.12 ± 15%  perf-profile.children.cycles-pp.link_path_walk
>       0.42 ±  8%      -0.3        0.10 ± 20%  perf-profile.children.cycles-pp.proc_ns_get_link
>       0.63 ± 10%      -0.3        0.32 ± 12%  perf-profile.children.cycles-pp.vm_mmap_pgoff
>       0.45 ±  4%      -0.3        0.14 ± 13%  perf-profile.children.cycles-pp.sched_move_task
>       0.36 ±  8%      -0.3        0.06 ± 49%  perf-profile.children.cycles-pp.__x64_sys_close
>       0.46 ±  8%      -0.3        0.17 ± 14%  perf-profile.children.cycles-pp.prctl
>       0.65 ±  3%      -0.3        0.35 ±  7%  perf-profile.children.cycles-pp.futex_cleanup
>       0.42 ±  7%      -0.3        0.12 ± 15%  perf-profile.children.cycles-pp.mas_store_prealloc
>       0.49 ±  5%      -0.3        0.20 ± 13%  perf-profile.children.cycles-pp.__rmqueue_pcplist
>       0.37 ±  7%      -0.3        0.08 ± 16%  perf-profile.children.cycles-pp.do_tkill
>       0.36 ± 10%      -0.3        0.08 ± 20%  perf-profile.children.cycles-pp.ns_get_path
>       0.37 ±  4%      -0.3        0.09 ± 18%  perf-profile.children.cycles-pp.setns
>       0.67 ±  3%      -0.3        0.41 ±  8%  perf-profile.children.cycles-pp.hrtimer_wakeup
>       0.35 ±  5%      -0.3        0.10 ± 16%  perf-profile.children.cycles-pp.__task_pid_nr_ns
>       0.41 ±  5%      -0.3        0.16 ± 12%  perf-profile.children.cycles-pp.mas_wr_bnode
>       0.35 ±  4%      -0.3        0.10 ± 20%  perf-profile.children.cycles-pp.rcu_cblist_dequeue
>       0.37 ±  5%      -0.2        0.12 ± 17%  perf-profile.children.cycles-pp.exit_task_stack_account
>       0.56 ±  4%      -0.2        0.31 ± 12%  perf-profile.children.cycles-pp.select_task_rq
>       0.29 ±  6%      -0.2        0.05 ± 46%  perf-profile.children.cycles-pp.mas_wr_store_entry
>       0.34 ±  4%      -0.2        0.10 ± 27%  perf-profile.children.cycles-pp.perf_event_task
>       0.39 ±  9%      -0.2        0.15 ± 12%  perf-profile.children.cycles-pp.__switch_to_asm
>       0.35 ±  5%      -0.2        0.11 ± 11%  perf-profile.children.cycles-pp.account_kernel_stack
>       0.30 ±  7%      -0.2        0.06 ± 48%  perf-profile.children.cycles-pp.__ns_get_path
>       0.31 ±  9%      -0.2        0.07 ± 17%  perf-profile.children.cycles-pp.free_vmap_area_noflush
>       0.31 ±  5%      -0.2        0.08 ± 19%  perf-profile.children.cycles-pp.__do_sys_setns
>       0.33 ±  7%      -0.2        0.10 ±  7%  perf-profile.children.cycles-pp.__free_one_page
>       0.31 ± 11%      -0.2        0.08 ± 13%  perf-profile.children.cycles-pp.__pte_alloc
>       0.36 ±  6%      -0.2        0.13 ± 12%  perf-profile.children.cycles-pp.switch_mm_irqs_off
>       0.27 ± 12%      -0.2        0.05 ± 71%  perf-profile.children.cycles-pp.__fput
>       0.53 ±  9%      -0.2        0.31 ± 12%  perf-profile.children.cycles-pp.do_mmap
>       0.27 ± 12%      -0.2        0.05 ± 77%  perf-profile.children.cycles-pp.__x64_sys_rt_tgsigqueueinfo
>       0.28 ±  5%      -0.2        0.06 ± 50%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.34 ± 10%      -0.2        0.12 ± 29%  perf-profile.children.cycles-pp.futex_wait_setup
>       0.27 ±  6%      -0.2        0.06 ± 45%  perf-profile.children.cycles-pp.__x64_sys_tgkill
>       0.31 ±  7%      -0.2        0.11 ± 18%  perf-profile.children.cycles-pp.__switch_to
>       0.26 ±  8%      -0.2        0.06 ± 21%  perf-profile.children.cycles-pp.__call_rcu_common
>       0.33 ±  9%      -0.2        0.13 ± 18%  perf-profile.children.cycles-pp.__do_sys_prctl
>       0.28 ±  5%      -0.2        0.08 ± 17%  perf-profile.children.cycles-pp.mm_release
>       0.52 ±  2%      -0.2        0.32 ±  9%  perf-profile.children.cycles-pp.__get_user_8
>       0.24 ± 10%      -0.2        0.04 ± 72%  perf-profile.children.cycles-pp.dput
>       0.25 ± 14%      -0.2        0.05 ± 46%  perf-profile.children.cycles-pp.perf_event_mmap
>       0.24 ±  7%      -0.2        0.06 ± 50%  perf-profile.children.cycles-pp.mas_walk
>       0.28 ±  6%      -0.2        0.10 ± 24%  perf-profile.children.cycles-pp.rmqueue_bulk
>       0.23 ± 15%      -0.2        0.05 ± 46%  perf-profile.children.cycles-pp.perf_event_mmap_event
>       0.25 ± 15%      -0.2        0.08 ± 45%  perf-profile.children.cycles-pp.___slab_alloc
>       0.20 ± 14%      -0.2        0.03 ±100%  perf-profile.children.cycles-pp.lookup_fast
>       0.20 ± 10%      -0.2        0.04 ± 75%  perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
>       0.28 ±  7%      -0.2        0.12 ± 24%  perf-profile.children.cycles-pp.prepare_task_switch
>       0.22 ± 11%      -0.2        0.05 ±  8%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
>       0.63 ±  5%      -0.2        0.47 ± 12%  perf-profile.children.cycles-pp.llist_reverse_order
>       0.25 ± 11%      -0.2        0.09 ± 34%  perf-profile.children.cycles-pp.futex_q_lock
>       0.21 ±  6%      -0.2        0.06 ± 47%  perf-profile.children.cycles-pp.kmem_cache_alloc_node
>       0.18 ± 11%      -0.2        0.03 ±100%  perf-profile.children.cycles-pp.alloc_empty_file
>       0.19 ±  5%      -0.2        0.04 ± 71%  perf-profile.children.cycles-pp.__put_task_struct
>       0.19 ± 15%      -0.2        0.03 ± 70%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
>       0.24 ±  6%      -0.2        0.09 ± 20%  perf-profile.children.cycles-pp.___perf_sw_event
>       0.18 ±  7%      -0.2        0.03 ±100%  perf-profile.children.cycles-pp.perf_event_fork
>       0.19 ± 11%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.select_idle_core
>       0.30 ± 11%      -0.1        0.15 ±  7%  perf-profile.children.cycles-pp.pte_alloc_one
>       0.25 ±  6%      -0.1        0.11 ± 10%  perf-profile.children.cycles-pp.set_next_entity
>       0.20 ± 10%      -0.1        0.06 ± 49%  perf-profile.children.cycles-pp.__perf_event_header__init_id
>       0.18 ± 15%      -0.1        0.03 ±101%  perf-profile.children.cycles-pp.__radix_tree_lookup
>       0.22 ± 11%      -0.1        0.08 ± 21%  perf-profile.children.cycles-pp.mas_spanning_rebalance
>       0.20 ±  9%      -0.1        0.06 ±  9%  perf-profile.children.cycles-pp.stress_pthread_func
>       0.18 ± 12%      -0.1        0.04 ± 73%  perf-profile.children.cycles-pp.__getpid
>       0.16 ± 13%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.walk_component
>       0.28 ±  5%      -0.1        0.15 ± 13%  perf-profile.children.cycles-pp.update_curr
>       0.25 ±  5%      -0.1        0.11 ± 22%  perf-profile.children.cycles-pp.balance_fair
>       0.16 ±  9%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.futex_wake_mark
>       0.16 ± 12%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.get_futex_key
>       0.17 ±  6%      -0.1        0.05 ± 47%  perf-profile.children.cycles-pp.memcg_account_kmem
>       0.25 ± 11%      -0.1        0.12 ± 11%  perf-profile.children.cycles-pp._find_next_bit
>       0.15 ± 13%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.do_open
>       0.20 ±  8%      -0.1        0.08 ± 16%  perf-profile.children.cycles-pp.mas_rebalance
>       0.17 ± 13%      -0.1        0.05 ± 45%  perf-profile.children.cycles-pp.__memcg_kmem_charge_page
>       0.33 ±  6%      -0.1        0.21 ± 10%  perf-profile.children.cycles-pp.select_idle_sibling
>       0.14 ± 11%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.get_user_pages_fast
>       0.18 ±  7%      -0.1        0.07 ± 14%  perf-profile.children.cycles-pp.mas_alloc_nodes
>       0.14 ± 11%      -0.1        0.03 ±101%  perf-profile.children.cycles-pp.set_task_cpu
>       0.14 ± 12%      -0.1        0.03 ±101%  perf-profile.children.cycles-pp.vm_unmapped_area
>       0.38 ±  6%      -0.1        0.27 ±  7%  perf-profile.children.cycles-pp.native_sched_clock
>       0.16 ± 10%      -0.1        0.05 ± 47%  perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
>       0.36 ±  9%      -0.1        0.25 ± 12%  perf-profile.children.cycles-pp.mmap_region
>       0.23 ±  7%      -0.1        0.12 ±  9%  perf-profile.children.cycles-pp.available_idle_cpu
>       0.13 ± 11%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.internal_get_user_pages_fast
>       0.16 ± 10%      -0.1        0.06 ± 18%  perf-profile.children.cycles-pp.get_unmapped_area
>       0.50 ±  7%      -0.1        0.40 ±  6%  perf-profile.children.cycles-pp.menu_select
>       0.24 ±  9%      -0.1        0.14 ± 13%  perf-profile.children.cycles-pp.rmqueue
>       0.17 ± 14%      -0.1        0.07 ± 26%  perf-profile.children.cycles-pp.perf_event_comm
>       0.17 ± 15%      -0.1        0.07 ± 23%  perf-profile.children.cycles-pp.perf_event_comm_event
>       0.17 ± 11%      -0.1        0.07 ± 14%  perf-profile.children.cycles-pp.pick_next_entity
>       0.13 ± 14%      -0.1        0.03 ±102%  perf-profile.children.cycles-pp.perf_output_begin
>       0.23 ±  6%      -0.1        0.13 ± 21%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
>       0.14 ± 18%      -0.1        0.04 ± 72%  perf-profile.children.cycles-pp.perf_event_comm_output
>       0.21 ±  9%      -0.1        0.12 ±  9%  perf-profile.children.cycles-pp.update_rq_clock
>       0.16 ±  8%      -0.1        0.06 ± 19%  perf-profile.children.cycles-pp.mas_split
>       0.13 ± 14%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
>       0.13 ±  6%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.syscall_return_via_sysret
>       0.13 ±  7%      -0.1        0.04 ± 72%  perf-profile.children.cycles-pp.mas_topiary_replace
>       0.14 ±  8%      -0.1        0.06 ±  9%  perf-profile.children.cycles-pp.mas_preallocate
>       0.16 ± 11%      -0.1        0.07 ± 18%  perf-profile.children.cycles-pp.__pick_eevdf
>       0.11 ± 14%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.mas_empty_area_rev
>       0.25 ±  7%      -0.1        0.17 ± 10%  perf-profile.children.cycles-pp.select_idle_cpu
>       0.14 ± 12%      -0.1        0.06 ± 14%  perf-profile.children.cycles-pp.cpu_stopper_thread
>       0.14 ± 10%      -0.1        0.06 ± 13%  perf-profile.children.cycles-pp.active_load_balance_cpu_stop
>       0.14 ± 14%      -0.1        0.06 ± 11%  perf-profile.children.cycles-pp.os_xsave
>       0.18 ±  6%      -0.1        0.11 ± 14%  perf-profile.children.cycles-pp.idle_cpu
>       0.17 ±  4%      -0.1        0.10 ± 15%  perf-profile.children.cycles-pp.hrtimer_start_range_ns
>       0.11 ± 14%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.__pthread_mutex_lock
>       0.32 ±  5%      -0.1        0.25 ±  5%  perf-profile.children.cycles-pp.sched_clock
>       0.11 ±  6%      -0.1        0.03 ± 70%  perf-profile.children.cycles-pp.wakeup_preempt
>       0.23 ±  7%      -0.1        0.16 ± 13%  perf-profile.children.cycles-pp.update_rq_clock_task
>       0.13 ±  8%      -0.1        0.06 ± 16%  perf-profile.children.cycles-pp.local_clock_noinstr
>       0.11 ± 10%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
>       0.34 ±  4%      -0.1        0.27 ±  6%  perf-profile.children.cycles-pp.sched_clock_cpu
>       0.11 ±  9%      -0.1        0.04 ± 76%  perf-profile.children.cycles-pp.avg_vruntime
>       0.15 ±  8%      -0.1        0.08 ± 14%  perf-profile.children.cycles-pp.update_cfs_group
>       0.10 ±  8%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
>       0.13 ±  8%      -0.1        0.06 ± 11%  perf-profile.children.cycles-pp.sched_use_asym_prio
>       0.09 ± 12%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.getname_flags
>       0.18 ±  9%      -0.1        0.12 ± 12%  perf-profile.children.cycles-pp.__update_load_avg_se
>       0.11 ±  8%      -0.1        0.05 ± 46%  perf-profile.children.cycles-pp.place_entity
>       0.08 ± 12%      -0.0        0.02 ± 99%  perf-profile.children.cycles-pp.folio_add_lru_vma
>       0.10 ±  7%      -0.0        0.05 ± 46%  perf-profile.children.cycles-pp._find_next_and_bit
>       0.10 ±  6%      -0.0        0.06 ± 24%  perf-profile.children.cycles-pp.reweight_entity
>       0.03 ± 70%      +0.0        0.08 ± 14%  perf-profile.children.cycles-pp.perf_rotate_context
>       0.19 ± 10%      +0.1        0.25 ±  7%  perf-profile.children.cycles-pp.irqtime_account_irq
>       0.08 ± 11%      +0.1        0.14 ± 21%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
>       0.00            +0.1        0.06 ± 14%  perf-profile.children.cycles-pp.rcu_pending
>       0.10 ± 17%      +0.1        0.16 ± 13%  perf-profile.children.cycles-pp.rebalance_domains
>       0.14 ± 16%      +0.1        0.21 ± 12%  perf-profile.children.cycles-pp.downgrade_write
>       0.14 ± 14%      +0.1        0.21 ± 10%  perf-profile.children.cycles-pp.down_read_killable
>       0.00            +0.1        0.07 ± 11%  perf-profile.children.cycles-pp.free_tail_page_prepare
>       0.02 ±141%      +0.1        0.09 ± 20%  perf-profile.children.cycles-pp.rcu_sched_clock_irq
>       0.01 ±223%      +0.1        0.08 ± 25%  perf-profile.children.cycles-pp.arch_scale_freq_tick
>       0.55 ±  9%      +0.1        0.62 ±  9%  perf-profile.children.cycles-pp.__alloc_pages
>       0.34 ±  5%      +0.1        0.41 ±  9%  perf-profile.children.cycles-pp.clock_nanosleep
>       0.00            +0.1        0.08 ± 23%  perf-profile.children.cycles-pp.tick_nohz_next_event
>       0.70 ±  2%      +0.1        0.78 ±  5%  perf-profile.children.cycles-pp.flush_tlb_func
>       0.14 ± 10%      +0.1        0.23 ± 13%  perf-profile.children.cycles-pp.__intel_pmu_enable_all
>       0.07 ± 19%      +0.1        0.17 ± 17%  perf-profile.children.cycles-pp.cgroup_rstat_updated
>       0.04 ± 71%      +0.1        0.14 ± 11%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
>       0.25 ±  9%      +0.1        0.38 ± 11%  perf-profile.children.cycles-pp.down_read
>       0.43 ±  9%      +0.1        0.56 ± 10%  perf-profile.children.cycles-pp.get_page_from_freelist
>       0.00            +0.1        0.15 ±  6%  perf-profile.children.cycles-pp.vm_normal_page
>       0.31 ±  7%      +0.2        0.46 ±  9%  perf-profile.children.cycles-pp.native_flush_tlb_local
>       0.00            +0.2        0.16 ±  8%  perf-profile.children.cycles-pp.__tlb_remove_page_size
>       0.28 ± 11%      +0.2        0.46 ± 13%  perf-profile.children.cycles-pp.vma_alloc_folio
>       0.00            +0.2        0.24 ±  5%  perf-profile.children.cycles-pp._compound_head
>       0.07 ± 16%      +0.2        0.31 ±  6%  perf-profile.children.cycles-pp.__mod_node_page_state
>       0.38 ±  5%      +0.2        0.62 ±  7%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
>       0.22 ± 12%      +0.2        0.47 ± 10%  perf-profile.children.cycles-pp.schedule_preempt_disabled
>       0.38 ±  5%      +0.3        0.64 ±  7%  perf-profile.children.cycles-pp.perf_event_task_tick
>       0.00            +0.3        0.27 ±  5%  perf-profile.children.cycles-pp.free_swap_cache
>       0.30 ± 10%      +0.3        0.58 ± 10%  perf-profile.children.cycles-pp.rwsem_down_read_slowpath
>       0.00            +0.3        0.30 ±  4%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
>       0.09 ± 10%      +0.3        0.42 ±  7%  perf-profile.children.cycles-pp.__mod_lruvec_state
>       0.00            +0.3        0.34 ±  9%  perf-profile.children.cycles-pp.deferred_split_folio
>       0.00            +0.4        0.36 ± 13%  perf-profile.children.cycles-pp.prep_compound_page
>       0.09 ± 10%      +0.4        0.50 ±  9%  perf-profile.children.cycles-pp.free_unref_page_prepare
>       0.00            +0.4        0.42 ± 11%  perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page
>       1.67 ±  3%      +0.4        2.12 ±  8%  perf-profile.children.cycles-pp.__hrtimer_run_queues
>       0.63 ±  3%      +0.5        1.11 ± 12%  perf-profile.children.cycles-pp.scheduler_tick
>       1.93 ±  3%      +0.5        2.46 ±  8%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
>       1.92 ±  3%      +0.5        2.45 ±  8%  perf-profile.children.cycles-pp.hrtimer_interrupt
>       0.73 ±  3%      +0.6        1.31 ± 11%  perf-profile.children.cycles-pp.update_process_times
>       0.74 ±  3%      +0.6        1.34 ± 11%  perf-profile.children.cycles-pp.tick_sched_handle
>       0.20 ±  8%      +0.6        0.83 ± 18%  perf-profile.children.cycles-pp.__cond_resched
>       0.78 ±  4%      +0.6        1.43 ± 12%  perf-profile.children.cycles-pp.tick_nohz_highres_handler
>       0.12 ±  7%      +0.7        0.81 ±  5%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
>       0.28 ±  7%      +0.9        1.23 ±  4%  perf-profile.children.cycles-pp.release_pages
>       0.00            +1.0        1.01 ±  6%  perf-profile.children.cycles-pp.pmdp_invalidate
>       0.35 ±  6%      +1.2        1.56 ±  5%  perf-profile.children.cycles-pp.__mod_lruvec_page_state
>       0.30 ±  8%      +1.2        1.53 ±  4%  perf-profile.children.cycles-pp.tlb_batch_pages_flush
>       0.00            +1.3        1.26 ±  4%  perf-profile.children.cycles-pp.page_add_anon_rmap
>       0.09 ± 11%      +3.1        3.20 ±  5%  perf-profile.children.cycles-pp.page_remove_rmap
>       1.60 ±  2%      +3.4        5.04 ±  4%  perf-profile.children.cycles-pp.zap_pte_range
>       0.03 ±100%      +3.5        3.55 ±  5%  perf-profile.children.cycles-pp.__split_huge_pmd_locked
>      41.36           +11.6       52.92 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      41.22           +11.7       52.88 ±  2%  perf-profile.children.cycles-pp.do_syscall_64
>       6.42 ±  6%     +13.5       19.88 ±  7%  perf-profile.children.cycles-pp.__clone
>       0.82 ±  6%     +16.2       16.98 ±  7%  perf-profile.children.cycles-pp.clear_page_erms
>       2.62 ±  5%     +16.4       19.04 ±  7%  perf-profile.children.cycles-pp.asm_exc_page_fault
>       2.18 ±  5%     +16.8       18.94 ±  7%  perf-profile.children.cycles-pp.exc_page_fault
>       2.06 ±  6%     +16.8       18.90 ±  7%  perf-profile.children.cycles-pp.do_user_addr_fault
>       1.60 ±  8%     +17.0       18.60 ±  7%  perf-profile.children.cycles-pp.handle_mm_fault
>       1.52 ±  7%     +17.1       18.58 ±  7%  perf-profile.children.cycles-pp.__handle_mm_fault
>       0.30 ±  7%     +17.4       17.72 ±  7%  perf-profile.children.cycles-pp.clear_huge_page
>       0.31 ±  8%     +17.6       17.90 ±  7%  perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
>      11.66 ±  3%     +22.2       33.89 ±  2%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>       3.29 ±  3%     +30.2       33.46        perf-profile.children.cycles-pp._raw_spin_lock
>       0.04 ± 71%     +36.2       36.21 ±  2%  perf-profile.children.cycles-pp.__split_huge_pmd
>       8.00 ±  4%     +36.5       44.54 ±  2%  perf-profile.children.cycles-pp.__madvise
>       7.87 ±  4%     +36.6       44.44 ±  2%  perf-profile.children.cycles-pp.__x64_sys_madvise
>       7.86 ±  4%     +36.6       44.44 ±  2%  perf-profile.children.cycles-pp.do_madvise
>       7.32 ±  4%     +36.8       44.07 ±  2%  perf-profile.children.cycles-pp.madvise_vma_behavior
>       7.26 ±  4%     +36.8       44.06 ±  2%  perf-profile.children.cycles-pp.zap_page_range_single
>       1.78           +39.5       41.30 ±  2%  perf-profile.children.cycles-pp.unmap_page_range
>       1.72           +39.6       41.28 ±  2%  perf-profile.children.cycles-pp.zap_pmd_range
>      24.76 ±  2%      -8.5       16.31 ±  2%  perf-profile.self.cycles-pp.intel_idle
>      11.46 ±  2%      -7.8        3.65 ±  5%  perf-profile.self.cycles-pp.intel_idle_irq
>       3.16 ±  7%      -2.1        1.04 ±  6%  perf-profile.self.cycles-pp.smp_call_function_many_cond
>       1.49 ±  4%      -1.2        0.30 ± 12%  perf-profile.self.cycles-pp.poll_idle
>       1.15 ±  3%      -0.6        0.50 ±  9%  perf-profile.self.cycles-pp._raw_spin_lock
>       0.60 ±  6%      -0.6        0.03 ±100%  perf-profile.self.cycles-pp.queued_write_lock_slowpath
>       0.69 ±  4%      -0.5        0.22 ± 20%  perf-profile.self.cycles-pp.memcpy_orig
>       0.66 ±  7%      -0.5        0.18 ± 11%  perf-profile.self.cycles-pp.update_sg_wakeup_stats
>       0.59 ±  4%      -0.5        0.13 ±  8%  perf-profile.self.cycles-pp._raw_spin_lock_irq
>       0.86 ±  3%      -0.4        0.43 ± 12%  perf-profile.self.cycles-pp.update_sg_lb_stats
>       0.56            -0.4        0.16 ±  7%  perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
>       0.48 ±  3%      -0.4        0.12 ± 10%  perf-profile.self.cycles-pp.__slab_free
>       1.18 ±  2%      -0.4        0.82 ±  3%  perf-profile.self.cycles-pp.llist_add_batch
>       0.54 ±  5%      -0.3        0.19 ±  6%  perf-profile.self.cycles-pp.__schedule
>       0.47 ±  7%      -0.3        0.18 ± 13%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>       0.34 ±  5%      -0.2        0.09 ± 18%  perf-profile.self.cycles-pp.kmem_cache_free
>       0.43 ±  4%      -0.2        0.18 ± 11%  perf-profile.self.cycles-pp.update_load_avg
>       0.35 ±  4%      -0.2        0.10 ± 23%  perf-profile.self.cycles-pp.rcu_cblist_dequeue
>       0.38 ±  9%      -0.2        0.15 ± 10%  perf-profile.self.cycles-pp.__switch_to_asm
>       0.33 ±  5%      -0.2        0.10 ± 16%  perf-profile.self.cycles-pp.__task_pid_nr_ns
>       0.36 ±  6%      -0.2        0.13 ± 14%  perf-profile.self.cycles-pp.switch_mm_irqs_off
>       0.31 ±  6%      -0.2        0.09 ±  6%  perf-profile.self.cycles-pp.__free_one_page
>       0.28 ±  5%      -0.2        0.06 ± 50%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.27 ± 13%      -0.2        0.06 ± 23%  perf-profile.self.cycles-pp.pthread_create@@GLIBC_2.2.5
>       0.30 ±  7%      -0.2        0.10 ± 19%  perf-profile.self.cycles-pp.__switch_to
>       0.27 ±  4%      -0.2        0.10 ± 17%  perf-profile.self.cycles-pp.finish_task_switch
>       0.23 ±  7%      -0.2        0.06 ± 50%  perf-profile.self.cycles-pp.mas_walk
>       0.22 ±  9%      -0.2        0.05 ± 48%  perf-profile.self.cycles-pp.__clone
>       0.63 ±  5%      -0.2        0.46 ± 12%  perf-profile.self.cycles-pp.llist_reverse_order
>       0.20 ±  4%      -0.2        0.04 ± 72%  perf-profile.self.cycles-pp.entry_SYSCALL_64
>       0.24 ± 10%      -0.1        0.09 ± 19%  perf-profile.self.cycles-pp.rmqueue_bulk
>       0.18 ± 13%      -0.1        0.03 ±101%  perf-profile.self.cycles-pp.__radix_tree_lookup
>       0.18 ± 11%      -0.1        0.04 ± 71%  perf-profile.self.cycles-pp.stress_pthread_func
>       0.36 ±  8%      -0.1        0.22 ± 11%  perf-profile.self.cycles-pp.menu_select
>       0.22 ±  4%      -0.1        0.08 ± 19%  perf-profile.self.cycles-pp.___perf_sw_event
>       0.20 ± 13%      -0.1        0.07 ± 20%  perf-profile.self.cycles-pp.start_thread
>       0.16 ± 13%      -0.1        0.03 ±101%  perf-profile.self.cycles-pp.alloc_vmap_area
>       0.17 ± 10%      -0.1        0.04 ± 73%  perf-profile.self.cycles-pp.kmem_cache_alloc
>       0.14 ±  9%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.futex_wake
>       0.17 ±  4%      -0.1        0.06 ± 11%  perf-profile.self.cycles-pp.dequeue_task_fair
>       0.23 ±  6%      -0.1        0.12 ± 11%  perf-profile.self.cycles-pp.available_idle_cpu
>       0.22 ± 13%      -0.1        0.11 ± 12%  perf-profile.self.cycles-pp._find_next_bit
>       0.21 ±  7%      -0.1        0.10 ±  6%  perf-profile.self.cycles-pp.__rmqueue_pcplist
>       0.37 ±  7%      -0.1        0.26 ±  8%  perf-profile.self.cycles-pp.native_sched_clock
>       0.22 ±  7%      -0.1        0.12 ± 21%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
>       0.19 ±  7%      -0.1        0.10 ± 11%  perf-profile.self.cycles-pp.enqueue_entity
>       0.15 ±  5%      -0.1        0.06 ± 45%  perf-profile.self.cycles-pp.enqueue_task_fair
>       0.15 ± 11%      -0.1        0.06 ± 17%  perf-profile.self.cycles-pp.__pick_eevdf
>       0.13 ± 13%      -0.1        0.05 ± 72%  perf-profile.self.cycles-pp.prepare_task_switch
>       0.17 ± 10%      -0.1        0.08 ±  8%  perf-profile.self.cycles-pp.update_rq_clock_task
>       0.54 ±  4%      -0.1        0.46 ±  6%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
>       0.14 ± 14%      -0.1        0.06 ± 11%  perf-profile.self.cycles-pp.os_xsave
>       0.11 ± 10%      -0.1        0.03 ± 70%  perf-profile.self.cycles-pp.try_to_wake_up
>       0.10 ±  8%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.futex_wait
>       0.14 ±  9%      -0.1        0.07 ± 10%  perf-profile.self.cycles-pp.update_curr
>       0.18 ±  9%      -0.1        0.11 ± 14%  perf-profile.self.cycles-pp.idle_cpu
>       0.11 ± 11%      -0.1        0.04 ± 76%  perf-profile.self.cycles-pp.avg_vruntime
>       0.15 ± 10%      -0.1        0.08 ± 14%  perf-profile.self.cycles-pp.update_cfs_group
>       0.09 ±  9%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.reweight_entity
>       0.12 ± 13%      -0.1        0.06 ±  8%  perf-profile.self.cycles-pp.do_idle
>       0.18 ± 10%      -0.1        0.12 ± 13%  perf-profile.self.cycles-pp.__update_load_avg_se
>       0.09 ± 17%      -0.1        0.04 ± 71%  perf-profile.self.cycles-pp.cpuidle_idle_call
>       0.10 ± 11%      -0.0        0.06 ± 45%  perf-profile.self.cycles-pp.update_rq_clock
>       0.12 ± 15%      -0.0        0.07 ± 16%  perf-profile.self.cycles-pp.update_sd_lb_stats
>       0.09 ±  5%      -0.0        0.05 ± 46%  perf-profile.self.cycles-pp._find_next_and_bit
>       0.01 ±223%      +0.1        0.08 ± 25%  perf-profile.self.cycles-pp.arch_scale_freq_tick
>       0.78 ±  4%      +0.1        0.87 ±  4%  perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
>       0.14 ± 10%      +0.1        0.23 ± 13%  perf-profile.self.cycles-pp.__intel_pmu_enable_all
>       0.06 ± 46%      +0.1        0.15 ± 19%  perf-profile.self.cycles-pp.cgroup_rstat_updated
>       0.19 ±  3%      +0.1        0.29 ±  4%  perf-profile.self.cycles-pp.cpuidle_enter_state
>       0.00            +0.1        0.10 ± 11%  perf-profile.self.cycles-pp.__mod_lruvec_state
>       0.00            +0.1        0.11 ± 18%  perf-profile.self.cycles-pp.__tlb_remove_page_size
>       0.00            +0.1        0.12 ±  9%  perf-profile.self.cycles-pp.vm_normal_page
>       0.23 ±  7%      +0.1        0.36 ±  8%  perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
>       0.20 ±  8%      +0.2        0.35 ±  7%  perf-profile.self.cycles-pp.__mod_lruvec_page_state
>       1.12 ±  2%      +0.2        1.28 ±  4%  perf-profile.self.cycles-pp.zap_pte_range
>       0.31 ±  8%      +0.2        0.46 ±  9%  perf-profile.self.cycles-pp.native_flush_tlb_local
>       0.00            +0.2        0.16 ±  5%  perf-profile.self.cycles-pp._compound_head
>       0.06 ± 17%      +0.2        0.26 ±  4%  perf-profile.self.cycles-pp.__mod_node_page_state
>       0.00            +0.2        0.24 ±  6%  perf-profile.self.cycles-pp.free_swap_cache
>       0.00            +0.3        0.27 ± 15%  perf-profile.self.cycles-pp.clear_huge_page
>       0.00            +0.3        0.27 ± 11%  perf-profile.self.cycles-pp.deferred_split_folio
>       0.00            +0.4        0.36 ± 13%  perf-profile.self.cycles-pp.prep_compound_page
>       0.05 ± 47%      +0.4        0.43 ±  9%  perf-profile.self.cycles-pp.free_unref_page_prepare
>       0.08 ±  7%      +0.5        0.57 ± 23%  perf-profile.self.cycles-pp.__cond_resched
>       0.08 ± 12%      +0.5        0.58 ±  5%  perf-profile.self.cycles-pp.release_pages
>       0.10 ± 10%      +0.5        0.63 ±  6%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
>       0.00            +1.1        1.11 ±  7%  perf-profile.self.cycles-pp.__split_huge_pmd_locked
>       0.00            +1.2        1.18 ±  4%  perf-profile.self.cycles-pp.page_add_anon_rmap
>       0.03 ±101%      +1.3        1.35 ±  7%  perf-profile.self.cycles-pp.page_remove_rmap
>       0.82 ±  5%     +16.1       16.88 ±  7%  perf-profile.self.cycles-pp.clear_page_erms
>      11.65 ±  3%     +20.2       31.88 ±  2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>
>
> ***************************************************************************************************
> lkp-spr-2sp4: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
> =========================================================================================
> array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
>   50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
>
> commit:
>   30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
>   1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>      10.50 ± 14%     +55.6%      16.33 ± 16%  perf-c2c.DRAM.local
>       6724           -11.4%       5954 ±  2%  vmstat.system.cs
>  2.746e+09           +16.7%  3.205e+09 ±  2%  cpuidle..time
>    2771516           +16.0%    3213723 ±  2%  cpuidle..usage
>       0.06 ±  4%      -0.0        0.05 ±  5%  mpstat.cpu.all.soft%
>       0.47 ±  2%      -0.1        0.39 ±  2%  mpstat.cpu.all.sys%
>       0.01 ± 85%   +1700.0%       0.20 ±188%  perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
>      15.11 ± 13%     -28.8%      10.76 ± 34%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      15.09 ± 13%     -30.3%      10.51 ± 38%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>    1023952           +13.4%    1161219        meminfo.AnonHugePages
>    1319741           +10.8%    1461995        meminfo.AnonPages
>    1331039           +11.2%    1480149        meminfo.Inactive
>    1330865           +11.2%    1479975        meminfo.Inactive(anon)
>    1266202           +16.0%    1469399 ±  2%  turbostat.C1E
>    1509871           +16.6%    1760853 ±  2%  turbostat.C6
>    3521203           +17.4%    4134075 ±  3%  turbostat.IRQ
>     580.32            -3.8%     558.30        turbostat.PkgWatt
>      77.42           -14.0%      66.60 ±  2%  turbostat.RAMWatt
>     330416           +10.8%     366020        proc-vmstat.nr_anon_pages
>     500.90           +13.4%     567.99        proc-vmstat.nr_anon_transparent_hugepages
>     333197           +11.2%     370536        proc-vmstat.nr_inactive_anon
>     333197           +11.2%     370536        proc-vmstat.nr_zone_inactive_anon
>     129879 ± 11%     -46.7%      69207 ± 12%  proc-vmstat.numa_pages_migrated
>    3879028            +5.9%    4109180        proc-vmstat.pgalloc_normal
>    3403414            +6.6%    3628929        proc-vmstat.pgfree
>     129879 ± 11%     -46.7%      69207 ± 12%  proc-vmstat.pgmigrate_success
>       5763            +9.8%       6327        proc-vmstat.thp_fault_alloc
>     350993           -15.6%     296081 ±  2%  stream.add_bandwidth_MBps
>     349830           -16.1%     293492 ±  2%  stream.add_bandwidth_MBps_harmonicMean
>     333973           -20.5%     265439 ±  3%  stream.copy_bandwidth_MBps
>     332930           -21.7%     260548 ±  3%  stream.copy_bandwidth_MBps_harmonicMean
>     302788           -16.2%     253817 ±  2%  stream.scale_bandwidth_MBps
>     302157           -17.1%     250577 ±  2%  stream.scale_bandwidth_MBps_harmonicMean
>    1177276            +9.3%    1286614        stream.time.maximum_resident_set_size
>       5038            +1.1%       5095        stream.time.percent_of_cpu_this_job_got
>     694.19 ą  2%     +19.5%     829.85 ą  2%  stream.time.user_time
>     339047           -12.1%     298061        stream.triad_bandwidth_MBps
>     338186           -12.4%     296218        stream.triad_bandwidth_MBps_harmonicMean
>       8.42 ±100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi
>       8.42 ±100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
>       8.42 ±100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
>       8.42 ±100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
>       8.42 ±100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
>       8.42 ±100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode
>       0.84 ±103%      +1.7        2.57 ± 59%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>       0.84 ±103%      +1.7        2.57 ± 59%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
>       0.31 ±223%      +2.0        2.33 ± 44%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
>       0.31 ±223%      +2.0        2.33 ± 44%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
>       3.07 ± 56%      +2.8        5.88 ± 28%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       8.42 ±100%      -8.4        0.00        perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
>       8.42 ±100%      -8.1        0.36 ±223%  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
>      12.32 ± 25%      -6.6        5.69 ± 69%  perf-profile.children.cycles-pp.vsnprintf
>      12.76 ± 27%      -6.6        6.19 ± 67%  perf-profile.children.cycles-pp.seq_printf
>       3.07 ± 56%      +2.8        5.88 ± 28%  perf-profile.children.cycles-pp.__x64_sys_exit_group
>      40.11           -11.0%      35.71 ±  2%  perf-stat.i.MPKI
>  1.563e+10           -12.3%  1.371e+10 ±  2%  perf-stat.i.branch-instructions
>  3.721e+09 ±  2%     -23.2%  2.858e+09 ±  4%  perf-stat.i.cache-misses
>  4.471e+09 ±  3%     -22.7%  3.458e+09 ±  4%  perf-stat.i.cache-references
>       5970 ±  5%     -15.9%       5021 ±  4%  perf-stat.i.context-switches
>       1.66 ±  2%     +15.8%       1.92 ±  2%  perf-stat.i.cpi
>      41.83 ±  4%     +30.6%      54.63 ±  4%  perf-stat.i.cycles-between-cache-misses
>  2.282e+10 ±  2%     -14.5%  1.952e+10 ±  2%  perf-stat.i.dTLB-loads
>     572602 ±  3%      -9.2%     519922 ±  5%  perf-stat.i.dTLB-store-misses
>  1.483e+10 ±  2%     -15.7%   1.25e+10 ±  2%  perf-stat.i.dTLB-stores
>  9.179e+10           -13.7%  7.924e+10 ±  2%  perf-stat.i.instructions
>       0.61           -13.4%       0.52 ±  2%  perf-stat.i.ipc
>     373.79 ±  4%     -37.8%     232.60 ±  9%  perf-stat.i.metric.K/sec
>     251.45           -13.4%     217.72 ±  2%  perf-stat.i.metric.M/sec
>      21446 ±  3%     -24.1%      16278 ±  8%  perf-stat.i.minor-faults
>      15.07 ±  5%      -6.0        9.10 ± 10%  perf-stat.i.node-load-miss-rate%
>   68275790 ±  5%     -44.9%   37626128 ± 12%  perf-stat.i.node-load-misses
>      21448 ±  3%     -24.1%      16281 ±  8%  perf-stat.i.page-faults
>      40.71           -11.3%      36.10 ±  2%  perf-stat.overall.MPKI
>       1.67           +15.3%       1.93 ±  2%  perf-stat.overall.cpi
>      41.07 ±  3%     +30.1%      53.42 ±  4%  perf-stat.overall.cycles-between-cache-misses
>       0.00 ±  2%      +0.0        0.00 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
>       0.60           -13.2%       0.52 ±  2%  perf-stat.overall.ipc
>      15.19 ±  5%      -6.2        9.03 ± 11%  perf-stat.overall.node-load-miss-rate%
>    1.4e+10            -9.3%  1.269e+10        perf-stat.ps.branch-instructions
>  3.352e+09 ±  3%     -20.9%  2.652e+09 ±  4%  perf-stat.ps.cache-misses
>  4.026e+09 ±  3%     -20.3%  3.208e+09 ±  4%  perf-stat.ps.cache-references
>       4888 ±  4%     -10.8%       4362 ±  3%  perf-stat.ps.context-switches
>     206092            +2.1%     210375        perf-stat.ps.cpu-clock
>  1.375e+11            +2.8%  1.414e+11        perf-stat.ps.cpu-cycles
>     258.23 ±  5%      +8.8%     280.85 ±  4%  perf-stat.ps.cpu-migrations
>  2.048e+10           -11.7%  1.809e+10 ±  2%  perf-stat.ps.dTLB-loads
>  1.333e+10 ±  2%     -13.0%   1.16e+10 ±  2%  perf-stat.ps.dTLB-stores
>  8.231e+10           -10.8%  7.342e+10        perf-stat.ps.instructions
>      15755 ±  3%     -16.3%      13187 ±  6%  perf-stat.ps.minor-faults
>   61706790 ±  6%     -43.8%   34699716 ± 11%  perf-stat.ps.node-load-misses
>      15757 ±  3%     -16.3%      13189 ±  6%  perf-stat.ps.page-faults
>     206092            +2.1%     210375        perf-stat.ps.task-clock
>  1.217e+12            +4.1%  1.267e+12 ±  2%  perf-stat.total.instructions
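
For the stream case the counters point away from TLB pressure: dTLB
loads/stores and cache misses fall roughly in line with the lower
bandwidth, node-load-misses drop ~44%, and RAMWatt drops 14%, yet add,
copy, scale and triad all regress, so the machine seems to be moving
less data rather than stalling on TLB misses. To confirm the delta
really comes from the new THP placement rather than something
incidental in the commit range, the benchmark could be rerun with THP
disabled for just that process. A hypothetical wrapper sketch, relying
on PR_SET_THP_DISABLE as documented in prctl(2) (the setting is
inherited across fork() and, per the man page, preserved across
execve()):

  /* thpoff.c: run a command with THP disabled for its process tree.
   * Usage: ./thpoff stream <args...>
   */
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/prctl.h>

  int main(int argc, char **argv)
  {
          if (argc < 2) {
                  fprintf(stderr, "usage: %s cmd [args...]\n", argv[0]);
                  return 1;
          }
          if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0)) {
                  perror("prctl");
                  return 1;
          }
          execvp(argv[1], argv + 1);
          perror("execvp");
          return 1;
  }

If stream recovers its bandwidth under such a wrapper on the patched
kernel, the regression is attributable to THP placement; if not, the
alignment change is probably a bystander in this particular test.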
>
>
>
> ***************************************************************************************************
> lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> commit:
>   30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
>   1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>     232.12 ±  7%     -12.0%     204.18 ±  8%  sched_debug.cfs_rq:/.load_avg.stddev
>       6797            -3.3%       6576        vmstat.system.cs
>      15161            -0.9%      15029        vmstat.system.in
>     349927           +44.3%     504820        meminfo.AnonHugePages
>     507807           +27.1%     645169        meminfo.AnonPages
>    1499332           +10.2%    1652612        meminfo.Inactive(anon)
>       8.67 ± 62%    +184.6%      24.67 ± 25%  turbostat.C10
>       1.50            -0.1        1.45        turbostat.C1E%
>       3.30            -3.2%       3.20        turbostat.RAMWatt
>       1.40 ± 14%      -0.3        1.09 ± 13%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault
>       1.44 ± 12%      -0.3        1.12 ± 13%  perf-profile.children.cycles-pp.asm_exc_page_fault
>       0.03 ±141%      +0.1        0.10 ± 30%  perf-profile.children.cycles-pp.next_uptodate_folio
>       0.02 ±141%      +0.1        0.10 ± 22%  perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
>       0.02 ±143%      +0.1        0.10 ± 25%  perf-profile.self.cycles-pp.next_uptodate_folio
>       0.01 ±223%      +0.1        0.09 ± 19%  perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup
>      19806            -3.5%      19109        phoronix-test-suite.ramspeed.Average.Integer.mb_s
>     283.70            +3.8%     294.50        phoronix-test-suite.time.elapsed_time
>     283.70            +3.8%     294.50        phoronix-test-suite.time.elapsed_time.max
>     120454            +1.6%     122334        phoronix-test-suite.time.maximum_resident_set_size
>     281337           -54.8%     127194        phoronix-test-suite.time.minor_page_faults
>     259.13            +4.1%     269.81        phoronix-test-suite.time.user_time
>     126951           +27.0%     161291        proc-vmstat.nr_anon_pages
>     170.86           +44.3%     246.49        proc-vmstat.nr_anon_transparent_hugepages
>     355917            -1.0%     352250        proc-vmstat.nr_dirty_background_threshold
>     712705            -1.0%     705362        proc-vmstat.nr_dirty_threshold
>    3265201            -1.1%    3228465        proc-vmstat.nr_free_pages
>     374833           +10.2%     413153        proc-vmstat.nr_inactive_anon
>       1767            +4.8%       1853        proc-vmstat.nr_page_table_pages
>     374833           +10.2%     413153        proc-vmstat.nr_zone_inactive_anon
>     854665           -34.3%     561406        proc-vmstat.numa_hit
>     854632           -34.3%     561397        proc-vmstat.numa_local
>    5548755            +1.1%    5610598        proc-vmstat.pgalloc_normal
>    1083315           -26.2%     799129        proc-vmstat.pgfault
>     113425            +3.7%     117656        proc-vmstat.pgreuse
>       9025            +7.6%       9714        proc-vmstat.thp_fault_alloc
>       3.38            +0.1        3.45        perf-stat.i.branch-miss-rate%
>  4.135e+08            -3.2%  4.003e+08        perf-stat.i.cache-misses
>  5.341e+08            -2.7%  5.197e+08        perf-stat.i.cache-references
>       6832            -3.4%       6600        perf-stat.i.context-switches
>       4.06            +3.1%       4.19        perf-stat.i.cpi
>     438639 ±  5%     -18.7%     356730 ±  6%  perf-stat.i.dTLB-load-misses
>  1.119e+09            -3.8%  1.077e+09        perf-stat.i.dTLB-loads
>       0.02 ± 15%      -0.0        0.01 ± 26%  perf-stat.i.dTLB-store-miss-rate%
>      80407 ± 10%     -63.5%      29387 ± 23%  perf-stat.i.dTLB-store-misses
>  7.319e+08            -3.8%  7.043e+08        perf-stat.i.dTLB-stores
>      57.72            +0.8       58.52        perf-stat.i.iTLB-load-miss-rate%
>     129846            -3.8%     124973        perf-stat.i.iTLB-load-misses
>     144448            -5.3%     136837        perf-stat.i.iTLB-loads
>  2.389e+09            -3.5%  2.305e+09        perf-stat.i.instructions
>       0.28            -2.9%       0.27        perf-stat.i.ipc
>     220.59            -3.4%     213.11        perf-stat.i.metric.M/sec
>       3610           -31.2%       2483        perf-stat.i.minor-faults
>   49238342            +1.1%   49776834        perf-stat.i.node-loads
>   98106028            -3.1%   95018390        perf-stat.i.node-stores
>       3615           -31.2%       2487        perf-stat.i.page-faults
>       3.65            +3.7%       3.78        perf-stat.overall.cpi
>      21.08            +3.3%      21.79        perf-stat.overall.cycles-between-cache-misses
>       0.04 ±  5%      -0.0        0.03 ±  6%  perf-stat.overall.dTLB-load-miss-rate%
>       0.01 ± 10%      -0.0        0.00 ± 23%  perf-stat.overall.dTLB-store-miss-rate%
>       0.27            -3.6%       0.26        perf-stat.overall.ipc
>  4.122e+08            -3.2%   3.99e+08        perf-stat.ps.cache-misses
>  5.324e+08            -2.7%  5.181e+08        perf-stat.ps.cache-references
>       6809            -3.4%       6580        perf-stat.ps.context-switches
>     437062 ±  5%     -18.7%     355481 ±  6%  perf-stat.ps.dTLB-load-misses
>  1.115e+09            -3.8%  1.073e+09        perf-stat.ps.dTLB-loads
>      80134 ± 10%     -63.5%      29283 ± 23%  perf-stat.ps.dTLB-store-misses
>  7.295e+08            -3.8%  7.021e+08        perf-stat.ps.dTLB-stores
>     129362            -3.7%     124535        perf-stat.ps.iTLB-load-misses
>     143865            -5.2%     136338        perf-stat.ps.iTLB-loads
>  2.381e+09            -3.5%  2.297e+09        perf-stat.ps.instructions
>       3596           -31.2%       2473        perf-stat.ps.minor-faults
>   49081949            +1.1%   49621463        perf-stat.ps.node-loads
>   97795918            -3.1%   94724831        perf-stat.ps.node-stores
>       3600           -31.2%       2477        perf-stat.ps.page-faults
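
The ramspeed numbers line up with the patch doing exactly what it
advertises: AnonHugePages +44.3%, thp_fault_alloc +7.6%, and minor
faults down ~31% (fault rate) / ~55% (total), traded against a ~3.5%
bandwidth loss. When digging into cases like this it helps to know how
much of the workload's anonymous memory actually ends up THP-backed;
a small sketch (file name made up) that totals the AnonHugePages lines
in /proc/self/smaps:

  /* thp-usage.c: sum AnonHugePages over /proc/self/smaps to see how
   * much of this process's anonymous memory is THP-backed.
   */
  #include <stdio.h>

  int main(void)
  {
          FILE *f = fopen("/proc/self/smaps", "r");
          char line[256];
          unsigned long kb, total = 0;

          if (!f) {
                  perror("fopen");
                  return 1;
          }
          while (fgets(line, sizeof(line), f))
                  if (sscanf(line, "AnonHugePages: %lu kB", &kb) == 1)
                          total += kb;
          fclose(f);
          printf("AnonHugePages total: %lu kB\n", total);
          return 0;
  }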
>
>
>
> ***************************************************************************************************
> lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> commit:
>   30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
>   1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>     167.28 ±  5%     -13.1%     145.32 ±  6%  sched_debug.cfs_rq:/.util_est_enqueued.avg
>       6845            -2.5%       6674        vmstat.system.cs
>     351910 ±  2%     +40.2%     493341        meminfo.AnonHugePages
>     505908           +27.2%     643328        meminfo.AnonPages
>    1497656           +10.2%    1650453        meminfo.Inactive(anon)
>      18957 ± 13%     +26.3%      23947 ± 17%  turbostat.C1
>       1.52            -0.0        1.48        turbostat.C1E%
>       3.32            -2.9%       3.23        turbostat.RAMWatt
>      19978            -3.0%      19379        phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
>     280.71            +3.3%     289.93        phoronix-test-suite.time.elapsed_time
>     280.71            +3.3%     289.93        phoronix-test-suite.time.elapsed_time.max
>     120465            +1.5%     122257        phoronix-test-suite.time.maximum_resident_set_size
>     281047           -54.7%     127190        phoronix-test-suite.time.minor_page_faults
>     257.03            +3.5%     265.95        phoronix-test-suite.time.user_time
>     126473           +27.2%     160831        proc-vmstat.nr_anon_pages
>     171.83 ±  2%     +40.2%     240.89        proc-vmstat.nr_anon_transparent_hugepages
>     355973            -1.0%     352304        proc-vmstat.nr_dirty_background_threshold
>     712818            -1.0%     705471        proc-vmstat.nr_dirty_threshold
>    3265800            -1.1%    3228879        proc-vmstat.nr_free_pages
>     374410           +10.2%     412613        proc-vmstat.nr_inactive_anon
>       1770            +4.4%       1848        proc-vmstat.nr_page_table_pages
>     374410           +10.2%     412613        proc-vmstat.nr_zone_inactive_anon
>     852082           -34.9%     555093        proc-vmstat.numa_hit
>     852125           -34.9%     555018        proc-vmstat.numa_local
>    1078293           -26.6%     791038        proc-vmstat.pgfault
>     112693            +2.9%     116004        proc-vmstat.pgreuse
>       9025            +7.6%       9713        proc-vmstat.thp_fault_alloc
>       3.63 ±  6%      +0.6        4.25 ±  9%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
>       0.25 ± 55%      -0.2        0.08 ± 68%  perf-profile.children.cycles-pp.ret_from_fork_asm
>       0.25 ± 55%      -0.2        0.08 ± 68%  perf-profile.children.cycles-pp.ret_from_fork
>       0.23 ± 56%      -0.2        0.07 ± 69%  perf-profile.children.cycles-pp.kthread
>       0.14 ± 36%      -0.1        0.05 ±120%  perf-profile.children.cycles-pp.do_anonymous_page
>       0.14 ± 35%      -0.1        0.05 ± 76%  perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
>       0.04 ± 72%      +0.0        0.08 ± 19%  perf-profile.children.cycles-pp.try_to_wake_up
>       0.04 ±118%      +0.1        0.10 ± 36%  perf-profile.children.cycles-pp.update_rq_clock
>       0.07 ± 79%      +0.1        0.17 ± 21%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>       7.99 ± 11%      +1.0        9.02 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>       0.23 ± 28%      -0.1        0.14 ± 49%  perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
>       0.14 ± 35%      -0.1        0.05 ± 76%  perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
>       0.06 ± 79%      +0.1        0.16 ± 21%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>       0.21 ± 34%      +0.2        0.36 ± 18%  perf-profile.self.cycles-pp.ktime_get
>  1.187e+08            -4.6%  1.133e+08        perf-stat.i.branch-instructions
>       3.36            +0.1        3.42        perf-stat.i.branch-miss-rate%
>    5492420            -3.9%    5275592        perf-stat.i.branch-misses
>  4.148e+08            -2.8%  4.034e+08        perf-stat.i.cache-misses
>  5.251e+08            -2.6%  5.114e+08        perf-stat.i.cache-references
>       6880            -2.5%       6711        perf-stat.i.context-switches
>       4.30            +2.9%       4.43        perf-stat.i.cpi
>       0.10 ±  7%      -0.0        0.09 ±  2%  perf-stat.i.dTLB-load-miss-rate%
>     472268 ±  6%     -19.9%     378489        perf-stat.i.dTLB-load-misses
>  8.107e+08            -3.4%  7.831e+08        perf-stat.i.dTLB-loads
>       0.02 ± 16%      -0.0        0.01 ±  2%  perf-stat.i.dTLB-store-miss-rate%
>      90535 ± 11%     -59.8%      36371 ±  2%  perf-stat.i.dTLB-store-misses
>  5.323e+08            -3.3%  5.145e+08        perf-stat.i.dTLB-stores
>     129981            -3.0%     126061        perf-stat.i.iTLB-load-misses
>     143662            -3.1%     139223        perf-stat.i.iTLB-loads
>  2.253e+09            -3.6%  2.172e+09        perf-stat.i.instructions
>       0.26            -3.2%       0.25        perf-stat.i.ipc
>       4.71 ±  2%      -6.4%       4.41 ±  2%  perf-stat.i.major-faults
>     180.03            -3.0%     174.57        perf-stat.i.metric.M/sec
>       3627           -30.8%       2510 ±  2%  perf-stat.i.minor-faults
>       3632           -30.8%       2514 ±  2%  perf-stat.i.page-faults
>       3.88            +3.6%       4.02        perf-stat.overall.cpi
>      21.08            +2.7%      21.65        perf-stat.overall.cycles-between-cache-misses
>       0.06 ±  6%      -0.0        0.05        perf-stat.overall.dTLB-load-miss-rate%
>       0.02 ± 11%      -0.0        0.01 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
>       0.26            -3.5%       0.25        perf-stat.overall.ipc
>  1.182e+08            -4.6%  1.128e+08        perf-stat.ps.branch-instructions
>    5468166            -4.0%    5251939        perf-stat.ps.branch-misses
>  4.135e+08            -2.7%  4.021e+08        perf-stat.ps.cache-misses
>  5.234e+08            -2.6%  5.098e+08        perf-stat.ps.cache-references
>       6859            -2.5%       6685        perf-stat.ps.context-switches
>     470567 ±  6%     -19.9%     377127        perf-stat.ps.dTLB-load-misses
>  8.079e+08            -3.4%  7.805e+08        perf-stat.ps.dTLB-loads
>      90221 ± 11%     -59.8%      36239 ±  2%  perf-stat.ps.dTLB-store-misses
>  5.305e+08            -3.3%  5.128e+08        perf-stat.ps.dTLB-stores
>     129499            -3.0%     125601        perf-stat.ps.iTLB-load-misses
>     143121            -3.1%     138638        perf-stat.ps.iTLB-loads
>  2.246e+09            -3.6%  2.165e+09        perf-stat.ps.instructions
>       4.69 ±  2%      -6.3%       4.39 ±  2%  perf-stat.ps.major-faults
>       3613           -30.8%       2500 ±  2%  perf-stat.ps.minor-faults
>       3617           -30.8%       2504 ±  2%  perf-stat.ps.page-faults
>
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>