Re: [RFC PATCH V1 4/6] sched/numa: Increase tasks' access history

On 9/12/2023 7:54 PM, kernel test robot wrote:


Hi Raghu,

I hope this third performance report for the same patch set won't annoy you
and, better still, has some value for you.

Not at all. Thanks a lot; I am rather happy to see such
exhaustive results.

That is because it is easy to show that a patchset improves the
readability or maintainability of code. For performance it is harder:
I try my best to check that regressions in corner cases stay within
noise level, and some benchmarks have improved noticeably, but there is
always room to miss something.
Reports like this help boost confidence in the patchset.

Your cumulative (bisection) report also helped me evaluate the
importance of each patch.


We won't send more autonuma-benchmark performance improvement reports for this
patch set, unless of course you would still like us to.

BTW, we will still send out performance/functional regression reports, if there are any.

As in previous reports, we know that you want to see the performance impact
of the whole patch set, so here is a full summary.

To recap, this is how we applied your patch set:

68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs
af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned  <-- we reported [1]
167773d1ddb5f sched/numa: Increase tasks' access history   <---- for this report
fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq
1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic  <--- we reported [2]
2a806eab1c2e1 sched/numa: Move up the access pid reset logic
2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well

[1] https://lore.kernel.org/all/202309102311.84b42068-oliver.sang@xxxxxxxxx/
[2] https://lore.kernel.org/all/202309121417.53f44ad6-oliver.sang@xxxxxxxxx/

Below is only a summary comparison between 2f88c8e802c8b (the tip/sched/core base)
and 68cfe9439a1ba (the whole patch set applied). If you want detailed data for more
commits, or more comparison data, please let me know. Thanks!

On test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
   gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark

2f88c8e802c8b128 68cfe9439a1baa642e05883fa64
---------------- ---------------------------
          %stddev     %change         %stddev
              \          |                \
     271.01           -26.4%     199.49 ±  3%  autonuma-benchmark.numa01.seconds
      76.28           -46.9%      40.49 ±  5%  autonuma-benchmark.numa01_THREAD_ALLOC.seconds
       8.11            -0.1%       8.10        autonuma-benchmark.numa02.seconds
       1425           -30.1%     996.02 ±  2%  autonuma-benchmark.time.elapsed_time
       1425           -30.1%     996.02 ±  2%  autonuma-benchmark.time.elapsed_time.max
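
The %change column in these summaries matches the relative difference of the patched mean against the base mean, (patched - base) / base * 100. Because these metrics are times, a negative value means the patched kernel finished sooner. A minimal Python sketch that recomputes the Sapphire Rapids rows from the printed means (the helper name `pct_change` is ours for illustration, not something from the lkp tooling):

```python
# Relative change (in percent) of the patched mean against the base mean.
# Illustrative helper; the name is hypothetical and not taken from lkp-tests.
def pct_change(base: float, patched: float) -> float:
    return (patched - base) / base * 100.0


# Means copied from the Sapphire Rapids summary (2f88c8e802c8b vs 68cfe9439a1ba).
SPR_SUMMARY = {
    "numa01.seconds": (271.01, 199.49),
    "numa01_THREAD_ALLOC.seconds": (76.28, 40.49),
    "numa02.seconds": (8.11, 8.10),
    "time.elapsed_time": (1425.0, 996.02),
}

if __name__ == "__main__":
    for metric, (base, patched) in SPR_SUMMARY.items():
        # Negative output means the patched kernel took less time.
        print(f"{metric:<28} {pct_change(base, patched):+.1f}%")
```

Run as a script, it prints -26.4%, -46.9%, -0.1% and -30.1%, which agrees with the table.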


On test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
   gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark

2f88c8e802c8b128 68cfe9439a1baa642e05883fa64
---------------- ---------------------------
          %stddev     %change         %stddev
              \          |                \
     361.53 ±  6%     -10.4%     323.83 ±  3%  autonuma-benchmark.numa01.seconds
     255.31           -60.1%     101.90 ±  2%  autonuma-benchmark.numa01_THREAD_ALLOC.seconds
      14.95            -4.6%      14.26        autonuma-benchmark.numa02.seconds
       2530 ±  3%     -30.3%       1763 ±  2%  autonuma-benchmark.time.elapsed_time
       2530 ±  3%     -30.3%       1763 ±  2%  autonuma-benchmark.time.elapsed_time.max



This gives me fair confidence that we are able to get a decent
improvement overall.
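
The overall improvement can also be expressed as a speedup: base elapsed time divided by patched elapsed time. A time reduction of r percent corresponds to a speedup of 1 / (1 - r/100), so the roughly 30% shorter `time.elapsed_time` on both machines works out to about 1.43x. Below is a small Python sketch using the elapsed-time means from the two summaries. The helper name is illustrative only.

```python
def speedup(base_seconds: float, patched_seconds: float) -> float:
    """How many times faster the patched run is (> 1.0 means faster)."""
    return base_seconds / patched_seconds


# time.elapsed_time means from the two summaries
# (base 2f88c8e802c8b, patched 68cfe9439a1ba).
ELAPSED = {
    "Sapphire Rapids (lkp-spr-r02)": (1425.0, 996.02),
    "Ice Lake (lkp-icl-2sp6)": (2530.0, 1763.0),
}

if __name__ == "__main__":
    for machine, (base, patched) in ELAPSED.items():
        print(f"{machine}: {speedup(base, patched):.2f}x")
```

This gives 1.43x on Sapphire Rapids and 1.44x on Ice Lake for the whole benchmark run.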

Below is the auto-generated part of the report, FYI.

Hello,

kernel test robot noticed a -17.6% improvement of autonuma-benchmark.numa01.seconds on:


commit: 167773d1ddb5ffdd944f851f2cbdd4e65425a358 ("[RFC PATCH V1 4/6] sched/numa: Increase tasks' access history")
url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805
patch link: https://lore.kernel.org/all/cf200aaf594caae68350219fa1f781d64136fa2c.1693287931.git.raghavendra.kt@xxxxxxx/
patch subject: [RFC PATCH V1 4/6] sched/numa: Increase tasks' access history

testcase: autonuma-benchmark
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	iterations: 4x
	test: numa01_THREAD_ALLOC
	cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | autonuma-benchmark: autonuma-benchmark.numa01.seconds -15.4% improvement                           |
| test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | iterations=4x                                                                                      |
|                  | test=numa01_THREAD_ALLOC                                                                           |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | autonuma-benchmark: autonuma-benchmark.numa01.seconds -14.8% improvement                           |
| test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | iterations=4x                                                                                      |
|                  | test=_INVERSE_BIND                                                                                 |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | autonuma-benchmark: autonuma-benchmark.numa01_THREAD_ALLOC.seconds -10.7% improvement              |
| test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory     |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | iterations=4x                                                                                      |
|                  | test=numa01_THREAD_ALLOC                                                                           |
+------------------+----------------------------------------------------------------------------------------------------+



Will go through this too.



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230912/202309122114.b9e08a43-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
   gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
   fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq")
   167773d1dd ("sched/numa: Increase tasks' access history")

fc769221b23064c0 167773d1ddb5ffdd944f851f2cb
---------------- ---------------------------
          %stddev     %change         %stddev
              \          |                \
     105.67 ±  8%     -20.3%      84.17 ± 10%  perf-c2c.HITM.remote
  1.856e+10 ±  7%     -18.8%  1.508e+10 ±  8%  cpuidle..time
   19025348 ±  7%     -18.6%   15481744 ±  8%  cpuidle..usage
       0.00 ± 28%      +0.0        0.01 ± 10%  mpstat.cpu.all.iowait%
       0.10 ±  2%      -0.0        0.09 ±  4%  mpstat.cpu.all.soft%
       1443 ±  2%     -14.2%       1238 ±  4%  uptime.boot
      26312 ±  5%     -12.8%      22935 ±  5%  uptime.idle
    8774783 ±  7%     -19.0%    7104495 ±  8%  turbostat.C1E
   10147966 ±  7%     -18.4%    8280745 ±  8%  turbostat.C6
  3.225e+08 ±  2%     -14.1%   2.77e+08 ±  4%  turbostat.IRQ
       2.81 ± 24%      +3.5        6.35 ± 24%  turbostat.PKG_%
     638.24            +2.0%     650.74        turbostat.PkgWatt
      57.57           +10.9%      63.85 ±  2%  turbostat.RAMWatt
     271.39 ±  2%     -17.6%     223.53 ±  5%  autonuma-benchmark.numa01.seconds
       1401 ±  2%     -14.6%       1197 ±  4%  autonuma-benchmark.time.elapsed_time
       1401 ±  2%     -14.6%       1197 ±  4%  autonuma-benchmark.time.elapsed_time.max
    1088153 ±  2%     -14.1%     934904 ±  6%  autonuma-benchmark.time.involuntary_context_switches
       3953            -2.6%       3852 ±  2%  autonuma-benchmark.time.system_time
     287110           -14.5%     245511 ±  4%  autonuma-benchmark.time.user_time
      22704 ±  7%     +15.9%      26303 ±  8%  autonuma-benchmark.time.voluntary_context_switches
     191.10 ± 64%     +94.9%     372.49 ±  7%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
       4.09 ± 49%     +85.6%       7.59 ± 14%  perf-sched.wait_and_delay.max.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
       1.99 ± 40%     +99.8%       3.97 ± 30%  perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.vmstat_start.seq_read_iter
      14.18 ±158%     -82.6%       2.47 ± 22%  perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
     189.39 ± 65%     +96.5%     372.20 ±  7%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
       2.18 ± 21%     -33.3%       1.46 ± 41%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
       3.22 ± 32%     -73.0%       0.87 ± 81%  perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.single_open.do_dentry_open
       4.73 ± 20%     +60.6%       7.59 ± 14%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
       9.61 ± 30%     -32.8%       6.46 ± 16%  perf-sched.wait_time.max.ms.__cond_resched.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      13.57 ± 65%     -60.2%       5.40 ± 24%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
    6040567            -6.2%    5667640        proc-vmstat.numa_hit
      32278 ±  7%     +51.7%      48955 ± 18%  proc-vmstat.numa_huge_pte_updates
    4822780            -7.5%    4459553        proc-vmstat.numa_local
    3187796 ±  9%     +73.2%    5521800 ± 16%  proc-vmstat.numa_pages_migrated
   16792299 ±  7%     +50.8%   25319315 ± 18%  proc-vmstat.numa_pte_updates
    6242814            -8.5%    5711173 ±  2%  proc-vmstat.pgfault
    3187796 ±  9%     +73.2%    5521800 ± 16%  proc-vmstat.pgmigrate_success
     254872 ±  2%     -12.3%     223591 ±  5%  proc-vmstat.pgreuse
       6151 ±  9%     +74.2%      10717 ± 16%  proc-vmstat.thp_migration_success
    4201550           -13.7%    3627350 ±  3%  proc-vmstat.unevictable_pgs_scanned
  1.823e+08 ±  2%     -15.2%  1.547e+08 ±  5%  sched_debug.cfs_rq:/.avg_vruntime.avg
  1.872e+08 ±  2%     -15.3%  1.585e+08 ±  5%  sched_debug.cfs_rq:/.avg_vruntime.max
  1.423e+08 ±  4%     -14.0%  1.224e+08 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.min
    4320209 ±  8%     -18.1%    3537344 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.stddev
       3349 ± 40%     +58.3%       5300 ± 27%  sched_debug.cfs_rq:/.load_avg.max
  1.823e+08 ±  2%     -15.2%  1.547e+08 ±  5%  sched_debug.cfs_rq:/.min_vruntime.avg
  1.872e+08 ±  2%     -15.3%  1.585e+08 ±  5%  sched_debug.cfs_rq:/.min_vruntime.max
  1.423e+08 ±  4%     -14.0%  1.224e+08 ±  3%  sched_debug.cfs_rq:/.min_vruntime.min
    4320208 ±  8%     -18.1%    3537344 ±  8%  sched_debug.cfs_rq:/.min_vruntime.stddev
    1852009 ±  3%     -13.2%    1607461 ±  2%  sched_debug.cpu.avg_idle.avg
     751880 ±  2%     -15.1%     638555 ±  4%  sched_debug.cpu.avg_idle.stddev
     725827 ±  2%     -14.1%     623617 ±  4%  sched_debug.cpu.clock.avg
     726857 ±  2%     -14.1%     624498 ±  4%  sched_debug.cpu.clock.max
     724740 ±  2%     -14.1%     622692 ±  4%  sched_debug.cpu.clock.min
     717315 ±  2%     -14.1%     616349 ±  4%  sched_debug.cpu.clock_task.avg
     719648 ±  2%     -14.1%     618089 ±  4%  sched_debug.cpu.clock_task.max
     698681 ±  2%     -14.2%     599424 ±  4%  sched_debug.cpu.clock_task.min
       1839 ±  8%     -18.1%       1506 ±  7%  sched_debug.cpu.clock_task.stddev
      27352            -9.6%      24731 ±  2%  sched_debug.cpu.curr->pid.max
     293258 ±  5%     -16.4%     245303 ±  7%  sched_debug.cpu.max_idle_balance_cost.stddev
     -14.96           +73.6%     -25.98        sched_debug.cpu.nr_uninterruptible.min
       6.27 ±  4%     +18.7%       7.44 ±  6%  sched_debug.cpu.nr_uninterruptible.stddev
     724723 ±  2%     -14.1%     622678 ±  4%  sched_debug.cpu_clk
     723514 ±  2%     -14.1%     621468 ±  4%  sched_debug.ktime
     725604 ±  2%     -14.1%     623550 ±  4%  sched_debug.sched_clk
      29.50 ±  3%     +24.9%      36.83 ±  9%  perf-stat.i.MPKI
  3.592e+08            +5.7%  3.797e+08 ±  2%  perf-stat.i.branch-instructions
    1823514            +3.7%    1891464        perf-stat.i.branch-misses
   28542234 ±  3%     +22.0%   34809605 ± 10%  perf-stat.i.cache-misses
   72486859 ±  3%     +19.6%   86713561 ±  7%  perf-stat.i.cache-references
     224.48            +3.2%     231.63        perf-stat.i.cpu-migrations
     145250 ±  2%     -10.8%     129549 ±  4%  perf-stat.i.cycles-between-cache-misses
       0.08 ±  5%      -0.0        0.07 ± 10%  perf-stat.i.dTLB-load-miss-rate%
     272123 ±  6%     -15.0%     231302 ± 10%  perf-stat.i.dTLB-load-misses
  4.515e+08            +4.7%  4.729e+08 ±  2%  perf-stat.i.dTLB-loads
     995784            +1.9%    1014848        perf-stat.i.dTLB-store-misses
  1.844e+08            +1.5%  1.871e+08        perf-stat.i.dTLB-stores
  1.711e+09            +5.0%  1.797e+09 ±  2%  perf-stat.i.instructions
       3.25            +8.3%       3.52 ±  3%  perf-stat.i.metric.M/sec
       4603            +6.7%       4912 ±  3%  perf-stat.i.minor-faults
     488266 ±  2%     +25.0%     610436 ±  6%  perf-stat.i.node-load-misses
     618022 ±  4%     +13.4%     701130 ±  5%  perf-stat.i.node-loads
       4603            +6.7%       4912 ±  3%  perf-stat.i.page-faults
      39.67 ±  2%     +16.0%      46.04 ±  6%  perf-stat.overall.MPKI
     375.84            -4.9%     357.36 ±  2%  perf-stat.overall.cpi
      24383 ±  3%     -19.0%      19742 ± 12%  perf-stat.overall.cycles-between-cache-misses
       0.06 ±  7%      -0.0        0.05 ± 10%  perf-stat.overall.dTLB-load-miss-rate%
       0.00            +5.2%       0.00 ±  2%  perf-stat.overall.ipc
      41.99 ±  2%      +2.8       44.83 ±  4%  perf-stat.overall.node-load-miss-rate%
  3.355e+08            +6.3%  3.567e+08 ±  2%  perf-stat.ps.branch-instructions
    1758832            +4.4%    1835699        perf-stat.ps.branch-misses
   24888631 ±  3%     +25.6%   31268733 ± 12%  perf-stat.ps.cache-misses
   64007362 ±  3%     +22.5%   78424799 ±  8%  perf-stat.ps.cache-references
     221.69            +3.0%     228.32        perf-stat.ps.cpu-migrations
  4.273e+08            +5.2%  4.495e+08 ±  2%  perf-stat.ps.dTLB-loads
     992569            +1.8%    1010389        perf-stat.ps.dTLB-store-misses
  1.818e+08            +1.6%  1.847e+08        perf-stat.ps.dTLB-stores
  1.613e+09            +5.5%  1.701e+09 ±  2%  perf-stat.ps.instructions
       4331            +7.2%       4644 ±  3%  perf-stat.ps.minor-faults
     477740 ±  2%     +26.3%     603330 ±  7%  perf-stat.ps.node-load-misses
     660610 ±  5%     +12.3%     741896 ±  6%  perf-stat.ps.node-loads
       4331            +7.2%       4644 ±  3%  perf-stat.ps.page-faults
  2.264e+12           -10.0%  2.038e+12 ±  3%  perf-stat.total.instructions
       1.16 ± 20%      -0.6        0.59 ± 47%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
       1.07 ± 20%      -0.5        0.54 ± 47%  perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
       1.96 ± 25%      -0.7        1.27 ± 23%  perf-profile.children.cycles-pp.task_mm_cid_work
       1.16 ± 20%      -0.5        0.67 ± 19%  perf-profile.children.cycles-pp.worker_thread
       1.07 ± 20%      -0.5        0.61 ± 21%  perf-profile.children.cycles-pp.process_one_work
       0.84 ± 44%      -0.4        0.43 ± 25%  perf-profile.children.cycles-pp.evlist__id2evsel
       0.58 ± 34%      -0.2        0.33 ± 21%  perf-profile.children.cycles-pp.do_mprotect_pkey
       0.54 ± 26%      -0.2        0.30 ± 23%  perf-profile.children.cycles-pp.drm_fb_helper_damage_work
       0.54 ± 26%      -0.2        0.30 ± 23%  perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty
       0.58 ± 34%      -0.2        0.34 ± 22%  perf-profile.children.cycles-pp.__x64_sys_mprotect
       0.34 ± 23%      -0.2        0.12 ± 64%  perf-profile.children.cycles-pp.drm_gem_vmap_unlocked
       0.34 ± 23%      -0.2        0.12 ± 64%  perf-profile.children.cycles-pp.drm_gem_vmap
       0.34 ± 23%      -0.2        0.12 ± 64%  perf-profile.children.cycles-pp.drm_gem_shmem_object_vmap
       0.34 ± 23%      -0.2        0.12 ± 64%  perf-profile.children.cycles-pp.drm_gem_shmem_vmap_locked
       0.55 ± 32%      -0.2        0.33 ± 18%  perf-profile.children.cycles-pp.__wp_page_copy_user
       0.50 ± 35%      -0.2        0.28 ± 21%  perf-profile.children.cycles-pp.mprotect_fixup
       0.28 ± 25%      -0.2        0.08 ±101%  perf-profile.children.cycles-pp.drm_gem_shmem_get_pages_locked
       0.28 ± 25%      -0.2        0.08 ±101%  perf-profile.children.cycles-pp.drm_gem_get_pages
       0.28 ± 25%      -0.2        0.08 ±102%  perf-profile.children.cycles-pp.shmem_read_folio_gfp
       0.28 ± 25%      -0.2        0.08 ±102%  perf-profile.children.cycles-pp.drm_gem_shmem_get_pages
       0.62 ± 15%      -0.2        0.43 ± 16%  perf-profile.children.cycles-pp.try_to_wake_up
       0.25 ± 19%      -0.2        0.08 ± 84%  perf-profile.children.cycles-pp.drm_client_buffer_vmap
       0.44 ± 19%      -0.2        0.28 ± 31%  perf-profile.children.cycles-pp.filemap_get_entry
       0.39 ± 14%      -0.1        0.26 ± 22%  perf-profile.children.cycles-pp.perf_event_mmap
       0.38 ± 13%      -0.1        0.25 ± 23%  perf-profile.children.cycles-pp.perf_event_mmap_event
       0.22 ± 22%      -0.1        0.11 ± 25%  perf-profile.children.cycles-pp.lru_add_drain_cpu
       0.24 ± 21%      -0.1        0.14 ± 36%  perf-profile.children.cycles-pp.do_open_execat
       0.24 ± 13%      -0.1        0.14 ± 42%  perf-profile.children.cycles-pp.arch_do_signal_or_restart
       0.22 ± 30%      -0.1        0.13 ± 10%  perf-profile.children.cycles-pp.wake_up_q
       0.14 ± 17%      -0.1        0.05 ±101%  perf-profile.children.cycles-pp.open_exec
       0.16 ± 21%      -0.1        0.07 ± 51%  perf-profile.children.cycles-pp.path_init
       0.23 ± 30%      -0.1        0.15 ± 22%  perf-profile.children.cycles-pp.ttwu_do_activate
       0.26 ± 11%      -0.1        0.18 ± 20%  perf-profile.children.cycles-pp.perf_iterate_sb
       0.14 ± 50%      -0.1        0.07 ± 12%  perf-profile.children.cycles-pp.security_inode_getattr
       0.18 ± 27%      -0.1        0.11 ± 20%  perf-profile.children.cycles-pp.select_task_rq
       0.14 ± 21%      -0.1        0.08 ± 29%  perf-profile.children.cycles-pp.get_unmapped_area
       0.10 ± 19%      -0.1        0.04 ± 73%  perf-profile.children.cycles-pp.expand_downwards
       0.18 ± 16%      -0.1        0.13 ± 26%  perf-profile.children.cycles-pp.__d_alloc
       0.09 ± 15%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.anon_vma_clone
       0.13 ± 36%      -0.1        0.08 ± 19%  perf-profile.children.cycles-pp.file_free_rcu
       0.08 ± 23%      -0.0        0.03 ±101%  perf-profile.children.cycles-pp.__legitimize_mnt
       0.09 ± 15%      -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.__pipe
       1.92 ± 26%      -0.7        1.24 ± 23%  perf-profile.self.cycles-pp.task_mm_cid_work
       0.82 ± 43%      -0.4        0.42 ± 24%  perf-profile.self.cycles-pp.evlist__id2evsel
       0.42 ± 39%      -0.2        0.22 ± 19%  perf-profile.self.cycles-pp.evsel__read_counter
       0.27 ± 24%      -0.2        0.10 ± 56%  perf-profile.self.cycles-pp.filemap_get_entry
       0.15 ± 48%      -0.1        0.06 ± 11%  perf-profile.self.cycles-pp.ksys_read
       0.10 ± 34%      -0.1        0.03 ±101%  perf-profile.self.cycles-pp.enqueue_task_fair
       0.13 ± 36%      -0.1        0.08 ± 19%  perf-profile.self.cycles-pp.file_free_rcu
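
In the detailed tables, rows whose metric is itself a percentage are treated differently. This covers names ending in `%` and the `perf-profile` rows, which are shares of cycles. For these rows, the change column appears to show an absolute difference in percentage points rather than a relative change. For example, `turbostat.PKG_%` going from 2.81 to 6.35 is listed as +3.5, not +126%. Below is a small Python sketch of the two conventions, checked against rows from the Sapphire Rapids table above. The function and its name-based rule are inferred from the numbers in this report, not taken from the lkp-tests source. Because the printed means are rounded, a few other rows may differ in the last digit when recomputed this way.

```python
def change_column(metric: str, base: float, patched: float) -> str:
    """Render the change column the way these tables appear to.

    Assumption (inferred from the report's numbers, not lkp-tests code):
    percentage metrics show an absolute difference in percentage points,
    all other metrics show a relative change in percent.
    """
    is_percentage = metric.endswith("%") or metric.startswith("perf-profile.")
    if is_percentage:
        return f"{patched - base:+.1f}"  # percentage points, no % sign
    return f"{(patched - base) / base * 100.0:+.1f}%"


# (metric, base mean, patched mean, change as printed in the table)
SPR_ROWS = [
    ("turbostat.PKG_%", 2.81, 6.35, "+3.5"),
    ("turbostat.PkgWatt", 638.24, 650.74, "+2.0%"),
    ("turbostat.RAMWatt", 57.57, 63.85, "+10.9%"),
    ("perf-stat.overall.node-load-miss-rate%", 41.99, 44.83, "+2.8"),
    ("perf-profile.children.cycles-pp.worker_thread", 1.16, 0.67, "-0.5"),
]

if __name__ == "__main__":
    for metric, base, patched, printed in SPR_ROWS:
        ours = change_column(metric, base, patched)
        print(f"{metric:<48} {ours:>7}  (table: {printed})")
```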


***************************************************************************************************
lkp-csl-2sp3: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
   gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp3/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
   fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq")
   167773d1dd ("sched/numa: Increase tasks' access history")

fc769221b23064c0 167773d1ddb5ffdd944f851f2cb
---------------- ---------------------------
          %stddev     %change         %stddev
              \          |                \
  2.309e+10 ±  6%     -27.8%  1.668e+10 ±  5%  cpuidle..time
   23855797 ±  6%     -27.9%   17210884 ±  5%  cpuidle..usage
       2514           -11.9%       2215        uptime.boot
      27543 ±  5%     -23.1%      21189 ±  5%  uptime.idle
       9.80 ±  5%      -1.8        8.05 ±  6%  mpstat.cpu.all.idle%
       0.01 ±  6%      +0.0        0.01 ± 17%  mpstat.cpu.all.iowait%
       0.08            -0.0        0.07 ±  2%  mpstat.cpu.all.soft%
     845597 ± 12%     -26.1%     624549 ± 19%  numa-numastat.node0.other_node
    2990301 ±  6%     -13.1%    2598273 ±  4%  numa-numastat.node1.local_node
     471614 ± 21%     +45.0%     684016 ± 18%  numa-numastat.node1.other_node
     845597 ± 12%     -26.1%     624549 ± 19%  numa-vmstat.node0.numa_other
       4073 ±106%     -82.5%     711.67 ± 23%  numa-vmstat.node1.nr_mapped
    2989568 ±  6%     -13.1%    2597798 ±  4%  numa-vmstat.node1.numa_local
     471614 ± 21%     +45.0%     684016 ± 18%  numa-vmstat.node1.numa_other
     375.07 ±  4%     -15.4%     317.31 ±  2%  autonuma-benchmark.numa01.seconds
       2462           -12.2%       2162        autonuma-benchmark.time.elapsed_time
       2462           -12.2%       2162        autonuma-benchmark.time.elapsed_time.max
    1354545           -12.9%    1179617        autonuma-benchmark.time.involuntary_context_switches
    3212023            -6.5%    3001966        autonuma-benchmark.time.minor_page_faults
       8377            +2.3%       8572        autonuma-benchmark.time.percent_of_cpu_this_job_got
     199714           -10.4%     179020        autonuma-benchmark.time.user_time
      50675 ±  8%     -19.0%      41038 ± 12%  turbostat.C1
     183835 ±  7%     -17.6%     151526 ±  6%  turbostat.C1E
   23556011 ±  6%     -28.0%   16965247 ±  5%  turbostat.C6
       9.72 ±  5%      -1.7        7.99 ±  6%  turbostat.C6%
       9.54 ±  6%     -18.1%       7.81 ±  6%  turbostat.CPU%c1
  2.404e+08           -12.0%  2.116e+08        turbostat.IRQ
     280.51            +1.2%     283.99        turbostat.PkgWatt
      63.94            +6.7%      68.23        turbostat.RAMWatt
     282375 ±  3%      -9.8%     254565 ±  7%  proc-vmstat.numa_hint_faults
     217705 ±  6%     -12.6%     190234 ±  8%  proc-vmstat.numa_hint_faults_local
    7081835            -7.9%    6524239        proc-vmstat.numa_hit
     107927 ± 10%     +16.6%     125887        proc-vmstat.numa_huge_pte_updates
    5764595            -9.5%    5215673        proc-vmstat.numa_local
    7379523 ± 15%     +25.7%    9272505 ±  4%  proc-vmstat.numa_pages_migrated
   55530575 ± 10%     +16.5%   64669707        proc-vmstat.numa_pte_updates
    8852860            -9.3%    8028738        proc-vmstat.pgfault
    7379523 ± 15%     +25.7%    9272505 ±  4%  proc-vmstat.pgmigrate_success
     393902            -9.6%     356099        proc-vmstat.pgreuse
      14358 ± 15%     +25.8%      18064 ±  5%  proc-vmstat.thp_migration_success
   18273792           -11.5%   16166144        proc-vmstat.unevictable_pgs_scanned
   1.45e+08            -8.7%  1.325e+08        sched_debug.cfs_rq:/.avg_vruntime.max
    3995873           -14.0%    3437625 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.stddev
       0.23 ±  3%      -8.6%       0.21 ±  6%  sched_debug.cfs_rq:/.h_nr_running.stddev
   1.45e+08            -8.7%  1.325e+08        sched_debug.cfs_rq:/.min_vruntime.max
    3995873           -14.0%    3437625 ±  2%  sched_debug.cfs_rq:/.min_vruntime.stddev
       0.53 ± 71%    +195.0%       1.56 ± 37%  sched_debug.cfs_rq:/.removed.load_avg.avg
      25.54 ±  2%     +13.0%      28.87        sched_debug.cfs_rq:/.removed.load_avg.max
       3.40 ± 35%     +85.6%       6.32 ± 17%  sched_debug.cfs_rq:/.removed.load_avg.stddev
       0.16 ± 74%    +275.6%       0.59 ± 39%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
       8.03 ± 31%     +84.9%      14.84        sched_debug.cfs_rq:/.removed.runnable_avg.max
       1.02 ± 44%    +154.3%       2.59 ± 16%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
       0.16 ± 74%    +275.6%       0.59 ± 39%  sched_debug.cfs_rq:/.removed.util_avg.avg
       8.03 ± 31%     +84.9%      14.84        sched_debug.cfs_rq:/.removed.util_avg.max
       1.02 ± 44%    +154.3%       2.59 ± 16%  sched_debug.cfs_rq:/.removed.util_avg.stddev
     146.33 ±  4%     -12.0%     128.80 ±  8%  sched_debug.cfs_rq:/.util_avg.stddev
     361281 ±  5%     -13.6%     312127 ±  3%  sched_debug.cpu.avg_idle.stddev
    1229022            -9.9%    1107544        sched_debug.cpu.clock.avg
    1229436            -9.9%    1107919        sched_debug.cpu.clock.max
    1228579            -9.9%    1107137        sched_debug.cpu.clock.min
     248.12 ±  6%      -8.9%     226.15 ±  2%  sched_debug.cpu.clock.stddev
    1201071            -9.7%    1084858        sched_debug.cpu.clock_task.avg
    1205361            -9.7%    1088445        sched_debug.cpu.clock_task.max
    1190139            -9.7%    1074355        sched_debug.cpu.clock_task.min
     156325 ±  4%     -21.3%     123055 ±  3%  sched_debug.cpu.max_idle_balance_cost.stddev
       0.00 ±  5%      -8.8%       0.00 ±  2%  sched_debug.cpu.next_balance.stddev
       0.23 ±  3%      -6.9%       0.21 ±  4%  sched_debug.cpu.nr_running.stddev
      22855           -11.9%      20146 ±  2%  sched_debug.cpu.nr_switches.avg
       0.00 ± 74%    +301.6%       0.00 ± 41%  sched_debug.cpu.nr_uninterruptible.avg
     -20.99           +50.9%     -31.67        sched_debug.cpu.nr_uninterruptible.min
    1228564            -9.9%    1107124        sched_debug.cpu_clk
    1227997            -9.9%    1106556        sched_debug.ktime
       0.00 ± 70%     +66.1%       0.00        sched_debug.rt_rq:.rt_nr_migratory.avg
       0.02 ± 70%     +66.1%       0.03        sched_debug.rt_rq:.rt_nr_migratory.max
       0.00 ± 70%     +66.1%       0.00        sched_debug.rt_rq:.rt_nr_migratory.stddev
       0.00 ± 70%     +66.1%       0.00        sched_debug.rt_rq:.rt_nr_running.avg
       0.02 ± 70%     +66.1%       0.03        sched_debug.rt_rq:.rt_nr_running.max
       0.00 ± 70%     +66.1%       0.00        sched_debug.rt_rq:.rt_nr_running.stddev
    1229125            -9.9%    1107673        sched_debug.sched_clk
      36.73            +9.2%      40.12        perf-stat.i.MPKI
  1.156e+08            +0.9%  1.166e+08        perf-stat.i.branch-instructions
       1.41            +0.1        1.49        perf-stat.i.branch-miss-rate%
    1755317            +6.4%    1868497        perf-stat.i.branch-misses
      65.90            +2.6       68.53        perf-stat.i.cache-miss-rate%
   13292768           +13.0%   15016556        perf-stat.i.cache-misses
   20180664            +9.2%   22041180        perf-stat.i.cache-references
       1620            -2.0%       1588        perf-stat.i.context-switches
     492.61            +2.2%     503.60        perf-stat.i.cpi
  2.624e+11            +2.3%  2.685e+11        perf-stat.i.cpu-cycles
      20261            -9.6%      18315        perf-stat.i.cycles-between-cache-misses
       0.08 ±  5%      -0.0        0.07        perf-stat.i.dTLB-load-miss-rate%
     114641 ±  5%      -6.6%     107104        perf-stat.i.dTLB-load-misses
       0.24            +0.0        0.25        perf-stat.i.dTLB-store-miss-rate%
     202887            +3.4%     209829        perf-stat.i.dTLB-store-misses
     479259 ±  2%      -9.8%     432243 ±  6%  perf-stat.i.iTLB-load-misses
     272948 ±  5%     -16.4%     228065 ±  3%  perf-stat.i.iTLB-loads
  5.888e+08            +0.8%  5.938e+08        perf-stat.i.instructions
       1349           +15.8%       1561 ±  2%  perf-stat.i.instructions-per-iTLB-miss
       2.73            +2.3%       2.80        perf-stat.i.metric.GHz
       3510            +2.9%       3612        perf-stat.i.minor-faults
     302696 ±  4%      +8.0%     327055        perf-stat.i.node-load-misses
    5025469 ±  3%     +16.0%    5831348 ±  2%  perf-stat.i.node-store-misses
    6419781           +11.7%    7171575        perf-stat.i.node-stores
       3510            +2.9%       3613        perf-stat.i.page-faults
      34.43            +8.1%      37.21        perf-stat.overall.MPKI
       1.51            +0.1        1.59        perf-stat.overall.branch-miss-rate%
      66.31            +2.2       68.53        perf-stat.overall.cache-miss-rate%
      19793            -9.3%      17950        perf-stat.overall.cycles-between-cache-misses
       0.07 ±  5%      -0.0        0.07        perf-stat.overall.dTLB-load-miss-rate%
       0.23            +0.0        0.24        perf-stat.overall.dTLB-store-miss-rate%
       1227 ±  2%     +12.1%       1376 ±  6%  perf-stat.overall.instructions-per-iTLB-miss
    1729818            +6.4%    1840962        perf-stat.ps.branch-misses
   13346402           +12.6%   15031113        perf-stat.ps.cache-misses
   20127330            +9.0%   21934543        perf-stat.ps.cache-references
       1624            -2.1%       1590        perf-stat.ps.context-switches
  2.641e+11            +2.1%  2.698e+11        perf-stat.ps.cpu-cycles
     113287 ±  5%      -6.8%     105635        perf-stat.ps.dTLB-load-misses
     203569            +3.2%     210036        perf-stat.ps.dTLB-store-misses
     476376 ±  2%      -9.8%     429901 ±  6%  perf-stat.ps.iTLB-load-misses
     259293 ±  5%     -16.3%     217088 ±  3%  perf-stat.ps.iTLB-loads
       3465            +3.1%       3571        perf-stat.ps.minor-faults
     299695 ±  4%      +8.3%     324433        perf-stat.ps.node-load-misses
    5044747 ±  3%     +15.7%    5834322 ±  2%  perf-stat.ps.node-store-misses
    6459846           +11.3%    7189821        perf-stat.ps.node-stores
       3465            +3.1%       3571        perf-stat.ps.page-faults
   1.44e+12           -11.4%  1.275e+12        perf-stat.total.instructions
       0.47 ± 58%    +593.5%       3.27 ± 81%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
       0.37 ±124%    +352.3%       1.67 ± 58%  perf-sched.sch_delay.avg.ms.__cond_resched.copy_strings.isra.0.do_execveat_common
       0.96 ± 74%     -99.0%       0.01 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.step_into.link_path_walk.part
       2.01 ± 79%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.__install_special_mapping.map_vdso
       1.35 ± 72%     -69.8%       0.41 ± 80%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.dup_mmap.dup_mm
       0.17 ± 18%     -26.5%       0.13 ±  5%  perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
       0.26 ± 16%     -39.0%       0.16 ±  7%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
       2.57 ± 65%   +1027.2%      28.92 ±120%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
       0.38 ±119%    +669.3%       2.92 ± 19%  perf-sched.sch_delay.max.ms.__cond_resched.copy_strings.isra.0.do_execveat_common
       0.51 ±141%    +234.9%       1.71 ± 69%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.elf_map.load_elf_binary
       1.63 ± 74%     -98.9%       0.02 ±141%  perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.part
       3.38 ± 12%     -55.7%       1.50 ± 78%  perf-sched.sch_delay.max.ms.__cond_resched.filemap_read.__kernel_read.search_binary_handler.exec_binprm
       2.37 ± 68%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.__install_special_mapping.map_vdso
       2.05 ± 62%     -68.1%       0.65 ± 93%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.dup_mmap.dup_mm
       9.09 ±119%     -96.0%       0.36 ± 42%  perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
       3.86 ± 40%     -50.1%       1.93 ± 30%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
       2.77 ± 78%     -88.0%       0.33 ± 29%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
       2.48 ± 60%     -86.1%       0.34 ±  7%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      85.92 ± 73%     +97.7%     169.86 ± 31%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      95.98 ±  6%      -9.5%      86.82 ±  4%  perf-sched.total_wait_and_delay.average.ms
      95.30 ±  6%      -9.6%      86.19 ±  4%  perf-sched.total_wait_time.average.ms
     725.88 ± 28%     -73.5%     192.63 ±141%  perf-sched.wait_and_delay.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
       2.22 ± 42%     -76.2%       0.53 ±141%  perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
       4.02 ±  5%     -31.9%       2.74 ± 19%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
     653.51 ±  9%     -13.3%     566.43 ±  7%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     775.33 ±  4%     -19.8%     621.67 ± 13%  perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
      88.33 ± 14%     -16.6%      73.67 ± 11%  perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
       6.28 ± 19%     -73.5%       1.67 ±141%  perf-sched.wait_and_delay.max.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
       1286 ±  3%     -65.6%     442.66 ± 91%  perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
     222.90 ± 16%     +53.8%     342.84 ± 30%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
       0.91 ± 70%   +7745.7%      71.06 ±129%  perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.vmstat_start.seq_read_iter
      21.65 ± 34%     +42.0%      30.75 ± 12%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
       2.67 ± 26%     -96.6%       0.09 ±141%  perf-sched.wait_time.avg.ms.__cond_resched.change_pmd_range.change_p4d_range.change_protection_range.mprotect_fixup
     725.14 ± 28%     -73.5%     192.24 ±141%  perf-sched.wait_time.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
       2.87 ± 28%     -96.7%       0.09 ± 77%  perf-sched.wait_time.avg.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
       2.10 ± 73%   +4020.9%      86.55 ±135%  perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.open_last_lookups.path_openat
       1.96 ± 73%     -94.8%       0.10 ±141%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
       3.24 ± 21%     -65.0%       1.13 ± 69%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.mmap_region
     338.18 ±140%    -100.0%       0.07 ±141%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.prepare_creds.copy_creds.copy_process
      21.80 ±122%     -94.7%       1.16 ±130%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.do_vmi_align_munmap
       4.29 ± 11%     -66.2%       1.45 ±118%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.mprotect_fixup
       0.94 ±126%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.pipe_write.vfs_write.ksys_write
       3.69 ± 29%     -72.9%       1.00 ±141%  perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.do_exit.do_group_exit.__x64_sys_exit_group
       0.04 ±141%   +6192.3%       2.73 ± 63%  perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
      32.86 ±128%     -95.2%       1.57 ± 12%  perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
       3.96 ±  5%     -33.0%       2.66 ± 19%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
       7.38 ± 57%     -89.8%       0.75 ± 88%  perf-sched.wait_time.avg.ms.schedule_timeout.khugepaged_wait_work.khugepaged.kthread
     643.25 ±  9%     -12.8%     560.82 ±  8%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
       2.22 ± 74%  +15121.1%     338.52 ±138%  perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_trace.vmstat_start.seq_read_iter
       4.97 ± 39%     -98.2%       0.09 ±141%  perf-sched.wait_time.max.ms.__cond_resched.change_pmd_range.change_p4d_range.change_protection_range.mprotect_fixup
       3.98           -96.1%       0.16 ± 94%  perf-sched.wait_time.max.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
       4.28 ±  3%     -66.5%       1.44 ±126%  perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
       3.95 ± 14%    +109.8%       8.28 ± 45%  perf-sched.wait_time.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
       2.04 ± 74%     -95.0%       0.10 ±141%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
     340.63 ±140%    -100.0%       0.12 ±141%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.prepare_creds.copy_creds.copy_process
       4.74 ± 22%     -68.4%       1.50 ±117%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.mprotect_fixup
       1.30 ±141%    +205.8%       3.99        perf-sched.wait_time.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
       1.42 ±131%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.pipe_write.vfs_write.ksys_write
     337.62 ±140%     -99.6%       1.33 ±141%  perf-sched.wait_time.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
       4.91 ± 27%   +4797.8%     240.69 ± 69%  perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
       4.29 ±  7%     -76.7%       1.00 ±141%  perf-sched.wait_time.max.ms.__cond_resched.task_work_run.do_exit.do_group_exit.__x64_sys_exit_group
       0.05 ±141%   +5358.6%       2.77 ± 61%  perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
     338.90 ±138%     -98.8%       3.95        perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
       1284 ±  3%     -68.7%     401.56 ±106%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
       7.38 ± 57%     -89.8%       0.75 ± 88%  perf-sched.wait_time.max.ms.schedule_timeout.khugepaged_wait_work.khugepaged.kthread
      20.80 ± 72%     -20.8        0.00        perf-profile.calltrace.cycles-pp.__cmd_record
      20.80 ± 72%     -20.8        0.00        perf-profile.calltrace.cycles-pp.record__finish_output.__cmd_record
      20.78 ± 72%     -20.8        0.00        perf-profile.calltrace.cycles-pp.perf_session__process_events.record__finish_output.__cmd_record
      20.74 ± 72%     -20.7        0.00        perf-profile.calltrace.cycles-pp.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
      20.43 ± 72%     -20.4        0.00        perf-profile.calltrace.cycles-pp.process_simple.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
      20.03 ± 72%     -20.0        0.00        perf-profile.calltrace.cycles-pp.ordered_events__queue.process_simple.reader__read_event.perf_session__process_events.record__finish_output
      19.84 ± 72%     -19.8        0.00        perf-profile.calltrace.cycles-pp.queue_event.ordered_events__queue.process_simple.reader__read_event.perf_session__process_events
       0.77 ± 26%      +0.2        1.00 ± 13%  perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
       0.73 ± 26%      +0.3        1.00 ± 21%  perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
       0.74 ± 18%      +0.3        1.07 ± 19%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
       0.73 ± 18%      +0.3        1.07 ± 19%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
       0.78 ± 36%      +0.3        1.11 ± 19%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64
       0.44 ± 73%      +0.3        0.77 ± 14%  perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
       0.78 ± 36%      +0.3        1.12 ± 19%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__fxstat64
       0.76 ± 17%      +0.3        1.10 ± 19%  perf-profile.calltrace.cycles-pp.write
       0.81 ± 34%      +0.4        1.16 ± 16%  perf-profile.calltrace.cycles-pp.__fxstat64
       0.96 ± 33%      +0.4        1.35 ± 15%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
       0.96 ± 33%      +0.4        1.35 ± 15%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
       0.18 ±141%      +0.4        0.60 ± 13%  perf-profile.calltrace.cycles-pp.walk_component.link_path_walk.path_openat.do_filp_open.do_sys_openat2
       1.00 ± 28%      +0.4        1.43 ±  6%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close_nocancel
       0.22 ±141%      +0.4        0.65 ± 18%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
       0.47 ± 76%      +0.5        0.93 ± 10%  perf-profile.calltrace.cycles-pp.mm_init.alloc_bprm.do_execveat_common.__x64_sys_execve.do_syscall_64
       0.42 ± 73%      +0.5        0.90 ± 23%  perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
       1.14 ± 29%      +0.5        1.62 ±  7%  perf-profile.calltrace.cycles-pp.__close_nocancel
       0.41 ± 73%      +0.5        0.90 ± 23%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
       1.10 ± 28%      +0.5        1.59 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close_nocancel
       1.10 ± 28%      +0.5        1.59 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close_nocancel
       1.13 ± 19%      +0.5        1.66 ± 17%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
       0.58 ± 77%      +0.5        1.12 ±  8%  perf-profile.calltrace.cycles-pp.alloc_bprm.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
       0.22 ±141%      +0.5        0.77 ± 18%  perf-profile.calltrace.cycles-pp.lookup_fast.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
       0.27 ±141%      +0.5        0.82 ± 20%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64
       0.00            +0.6        0.56 ±  9%  perf-profile.calltrace.cycles-pp.lookup_fast.walk_component.link_path_walk.path_openat.do_filp_open
       0.22 ±141%      +0.6        0.85 ± 18%  perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
       1.03 ± 71%      +5.3        6.34 ± 64%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
       1.04 ± 71%      +5.3        6.37 ± 64%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
       1.07 ± 71%      +5.4        6.47 ± 63%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
       1.07 ± 71%      +5.4        6.47 ± 63%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
       1.07 ± 71%      +5.4        6.47 ± 63%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
       1.00 ± 71%      +5.5        6.50 ± 57%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
       1.03 ± 71%      +5.6        6.61 ± 58%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
       1.07 ± 71%      +5.7        6.74 ± 57%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
       1.38 ± 78%      +6.2        7.53 ± 41%  perf-profile.calltrace.cycles-pp.copy_page.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch
       1.44 ± 80%      +6.2        7.63 ± 41%  perf-profile.calltrace.cycles-pp.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages
       1.44 ± 80%      +6.2        7.67 ± 41%  perf-profile.calltrace.cycles-pp.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page
       1.44 ± 80%      +6.2        7.67 ± 41%  perf-profile.calltrace.cycles-pp.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page
       1.52 ± 78%      +6.5        8.07 ± 41%  perf-profile.calltrace.cycles-pp.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault
       1.52 ± 78%      +6.5        8.07 ± 41%  perf-profile.calltrace.cycles-pp.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault
       1.52 ± 78%      +6.6        8.08 ± 41%  perf-profile.calltrace.cycles-pp.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
       1.53 ± 78%      +6.6        8.14 ± 41%  perf-profile.calltrace.cycles-pp.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
       5.22 ± 49%      +7.3       12.52 ± 23%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
       5.49 ± 48%      +7.5       12.98 ± 22%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
       6.00 ± 47%      +7.6       13.57 ± 20%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
       5.97 ± 48%      +7.6       13.55 ± 20%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
       6.99 ± 45%      +7.8       14.80 ± 19%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault
      20.83 ± 73%     -20.8        0.00        perf-profile.children.cycles-pp.queue_event
      20.80 ± 72%     -20.8        0.00        perf-profile.children.cycles-pp.record__finish_output
      20.78 ± 72%     -20.8        0.00        perf-profile.children.cycles-pp.perf_session__process_events
      20.75 ± 72%     -20.8        0.00        perf-profile.children.cycles-pp.reader__read_event
      20.43 ± 72%     -20.4        0.00        perf-profile.children.cycles-pp.process_simple
      20.03 ± 72%     -20.0        0.00        perf-profile.children.cycles-pp.ordered_events__queue
       0.37 ± 14%      -0.1        0.26 ± 15%  perf-profile.children.cycles-pp.rebalance_domains
       0.11 ±  8%      -0.1        0.06 ± 75%  perf-profile.children.cycles-pp.wake_up_q
       0.13 ±  7%      +0.0        0.15 ± 13%  perf-profile.children.cycles-pp.get_unmapped_area
       0.05            +0.0        0.08 ± 22%  perf-profile.children.cycles-pp.complete_signal
       0.07 ± 23%      +0.0        0.10 ± 19%  perf-profile.children.cycles-pp.lru_add_fn
       0.08 ± 24%      +0.0        0.12 ± 10%  perf-profile.children.cycles-pp.__do_sys_brk
       0.08 ± 11%      +0.0        0.13 ± 19%  perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
       0.08 ± 12%      +0.0        0.12 ± 27%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
       0.02 ±141%      +0.0        0.06 ± 19%  perf-profile.children.cycles-pp.workingset_age_nonresident
       0.02 ±141%      +0.0        0.06 ± 19%  perf-profile.children.cycles-pp.workingset_activation
       0.04 ± 71%      +0.1        0.09 ±  5%  perf-profile.children.cycles-pp.page_add_file_rmap
       0.09 ± 18%      +0.1        0.14 ± 23%  perf-profile.children.cycles-pp.terminate_walk
       0.08 ± 12%      +0.1        0.13 ± 19%  perf-profile.children.cycles-pp.__send_signal_locked
       0.00            +0.1        0.06 ±  8%  perf-profile.children.cycles-pp.proc_pident_lookup
       0.11 ± 15%      +0.1        0.17 ± 15%  perf-profile.children.cycles-pp.exit_notify
       0.15 ± 31%      +0.1        0.21 ± 15%  perf-profile.children.cycles-pp.try_charge_memcg
       0.04 ± 71%      +0.1        0.10 ± 27%  perf-profile.children.cycles-pp.__mod_lruvec_state
       0.04 ± 73%      +0.1        0.10 ± 24%  perf-profile.children.cycles-pp.__mod_node_page_state
       0.11 ± 25%      +0.1        0.17 ± 22%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
       0.08 ± 12%      +0.1        0.14 ± 26%  perf-profile.children.cycles-pp.get_slabinfo
       0.02 ±141%      +0.1        0.08 ± 27%  perf-profile.children.cycles-pp.fput
       0.12 ±  6%      +0.1        0.18 ± 20%  perf-profile.children.cycles-pp.xas_find
       0.08 ± 17%      +0.1        0.15 ± 39%  perf-profile.children.cycles-pp.task_numa_fault
       0.07 ± 44%      +0.1        0.14 ± 18%  perf-profile.children.cycles-pp.___slab_alloc
       0.02 ±141%      +0.1        0.09 ± 35%  perf-profile.children.cycles-pp.copy_creds
       0.08 ± 12%      +0.1        0.15 ± 18%  perf-profile.children.cycles-pp._exit
       0.07 ± 78%      +0.1        0.15 ± 27%  perf-profile.children.cycles-pp.file_free_rcu
       0.02 ±141%      +0.1        0.09 ± 25%  perf-profile.children.cycles-pp.do_task_dead
       0.19 ± 22%      +0.1        0.27 ± 10%  perf-profile.children.cycles-pp.dequeue_entity
       0.18 ± 29%      +0.1        0.26 ± 16%  perf-profile.children.cycles-pp.lru_add_drain
       0.03 ± 70%      +0.1        0.11 ± 25%  perf-profile.children.cycles-pp.node_read_numastat
       0.07 ± 25%      +0.1        0.15 ± 51%  perf-profile.children.cycles-pp.__kernel_read
       0.20 ±  4%      +0.1        0.28 ± 24%  perf-profile.children.cycles-pp.__do_fault
       0.23 ± 17%      +0.1        0.31 ±  9%  perf-profile.children.cycles-pp.native_irq_return_iret
       0.11 ± 27%      +0.1        0.20 ± 17%  perf-profile.children.cycles-pp.__pte_alloc
       0.06 ± 86%      +0.1        0.14 ± 44%  perf-profile.children.cycles-pp.cgroup_rstat_flush
       0.06 ± 86%      +0.1        0.14 ± 44%  perf-profile.children.cycles-pp.cgroup_rstat_flush_locked
       0.06 ± 86%      +0.1        0.14 ± 44%  perf-profile.children.cycles-pp.do_flush_stats
       0.06 ± 86%      +0.1        0.14 ± 44%  perf-profile.children.cycles-pp.flush_memcg_stats_dwork
       0.12 ± 28%      +0.1        0.20 ± 18%  perf-profile.children.cycles-pp.d_path
       0.08 ± 36%      +0.1        0.16 ± 17%  perf-profile.children.cycles-pp.lookup_open
       0.11 ±  7%      +0.1        0.20 ± 33%  perf-profile.children.cycles-pp.copy_pte_range
       0.13 ± 16%      +0.1        0.22 ± 18%  perf-profile.children.cycles-pp.dev_attr_show
       0.04 ± 73%      +0.1        0.13 ± 49%  perf-profile.children.cycles-pp.task_numa_migrate
       0.19 ± 17%      +0.1        0.28 ±  7%  perf-profile.children.cycles-pp.__count_memcg_events
       0.15 ± 17%      +0.1        0.24 ± 10%  perf-profile.children.cycles-pp.__pmd_alloc
       0.00            +0.1        0.09 ± 31%  perf-profile.children.cycles-pp.remove_vma
       0.13 ± 16%      +0.1        0.22 ± 22%  perf-profile.children.cycles-pp.sysfs_kf_seq_show
       0.12 ± 26%      +0.1        0.21 ± 26%  perf-profile.children.cycles-pp.__do_set_cpus_allowed
       0.08 ± 78%      +0.1        0.18 ± 20%  perf-profile.children.cycles-pp.free_unref_page
       0.02 ±141%      +0.1        0.11 ± 32%  perf-profile.children.cycles-pp.nd_jump_root
       0.05 ± 74%      +0.1        0.14 ± 23%  perf-profile.children.cycles-pp._find_next_bit
       0.12 ± 22%      +0.1        0.21 ± 21%  perf-profile.children.cycles-pp.clock_gettime
       0.02 ±141%      +0.1        0.11 ± 29%  perf-profile.children.cycles-pp.free_percpu
       0.00            +0.1        0.10 ± 25%  perf-profile.children.cycles-pp.lockref_get
       0.25 ± 40%      +0.1        0.35 ± 24%  perf-profile.children.cycles-pp.shift_arg_pages
       0.26 ± 29%      +0.1        0.36 ± 14%  perf-profile.children.cycles-pp.rmqueue
       0.13 ± 35%      +0.1        0.23 ± 24%  perf-profile.children.cycles-pp.single_open
       0.05 ± 78%      +0.1        0.15 ± 29%  perf-profile.children.cycles-pp.vma_expand
       0.09 ±  5%      +0.1        0.21 ± 41%  perf-profile.children.cycles-pp.prepare_task_switch
       0.08 ± 12%      +0.1        0.19 ± 37%  perf-profile.children.cycles-pp.copy_page_to_iter
       0.22 ± 40%      +0.1        0.34 ± 33%  perf-profile.children.cycles-pp.mas_wr_node_store
       0.16 ± 41%      +0.1        0.27 ± 13%  perf-profile.children.cycles-pp.__set_cpus_allowed_ptr_locked
       0.16 ± 10%      +0.1        0.28 ± 26%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
       0.11 ± 28%      +0.1        0.23 ± 27%  perf-profile.children.cycles-pp.single_release
       0.00            +0.1        0.12 ± 37%  perf-profile.children.cycles-pp.find_busiest_queue
       0.23 ± 28%      +0.1        0.35 ± 23%  perf-profile.children.cycles-pp.pte_alloc_one
       0.23 ± 32%      +0.1        0.35 ± 16%  perf-profile.children.cycles-pp.strncpy_from_user
       0.20 ± 55%      +0.1        0.33 ± 25%  perf-profile.children.cycles-pp.gather_stats
       0.16 ± 30%      +0.1        0.30 ± 12%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
       0.29 ± 31%      +0.1        0.43 ± 14%  perf-profile.children.cycles-pp.setup_arg_pages
       0.13 ± 18%      +0.1        0.27 ± 28%  perf-profile.children.cycles-pp.aa_file_perm
       0.03 ± 70%      +0.1        0.18 ± 73%  perf-profile.children.cycles-pp.set_pmd_migration_entry
       0.09 ±103%      +0.1        0.23 ± 39%  perf-profile.children.cycles-pp.__wait_for_common
       0.19 ± 16%      +0.1        0.33 ± 27%  perf-profile.children.cycles-pp.obj_cgroup_charge
       0.03 ± 70%      +0.1        0.18 ± 74%  perf-profile.children.cycles-pp.try_to_migrate_one
       0.14 ± 41%      +0.2        0.29 ± 34%  perf-profile.children.cycles-pp.select_task_rq
       0.28 ± 35%      +0.2        0.44 ± 28%  perf-profile.children.cycles-pp.vm_area_alloc
       0.04 ± 71%      +0.2        0.20 ± 73%  perf-profile.children.cycles-pp.try_to_migrate
       0.04 ± 71%      +0.2        0.22 ± 70%  perf-profile.children.cycles-pp.rmap_walk_anon
       0.37 ± 28%      +0.2        0.55 ± 23%  perf-profile.children.cycles-pp.pick_next_task_fair
       0.04 ± 71%      +0.2        0.22 ± 57%  perf-profile.children.cycles-pp.migrate_folio_unmap
       0.11 ± 51%      +0.2        0.31 ± 30%  perf-profile.children.cycles-pp.on_each_cpu_cond_mask
       0.30 ± 30%      +0.2        0.50 ± 16%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
       0.30 ± 19%      +0.2        0.50 ± 23%  perf-profile.children.cycles-pp.__perf_sw_event
       0.21 ± 30%      +0.2        0.41 ± 19%  perf-profile.children.cycles-pp.apparmor_file_permission
       0.25 ± 29%      +0.2        0.45 ± 15%  perf-profile.children.cycles-pp.security_file_permission
       0.13 ± 55%      +0.2        0.34 ± 24%  perf-profile.children.cycles-pp.smp_call_function_many_cond
       0.31 ± 34%      +0.2        0.52 ± 30%  perf-profile.children.cycles-pp.pipe_read
       0.32 ± 16%      +0.2        0.55 ±  8%  perf-profile.children.cycles-pp.getname_flags
       0.33 ± 11%      +0.2        0.55 ± 21%  perf-profile.children.cycles-pp.___perf_sw_event
       0.17 ± 44%      +0.2        0.40 ± 38%  perf-profile.children.cycles-pp.newidle_balance
       0.38 ± 38%      +0.2        0.60 ± 12%  perf-profile.children.cycles-pp.__percpu_counter_init
       0.38 ± 37%      +0.2        0.61 ± 18%  perf-profile.children.cycles-pp.readlink
       0.27 ± 40%      +0.2        0.51 ± 21%  perf-profile.children.cycles-pp.mod_objcg_state
       0.76 ± 17%      +0.3        1.10 ± 19%  perf-profile.children.cycles-pp.write
       0.48 ± 42%      +0.4        0.83 ± 13%  perf-profile.children.cycles-pp.pid_revalidate
       0.61 ± 34%      +0.4        0.98 ± 17%  perf-profile.children.cycles-pp.__d_lookup_rcu
       0.73 ± 35%      +0.4        1.12 ±  8%  perf-profile.children.cycles-pp.alloc_bprm
       0.59 ± 42%      +0.4        0.98 ± 11%  perf-profile.children.cycles-pp.pcpu_alloc
       0.77 ± 31%      +0.4        1.21 ±  4%  perf-profile.children.cycles-pp.mm_init
       0.92 ± 31%      +0.5        1.38 ± 12%  perf-profile.children.cycles-pp.__fxstat64
       0.74 ± 32%      +0.5        1.27 ± 20%  perf-profile.children.cycles-pp.open_last_lookups
       1.37 ± 29%      +0.6        1.94 ± 19%  perf-profile.children.cycles-pp.kmem_cache_alloc
       1.35 ± 38%      +0.7        2.09 ± 15%  perf-profile.children.cycles-pp.lookup_fast
       1.13 ± 59%      +5.3        6.47 ± 63%  perf-profile.children.cycles-pp.start_secondary
       1.06 ± 60%      +5.4        6.50 ± 57%  perf-profile.children.cycles-pp.intel_idle
       1.09 ± 59%      +5.5        6.62 ± 58%  perf-profile.children.cycles-pp.cpuidle_enter
       1.09 ± 59%      +5.5        6.62 ± 58%  perf-profile.children.cycles-pp.cpuidle_enter_state
       1.10 ± 59%      +5.5        6.65 ± 58%  perf-profile.children.cycles-pp.cpuidle_idle_call
       1.13 ± 59%      +5.6        6.74 ± 57%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
       1.13 ± 59%      +5.6        6.74 ± 57%  perf-profile.children.cycles-pp.cpu_startup_entry
       1.13 ± 59%      +5.6        6.74 ± 57%  perf-profile.children.cycles-pp.do_idle
       1.51 ± 69%      +6.1        7.65 ± 41%  perf-profile.children.cycles-pp.folio_copy
       1.52 ± 69%      +6.2        7.68 ± 41%  perf-profile.children.cycles-pp.move_to_new_folio
       1.52 ± 69%      +6.2        7.68 ± 41%  perf-profile.children.cycles-pp.migrate_folio_extra
       1.74 ± 63%      +6.2        7.96 ± 39%  perf-profile.children.cycles-pp.copy_page
       1.61 ± 68%      +6.5        8.08 ± 41%  perf-profile.children.cycles-pp.migrate_pages_batch
       1.61 ± 68%      +6.5        8.09 ± 41%  perf-profile.children.cycles-pp.migrate_pages
       1.61 ± 68%      +6.5        8.10 ± 41%  perf-profile.children.cycles-pp.migrate_misplaced_page
       1.62 ± 67%      +6.5        8.14 ± 41%  perf-profile.children.cycles-pp.do_huge_pmd_numa_page
       7.23 ± 41%      +7.5       14.76 ± 19%  perf-profile.children.cycles-pp.__handle_mm_fault
       8.24 ± 38%      +7.6       15.86 ± 17%  perf-profile.children.cycles-pp.exc_page_fault
       8.20 ± 38%      +7.6       15.84 ± 17%  perf-profile.children.cycles-pp.do_user_addr_fault
       9.84 ± 35%      +7.7       17.51 ± 15%  perf-profile.children.cycles-pp.asm_exc_page_fault
       7.71 ± 40%      +7.7       15.41 ± 18%  perf-profile.children.cycles-pp.handle_mm_fault
      20.00 ± 72%     -20.0        0.00        perf-profile.self.cycles-pp.queue_event
       0.18 ± 22%      -0.1        0.10 ± 24%  perf-profile.self.cycles-pp.__d_lookup
       0.07 ± 25%      +0.0        0.10 ±  9%  perf-profile.self.cycles-pp.__perf_read_group_add
       0.08 ± 16%      +0.0        0.12 ± 26%  perf-profile.self.cycles-pp.check_heap_object
       0.05 ±  8%      +0.0        0.09 ± 30%  perf-profile.self.cycles-pp.__memcg_kmem_charge_page
       0.02 ±141%      +0.0        0.06 ± 13%  perf-profile.self.cycles-pp.try_to_wake_up
       0.08 ± 31%      +0.1        0.14 ± 30%  perf-profile.self.cycles-pp.task_dump_owner
       0.05 ± 74%      +0.1        0.10 ± 24%  perf-profile.self.cycles-pp.rmqueue
       0.14 ± 26%      +0.1        0.20 ±  6%  perf-profile.self.cycles-pp.init_file
       0.05 ± 78%      +0.1        0.10 ±  4%  perf-profile.self.cycles-pp.enqueue_task_fair
       0.05 ± 78%      +0.1        0.10 ± 27%  perf-profile.self.cycles-pp.___slab_alloc
       0.02 ±141%      +0.1        0.08 ± 24%  perf-profile.self.cycles-pp.pick_link
       0.04 ± 73%      +0.1        0.10 ± 24%  perf-profile.self.cycles-pp.__mod_node_page_state
       0.07 ± 17%      +0.1        0.14 ± 26%  perf-profile.self.cycles-pp.get_slabinfo
       0.00            +0.1        0.07 ± 18%  perf-profile.self.cycles-pp.select_task_rq
       0.07 ± 78%      +0.1        0.15 ± 27%  perf-profile.self.cycles-pp.file_free_rcu
       0.09 ± 44%      +0.1        0.16 ± 15%  perf-profile.self.cycles-pp.apparmor_file_permission
       0.08 ± 27%      +0.1        0.15 ± 35%  perf-profile.self.cycles-pp.malloc
       0.02 ±141%      +0.1        0.10 ± 29%  perf-profile.self.cycles-pp.memcg_account_kmem
       0.23 ± 17%      +0.1        0.31 ±  9%  perf-profile.self.cycles-pp.native_irq_return_iret
       0.13 ± 32%      +0.1        0.21 ± 32%  perf-profile.self.cycles-pp.obj_cgroup_charge
       0.10 ± 43%      +0.1        0.19 ± 11%  perf-profile.self.cycles-pp.perf_read
       0.14 ± 12%      +0.1        0.23 ± 25%  perf-profile.self.cycles-pp.cgroup_rstat_updated
       0.13 ± 43%      +0.1        0.23 ± 27%  perf-profile.self.cycles-pp.mod_objcg_state
       0.00            +0.1        0.10 ± 25%  perf-profile.self.cycles-pp.lockref_get
       0.07 ± 78%      +0.1        0.18 ± 34%  perf-profile.self.cycles-pp.update_rq_clock_task
       0.00            +0.1        0.10 ± 27%  perf-profile.self.cycles-pp.find_busiest_queue
       0.09 ± 59%      +0.1        0.21 ± 29%  perf-profile.self.cycles-pp.smp_call_function_many_cond
       0.15 ± 31%      +0.1        0.27 ± 16%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
       0.19 ± 39%      +0.1        0.32 ± 19%  perf-profile.self.cycles-pp.zap_pte_range
       0.13 ± 18%      +0.1        0.26 ± 23%  perf-profile.self.cycles-pp.aa_file_perm
       0.19 ± 50%      +0.1        0.32 ± 24%  perf-profile.self.cycles-pp.gather_stats
       0.24 ± 16%      +0.2        0.40 ± 17%  perf-profile.self.cycles-pp.___perf_sw_event
       0.25 ± 31%      +0.2        0.41 ± 16%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
       0.08 ± 71%      +0.2        0.25 ± 24%  perf-profile.self.cycles-pp.pcpu_alloc
       0.16 ± 38%      +0.2        0.34 ± 21%  perf-profile.self.cycles-pp.filemap_map_pages
       0.32 ± 41%      +0.2        0.54 ± 17%  perf-profile.self.cycles-pp.pid_revalidate
       0.47 ± 19%      +0.3        0.73 ± 21%  perf-profile.self.cycles-pp.kmem_cache_alloc
       0.60 ± 34%      +0.4        0.96 ± 18%  perf-profile.self.cycles-pp.__d_lookup_rcu
       1.06 ± 60%      +5.4        6.50 ± 57%  perf-profile.self.cycles-pp.intel_idle
       1.74 ± 63%      +6.2        7.92 ± 39%  perf-profile.self.cycles-pp.copy_page



***************************************************************************************************
lkp-csl-2sp3: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
   gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp3/_INVERSE_BIND/autonuma-benchmark

commit:
   fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq")
   167773d1dd ("sched/numa: Increase tasks' access history")

fc769221b23064c0 167773d1ddb5ffdd944f851f2cb
---------------- ---------------------------
          %stddev     %change         %stddev
              \          |                \
       0.01 ± 20%      +0.0        0.01 ± 15%  mpstat.cpu.all.iowait%
      25370 ±  3%     -13.5%      21946 ±  6%  uptime.idle
  2.098e+10 ±  4%     -15.8%  1.767e+10 ±  7%  cpuidle..time
   21696014 ±  4%     -15.8%   18274389 ±  7%  cpuidle..usage
    3567832 ±  2%     -12.9%    3106532 ±  5%  numa-numastat.node1.local_node
    4472555 ±  2%     -10.8%    3989658 ±  6%  numa-numastat.node1.numa_hit
   21420616 ±  4%     -15.9%   18019892 ±  7%  turbostat.C6
      62.12            +3.8%      64.46        turbostat.RAMWatt
     185236 ±  6%     -17.4%     152981 ± 15%  numa-meminfo.node1.Active
     184892 ±  6%     -17.5%     152523 ± 15%  numa-meminfo.node1.Active(anon)
     190876 ±  6%     -17.4%     157580 ± 15%  numa-meminfo.node1.Shmem
     373.94 ±  4%     -14.8%     318.67 ±  6%  autonuma-benchmark.numa01.seconds
       3066 ±  2%      -7.6%       2833 ±  3%  autonuma-benchmark.time.elapsed_time
       3066 ±  2%      -7.6%       2833 ±  3%  autonuma-benchmark.time.elapsed_time.max
    1770652 ±  3%      -7.7%    1634112 ±  3%  autonuma-benchmark.time.involuntary_context_switches
     258701 ±  2%      -6.9%     240826 ±  3%  autonuma-benchmark.time.user_time
      46235 ±  6%     -17.5%      38150 ± 15%  numa-vmstat.node1.nr_active_anon
      47723 ±  6%     -17.4%      39411 ± 15%  numa-vmstat.node1.nr_shmem
      46235 ±  6%     -17.5%      38150 ± 15%  numa-vmstat.node1.nr_zone_active_anon
    4471422 ±  2%     -10.8%    3989129 ±  6%  numa-vmstat.node1.numa_hit
    3566699 ±  2%     -12.9%    3106004 ±  5%  numa-vmstat.node1.numa_local
       2.37 ± 23%     +45.3%       3.44 ± 16%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
       2.26 ± 28%     +45.0%       3.28 ± 20%  sched_debug.cfs_rq:/.removed.util_avg.stddev
     203.53 ±  4%     -12.8%     177.48 ±  3%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
     128836 ±  7%     -16.9%     107001 ±  8%  sched_debug.cpu.max_idle_balance_cost.stddev
      12639 ±  6%     -12.1%      11108 ±  8%  sched_debug.cpu.nr_switches.min
       0.06 ± 41%     -44.9%       0.04 ± 20%  perf-sched.sch_delay.avg.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
       1.84 ± 23%     -56.4%       0.80 ± 33%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
       0.08 ± 38%     -55.2%       0.04 ± 22%  perf-sched.sch_delay.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
       7.55 ± 60%     -77.2%       1.72 ±152%  perf-sched.wait_time.avg.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
      10.72 ± 60%     -73.8%       2.81 ±171%  perf-sched.wait_time.max.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
       0.28 ± 12%     -16.4%       0.23 ±  5%  perf-sched.wait_time.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
       8802 ±  3%      -4.3%       8427        proc-vmstat.nr_mapped
      54506 ±  5%      -5.2%      51656        proc-vmstat.nr_shmem
    8510048            -4.5%    8124296        proc-vmstat.numa_hit
      43091 ±  8%     +15.9%      49938 ±  6%  proc-vmstat.numa_huge_pte_updates
    7242046            -5.3%    6860532 ±  2%  proc-vmstat.numa_local
    3762770 ±  5%     +34.7%    5068087 ±  3%  proc-vmstat.numa_pages_migrated
   22235827 ±  8%     +15.8%   25759214 ±  6%  proc-vmstat.numa_pte_updates
   10591821            -5.4%   10024519 ±  2%  proc-vmstat.pgfault
    3762770 ±  5%     +34.7%    5068087 ±  3%  proc-vmstat.pgmigrate_success
     489883 ±  2%      -6.8%     456801 ±  3%  proc-vmstat.pgreuse
       7297 ±  5%     +34.8%       9838 ±  3%  proc-vmstat.thp_migration_success
   22825216            -7.4%   21132800 ±  3%  proc-vmstat.unevictable_pgs_scanned
      40.10            +4.2%      41.80        perf-stat.i.MPKI
       1.64            +0.1        1.74        perf-stat.i.branch-miss-rate%
    1920111            +6.9%    2051982        perf-stat.i.branch-misses
      60.50            +1.2       61.72        perf-stat.i.cache-miss-rate%
   12369678            +6.9%   13223477        perf-stat.i.cache-misses
   21918348            +4.6%   22934958        perf-stat.i.cache-references
      22544            -4.0%      21634        perf-stat.i.cycles-between-cache-misses
       1458           +12.1%       1635 ±  5%  perf-stat.i.instructions-per-iTLB-miss
       2.51            +2.4%       2.57        perf-stat.i.metric.M/sec
       3383            +2.3%       3460        perf-stat.i.minor-faults
     244016            +5.0%     256219        perf-stat.i.node-load-misses
    4544736            +9.5%    4977101 ±  3%  perf-stat.i.node-store-misses
    6126744            +5.5%    6463826 ±  2%  perf-stat.i.node-stores
       3383            +2.3%       3460        perf-stat.i.page-faults
      37.34            +3.4%      38.60        perf-stat.overall.MPKI
       1.64            +0.1        1.74        perf-stat.overall.branch-miss-rate%
      21951            -5.4%      20763        perf-stat.overall.cycles-between-cache-misses
    1866870            +7.1%    2000069        perf-stat.ps.branch-misses
   12385090            +6.6%   13198317        perf-stat.ps.cache-misses
   21609219            +4.6%   22595642        perf-stat.ps.cache-references
       3340            +2.3%       3418        perf-stat.ps.minor-faults
     243774            +4.9%     255759        perf-stat.ps.node-load-misses
    4560352            +9.0%    4973035 ±  3%  perf-stat.ps.node-store-misses
    6135666            +5.2%    6452858 ±  2%  perf-stat.ps.node-stores
       3340            +2.3%       3418        perf-stat.ps.page-faults
  1.775e+12            -6.5%  1.659e+12 ±  2%  perf-stat.total.instructions
      32.90 ± 14%     -14.9       17.99 ± 40%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt
       0.60 ± 14%      +0.3        0.88 ± 23%  perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
       0.57 ± 49%      +0.4        0.93 ± 14%  perf-profile.calltrace.cycles-pp.update_sg_wakeup_stats.find_idlest_group.find_idlest_cpu.select_task_rq_fair.sched_exec
       0.78 ± 12%      +0.4        1.15 ± 34%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read.readn.perf_evsel__read
       0.80 ± 14%      +0.4        1.17 ± 26%  perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
       0.82 ± 15%      +0.4        1.19 ± 33%  perf-profile.calltrace.cycles-pp.__libc_read.readn.perf_evsel__read.read_counters.process_interval
       0.80 ± 14%      +0.4        1.19 ± 33%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read.readn.perf_evsel__read.read_counters
       0.50 ± 46%      +0.4        0.89 ± 25%  perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
       0.59 ± 49%      +0.4        0.98 ± 19%  perf-profile.calltrace.cycles-pp.find_idlest_group.find_idlest_cpu.select_task_rq_fair.sched_exec.bprm_execve
       0.59 ± 48%      +0.4        1.00 ± 25%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64
       0.67 ± 47%      +0.4        1.10 ± 22%  perf-profile.calltrace.cycles-pp.sched_exec.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
       0.90 ± 18%      +0.4        1.33 ± 24%  perf-profile.calltrace.cycles-pp.show_numa_map.seq_read_iter.seq_read.vfs_read.ksys_read
       0.66 ± 46%      +0.4        1.09 ± 27%  perf-profile.calltrace.cycles-pp.gather_pte_stats.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range
       0.68 ± 46%      +0.5        1.13 ± 27%  perf-profile.calltrace.cycles-pp.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_vma.show_numa_map
       0.68 ± 46%      +0.5        1.13 ± 27%  perf-profile.calltrace.cycles-pp.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_vma
       0.68 ± 46%      +0.5        1.14 ± 27%  perf-profile.calltrace.cycles-pp.walk_page_vma.show_numa_map.seq_read_iter.seq_read.vfs_read
       0.68 ± 46%      +0.5        1.14 ± 27%  perf-profile.calltrace.cycles-pp.__walk_page_range.walk_page_vma.show_numa_map.seq_read_iter.seq_read
       0.68 ± 46%      +0.5        1.14 ± 27%  perf-profile.calltrace.cycles-pp.walk_pgd_range.__walk_page_range.walk_page_vma.show_numa_map.seq_read_iter
       0.40 ± 71%      +0.5        0.88 ± 20%  perf-profile.calltrace.cycles-pp._dl_addr
       0.93 ± 18%      +0.5        1.45 ± 28%  perf-profile.calltrace.cycles-pp.__fxstat64
       0.88 ± 18%      +0.5        1.41 ± 27%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__fxstat64
       0.88 ± 18%      +0.5        1.42 ± 28%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__fxstat64
       0.60 ± 73%      +0.6        1.24 ± 18%  perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
       0.23 ±142%      +0.7        0.88 ± 26%  perf-profile.calltrace.cycles-pp.show_stat.seq_read_iter.vfs_read.ksys_read.do_syscall_64
       2.87 ± 14%      +1.3        4.21 ± 23%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
       2.88 ± 14%      +1.4        4.23 ± 23%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      34.28 ± 13%     -14.6       19.70 ± 36%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
       0.13 ± 29%      -0.1        0.05 ± 76%  perf-profile.children.cycles-pp.schedule_tail
       0.12 ± 20%      -0.1        0.05 ± 78%  perf-profile.children.cycles-pp.__put_user_4
       0.18 ± 16%      +0.1        0.23 ± 13%  perf-profile.children.cycles-pp.__x64_sys_munmap
       0.09 ± 17%      +0.1        0.16 ± 27%  perf-profile.children.cycles-pp.__do_sys_brk
       0.01 ±223%      +0.1        0.08 ± 27%  perf-profile.children.cycles-pp.acpi_ex_insert_into_field
       0.01 ±223%      +0.1        0.08 ± 27%  perf-profile.children.cycles-pp.acpi_ex_opcode_1A_1T_1R
       0.01 ±223%      +0.1        0.08 ± 27%  perf-profile.children.cycles-pp.acpi_ex_store_object_to_node
       0.01 ±223%      +0.1        0.08 ± 27%  perf-profile.children.cycles-pp.acpi_ex_write_data_to_field
       0.02 ±142%      +0.1        0.09 ± 50%  perf-profile.children.cycles-pp.common_perm_cond
       0.06 ± 58%      +0.1        0.14 ± 24%  perf-profile.children.cycles-pp.___slab_alloc
       0.02 ±144%      +0.1        0.10 ± 63%  perf-profile.children.cycles-pp.__alloc_pages_bulk
       0.06 ± 18%      +0.1        0.14 ± 58%  perf-profile.children.cycles-pp.security_inode_getattr
       0.12 ± 40%      +0.1        0.21 ± 28%  perf-profile.children.cycles-pp.__ptrace_may_access
       0.07 ± 33%      +0.1        0.18 ± 40%  perf-profile.children.cycles-pp.brk
       0.15 ± 14%      +0.1        0.26 ± 23%  perf-profile.children.cycles-pp.wq_worker_comm
       0.09 ± 87%      +0.1        0.21 ± 30%  perf-profile.children.cycles-pp.irq_get_next_irq
       0.93 ± 12%      +0.2        1.17 ± 19%  perf-profile.children.cycles-pp.do_dentry_open
       0.15 ± 30%      +0.3        0.43 ± 56%  perf-profile.children.cycles-pp.run_ksoftirqd
       0.54 ± 17%      +0.4        0.89 ± 20%  perf-profile.children.cycles-pp._dl_addr
       0.74 ± 19%      +0.4        1.09 ± 27%  perf-profile.children.cycles-pp.gather_pte_stats
       0.74 ± 25%      +0.4        1.10 ± 21%  perf-profile.children.cycles-pp.sched_exec
       0.76 ± 19%      +0.4        1.13 ± 27%  perf-profile.children.cycles-pp.walk_p4d_range
       0.76 ± 19%      +0.4        1.13 ± 27%  perf-profile.children.cycles-pp.walk_pud_range
       0.76 ± 19%      +0.4        1.14 ± 27%  perf-profile.children.cycles-pp.walk_page_vma
       0.76 ± 19%      +0.4        1.14 ± 27%  perf-profile.children.cycles-pp.__walk_page_range
       0.76 ± 19%      +0.4        1.14 ± 27%  perf-profile.children.cycles-pp.walk_pgd_range
       0.92 ± 13%      +0.4        1.33 ± 20%  perf-profile.children.cycles-pp.open_last_lookups
       0.90 ± 17%      +0.4        1.33 ± 24%  perf-profile.children.cycles-pp.show_numa_map
       0.43 ± 51%      +0.5        0.88 ± 26%  perf-profile.children.cycles-pp.show_stat
       1.49 ± 11%      +0.5        1.94 ± 15%  perf-profile.children.cycles-pp.__do_softirq
       1.22 ± 18%      +0.6        1.78 ± 16%  perf-profile.children.cycles-pp.update_sg_wakeup_stats
       1.28 ± 20%      +0.6        1.88 ± 18%  perf-profile.children.cycles-pp.find_idlest_group
       1.07 ± 16%      +0.6        1.67 ± 30%  perf-profile.children.cycles-pp.__fxstat64
       1.36 ± 20%      +0.6        1.98 ± 21%  perf-profile.children.cycles-pp.find_idlest_cpu
      30.64 ± 15%     -14.9       15.70 ± 46%  perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
       0.01 ±223%      +0.1        0.07 ± 36%  perf-profile.self.cycles-pp.pick_next_task_fair
       0.10 ± 28%      +0.1        0.17 ± 28%  perf-profile.self.cycles-pp.__get_obj_cgroup_from_memcg
       0.00            +0.1        0.07 ± 32%  perf-profile.self.cycles-pp.touch_atime
       0.04 ±106%      +0.1        0.11 ± 18%  perf-profile.self.cycles-pp.___slab_alloc
       0.12 ± 37%      +0.1        0.20 ± 27%  perf-profile.self.cycles-pp.__ptrace_may_access
       0.05 ± 52%      +0.1        0.13 ± 75%  perf-profile.self.cycles-pp.pick_link
       0.14 ± 28%      +0.1        0.24 ± 34%  perf-profile.self.cycles-pp.__slab_free
       0.47 ± 19%      +0.3        0.79 ± 16%  perf-profile.self.cycles-pp._dl_addr
       1.00 ± 19%      +0.4        1.44 ± 18%  perf-profile.self.cycles-pp.update_sg_wakeup_stats
       6.04 ± 14%      +1.9        7.99 ± 18%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode



***************************************************************************************************
lkp-icl-2sp6: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
   gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
   fc769221b2 ("sched/numa: Remove unconditional scan logic using mm numa_scan_seq")
   167773d1dd ("sched/numa: Increase tasks' access history")

fc769221b23064c0 167773d1ddb5ffdd944f851f2cb
---------------- ---------------------------
          %stddev     %change         %stddev
              \          |                \
      36796 ±  6%     -19.0%      29811 ±  8%  uptime.idle
  3.231e+10 ±  7%     -21.6%  2.534e+10 ± 10%  cpuidle..time
   33785162 ±  7%     -21.8%   26431366 ± 10%  cpuidle..usage
      10.56 ±  7%      -1.5        9.02 ±  9%  mpstat.cpu.all.idle%
       0.01 ± 22%      +0.0        0.01 ± 11%  mpstat.cpu.all.iowait%
       0.17 ±  2%      -0.0        0.15 ±  4%  mpstat.cpu.all.soft%
     388157 ± 31%     +60.9%     624661 ± 36%  numa-numastat.node0.other_node
    4511165 ±  4%     -13.5%    3901276 ±  7%  numa-numastat.node1.numa_hit
     951382 ± 12%     -30.4%     661932 ± 31%  numa-numastat.node1.other_node
     388157 ± 31%     +60.9%     624658 ± 36%  numa-vmstat.node0.numa_other
    4510646 ±  4%     -13.5%    3900948 ±  7%  numa-vmstat.node1.numa_hit
     951382 ± 12%     -30.4%     661932 ± 31%  numa-vmstat.node1.numa_other
     305.08 ±  5%     +19.6%     364.96 ±  6%  sched_debug.cfs_rq:/.util_est_enqueued.avg
     989.11 ±  4%     +13.0%       1117 ±  6%  sched_debug.cfs_rq:/.util_est_enqueued.max
       5082 ±  6%     -19.0%       4114 ± 12%  sched_debug.cpu.curr->pid.stddev
      85229           -13.2%      74019 ±  9%  sched_debug.cpu.max_idle_balance_cost.stddev
       7575 ±  5%      -8.3%       6946 ±  3%  sched_debug.cpu.nr_switches.min
     394498 ±  5%     -21.0%     311653 ± 10%  turbostat.C1E
   33233046 ±  8%     -21.7%   26018024 ± 10%  turbostat.C6
      10.39 ±  7%      -1.5        8.90 ±  9%  turbostat.C6%
       7.77 ±  6%     -17.5%       6.41 ±  9%  turbostat.CPU%c1
     206.88            +2.9%     212.86        turbostat.RAMWatt
     372.30            -8.3%     341.49        autonuma-benchmark.numa01.seconds
     209.06           -10.7%     186.67 ±  6%  autonuma-benchmark.numa01_THREAD_ALLOC.seconds
       2408            -8.6%       2200 ±  2%  autonuma-benchmark.time.elapsed_time
       2408            -8.6%       2200 ±  2%  autonuma-benchmark.time.elapsed_time.max
    1221333 ±  2%      -5.1%    1159380 ±  2%  autonuma-benchmark.time.involuntary_context_switches
    3508627            -4.1%    3363550        autonuma-benchmark.time.minor_page_faults
      11174            +1.9%      11388        autonuma-benchmark.time.percent_of_cpu_this_job_got
     261419            -7.0%     243046 ±  2%  autonuma-benchmark.time.user_time
     220972 ±  7%     +22.1%     269753 ±  3%  proc-vmstat.numa_hint_faults
     164886 ± 11%     +19.4%     196883 ±  5%  proc-vmstat.numa_hint_faults_local
    7964964            -5.9%    7494239        proc-vmstat.numa_hit
      82885 ±  6%     +43.4%     118829 ±  6%  proc-vmstat.numa_huge_pte_updates
    6625289            -6.3%    6207618        proc-vmstat.numa_local
    6636312 ±  4%     +33.1%    8834573 ±  3%  proc-vmstat.numa_pages_migrated
   42671823 ±  6%     +43.2%   61094857 ±  6%  proc-vmstat.numa_pte_updates
    9173569            -6.2%    8602789        proc-vmstat.pgfault
    6636312 ±  4%     +33.1%    8834573 ±  3%  proc-vmstat.pgmigrate_success
     397595            -6.5%     371818        proc-vmstat.pgreuse
      12917 ±  4%     +33.2%      17200 ±  3%  proc-vmstat.thp_migration_success
   17964288            -8.7%   16401792 ±  2%  proc-vmstat.unevictable_pgs_scanned
       0.63 ± 12%      -0.3        0.28 ±100%  perf-profile.calltrace.cycles-pp.__libc_read.readn.evsel__read_counter.read_counters.process_interval
       1.17 ±  4%      -0.2        0.96 ± 14%  perf-profile.children.cycles-pp.__irq_exit_rcu
       0.65 ± 19%      -0.2        0.46 ± 13%  perf-profile.children.cycles-pp.task_mm_cid_work
       0.23 ± 16%      -0.2        0.08 ± 61%  perf-profile.children.cycles-pp.rcu_gp_kthread
       0.30 ±  5%      -0.1        0.16 ± 43%  perf-profile.children.cycles-pp.rebalance_domains
       0.13 ± 21%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.rcu_gp_fqs_loop
       0.25 ± 16%      -0.1        0.18 ± 14%  perf-profile.children.cycles-pp.lru_add_drain_cpu
       0.17 ±  9%      -0.1        0.11 ± 23%  perf-profile.children.cycles-pp.__perf_read_group_add
       0.09 ± 21%      -0.0        0.04 ± 72%  perf-profile.children.cycles-pp.__evlist__disable
       0.11 ± 19%      -0.0        0.07 ± 53%  perf-profile.children.cycles-pp.vma_link
       0.13 ±  6%      -0.0        0.09 ± 27%  perf-profile.children.cycles-pp.ptep_clear_flush
       0.07 ±  7%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.__kernel_read
       0.07 ±  7%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.simple_lookup
       0.09 ±  9%      +0.0        0.11 ± 10%  perf-profile.children.cycles-pp.exit_notify
       0.12 ± 14%      +0.0        0.16 ± 17%  perf-profile.children.cycles-pp.__do_set_cpus_allowed
       0.02 ±141%      +0.1        0.09 ± 40%  perf-profile.children.cycles-pp.__sysvec_call_function
       0.05 ± 78%      +0.1        0.13 ± 42%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
       0.03 ±141%      +0.1        0.12 ± 41%  perf-profile.children.cycles-pp.sysvec_call_function
       0.64 ± 19%      -0.2        0.45 ± 12%  perf-profile.self.cycles-pp.task_mm_cid_work
       0.07 ±  7%      -0.0        0.03 ±100%  perf-profile.self.cycles-pp.dequeue_task_fair
       0.05 ±  8%      +0.0        0.08 ± 14%  perf-profile.self.cycles-pp.file_free_rcu
       1057            +9.9%       1162 ±  2%  perf-stat.i.MPKI
      76.36 ±  2%      +4.6       80.91 ±  2%  perf-stat.i.cache-miss-rate%
  5.353e+08 ±  4%     +18.2%  6.327e+08 ±  3%  perf-stat.i.cache-misses
  7.576e+08            +9.3%  8.282e+08 ±  2%  perf-stat.i.cache-references
  3.727e+11            +1.7%  3.792e+11        perf-stat.i.cpu-cycles
     154.73            +1.5%     157.11        perf-stat.i.cpu-migrations
     722.61 ±  2%      -8.9%     658.12 ±  3%  perf-stat.i.cycles-between-cache-misses
       2.91            +1.7%       2.96        perf-stat.i.metric.GHz
       1242 ±  3%      +5.7%       1312 ±  2%  perf-stat.i.metric.K/sec
      12.73            +9.8%      13.98 ±  2%  perf-stat.i.metric.M/sec
     245601            +5.4%     258749        perf-stat.i.node-load-misses
      43.38            -2.5       40.91 ±  3%  perf-stat.i.node-store-miss-rate%
  2.267e+08 ±  3%      +8.8%  2.467e+08 ±  4%  perf-stat.i.node-store-misses
  3.067e+08 ±  5%     +25.2%  3.841e+08 ±  6%  perf-stat.i.node-stores
     915.00            +9.1%     998.24 ±  2%  perf-stat.overall.MPKI
      71.29 ±  3%      +5.7       77.00 ±  3%  perf-stat.overall.cache-miss-rate%
     702.58 ±  3%     -14.0%     604.23 ±  3%  perf-stat.overall.cycles-between-cache-misses
      42.48 ±  2%      -3.3       39.20 ±  5%  perf-stat.overall.node-store-miss-rate%
   5.33e+08 ±  4%     +18.1%  6.296e+08 ±  3%  perf-stat.ps.cache-misses
  7.475e+08            +9.4%  8.178e+08 ±  2%  perf-stat.ps.cache-references
  3.739e+11            +1.6%    3.8e+11        perf-stat.ps.cpu-cycles
     154.22            +1.6%     156.62        perf-stat.ps.cpu-migrations
       3655            +2.5%       3744        perf-stat.ps.minor-faults
     242759            +5.4%     255974        perf-stat.ps.node-load-misses
  2.255e+08 ±  3%      +8.9%  2.457e+08 ±  3%  perf-stat.ps.node-store-misses
  3.057e+08 ±  5%     +24.9%   3.82e+08 ±  6%  perf-stat.ps.node-stores
       3655            +2.5%       3744        perf-stat.ps.page-faults
  1.968e+12            -8.3%  1.805e+12 ±  2%  perf-stat.total.instructions
       0.03 ±141%    +283.8%       0.13 ± 85%  perf-sched.sch_delay.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
       0.06 ± 77%    +254.1%       0.20 ± 54%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
       0.08 ± 28%     -89.5%       0.01 ±223%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm
       0.92 ± 10%     -33.4%       0.62 ± 20%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
       0.10 ± 22%     -27.2%       0.07 ±  8%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
       0.35 ±141%    +186.8%       1.02 ± 69%  perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
       1.47 ± 81%    +262.6%       5.32 ± 79%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
       2.42 ± 42%    +185.9%       6.91 ± 52%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
       0.26 ±  9%   +1470.7%       4.16 ±115%  perf-sched.sch_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
       3.61 ±  7%     -25.3%       2.70 ± 18%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
       0.08 ± 28%     -89.5%       0.01 ±223%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm
      17.44 ±  4%     -19.0%      14.12 ± 13%  perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      23.36 ± 21%     -37.2%      14.67 ± 22%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     107.00           +11.5%     119.33 ±  4%  perf-sched.wait_and_delay.count.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      75.00            +9.6%      82.17 ±  2%  perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      79.99 ± 97%     -86.8%      10.52 ± 41%  perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
     145.98 ± 14%     -41.5%      85.46 ± 22%  perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
       1.20 ± 94%    +152.3%       3.03 ± 31%  perf-sched.wait_time.avg.ms.__cond_resched.change_pmd_range.change_p4d_range.change_protection_range.mprotect_fixup
       2.30 ± 57%     -90.9%       0.21 ±205%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.part
       0.06 ±  8%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
       0.58 ± 81%     -76.6%       0.14 ± 50%  perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_lookupat.filename_lookup
       2.63 ± 21%     -59.4%       1.07 ± 68%  perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
       2.68 ± 40%     -79.5%       0.55 ±174%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
       3.59 ± 17%     -52.9%       1.69 ± 98%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.mmap_region
       4.05 ±  2%     -80.6%       0.79 ±133%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.mprotect_fixup
       3.75 ± 19%     -81.9%       0.68 ±135%  perf-sched.wait_time.avg.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
       1527 ± 70%     -84.5%     236.84 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      16.13 ±  4%     -21.4%      12.69 ± 15%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
       1.16 ±117%     -99.1%       0.01 ±223%  perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
       0.26 ± 25%     -93.2%       0.02 ±223%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.__access_remote_vm
      22.43 ± 21%     -37.4%      14.05 ± 22%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
       4.41 ±  8%     -94.9%       0.22 ±191%  perf-sched.wait_time.max.ms.__cond_resched.down_read.walk_component.link_path_walk.part
       0.08 ± 29%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
       6.20 ±  8%     -21.6%       4.87 ± 13%  perf-sched.wait_time.max.ms.__cond_resched.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
       4.23 ±  5%     -68.3%       1.34 ±136%  perf-sched.wait_time.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
       3053 ± 70%     -92.2%     236.84 ±223%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
       4.78 ± 33%  +10431.5%     502.95 ± 99%  perf-sched.wait_time.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      79.99 ± 97%     -86.9%      10.51 ± 41%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
       2.13 ±128%     -99.5%       0.01 ±223%  perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
       0.26 ± 25%     -92.4%       0.02 ±223%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.__access_remote_vm
     142.79 ± 13%     -40.9%      84.32 ± 22%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone




I hope I can add your Tested-by if I need to rebase the patch set onto the -mm
tree with minor changes, depending on any further feedback I get.
