Re: [linus:master] [migrate_pages] 7e12beb8ca: vm-scalability.throughput -3.4% regression

Hi Ying,

On Mon, 2023-03-20 at 15:58 +0800, Huang, Ying wrote:
> Hi, Yujie,
> 
> kernel test robot <yujie.liu@xxxxxxxxx> writes:
> 
> > Hello,
> > 
> > FYI, we noticed a -3.4% regression of vm-scalability.throughput due to commit:
> > 
> > commit: 7e12beb8ca2ac98b2ec42e0ea4b76cdc93b58654 ("migrate_pages: batch flushing TLB")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > 
> > in testcase: vm-scalability
> > on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
> > with following parameters:
> > 
> >         runtime: 300s
> >         size: 512G
> >         test: anon-cow-rand-mt
> >         cpufreq_governor: performance
> > 
> > test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> > test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
> > 
> > 
> > If you fix the issue, kindly add following tag
> > > Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
> > > Link: https://lore.kernel.org/oe-lkp/202303192325.ecbaf968-yujie.liu@xxxxxxxxx
> > 
> 
> Thanks a lot for report!  Can you try whether the debug patch as
> below can restore the regression?

We tested the patch. The throughput regression shrank from -3.6% to
-1.4%, so the score is partially restored but a slight drop remains.
Please see the detailed data below:
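
As a reading aid for the table below: both %change columns appear to be
computed against the base commit (ebe75e4751063), not against the
preceding column. A minimal sketch of that arithmetic for the
vm-scalability.throughput row follows; the helper name pct_change is
made up for illustration and is not part of lkp-tests.

```python
# Values copied from the vm-scalability.throughput row of the table.
BASE = 5528051      # ebe75e4751063 (parent commit)
BATCHED = 5328506   # 7e12beb8ca2ac ("migrate_pages: batch flushing TLB")
DEBUG = 5449450     # 9a30245d65679 (debug patch)


def pct_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent (hypothetical helper)."""
    return (new - base) / base * 100.0


for name, value in (("7e12beb8ca2ac", BATCHED), ("9a30245d65679", DEBUG)):
    # Both commits are compared against the same base value.
    print(f"{name}: {pct_change(BASE, value):+.1f}%")
```

Rows whose change column has no "%" sign (for example mpstat.cpu.all.sys%)
are metrics that are already percentages, so the change there is a plain
difference in percentage points.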

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/512G/lkp-csl-2sp3/anon-cow-rand-mt/vm-scalability

commit: 
  ebe75e4751063 ("migrate_pages: share more code between _unmap and _move")
  7e12beb8ca2ac ("migrate_pages: batch flushing TLB")
  9a30245d65679 ("dbg, rmap: avoid flushing TLB in batch if PTE is inaccessible")

ebe75e4751063dce 7e12beb8ca2ac98b2ec42e0ea4b 9a30245d656794d171cd798a2be 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
     57634            -3.5%      55603            -1.5%      56788        vm-scalability.median
     81.16 ± 12%      -5.0       76.17 ± 35%     -20.0       61.18 ± 21%  vm-scalability.stddev%
   5528051            -3.6%    5328506            -1.4%    5449450        vm-scalability.throughput
    200293 ±  3%      -7.3%     185675 ±  2%      -4.3%     191707 ±  2%  vm-scalability.time.involuntary_context_switches
  67952989 ±  5%     +43.1%   97269013 ±  2%     +35.6%   92147668 ±  3%  vm-scalability.time.minor_page_faults
      9006            -1.8%       8844            -0.6%       8956        vm-scalability.time.percent_of_cpu_this_job_got
      1178 ±  3%     +57.2%       1852 ±  3%      +8.6%       1278 ±  3%  vm-scalability.time.system_time
     26327            -4.5%      25132            -1.0%      26056        vm-scalability.time.user_time
     11378 ±  5%    +359.9%      52332 ±  7%    +118.5%      24867 ±  7%  vm-scalability.time.voluntary_context_switches
 1.662e+09            -3.7%  1.601e+09            -1.5%  1.638e+09        vm-scalability.workload
     79922 ±  3%      +9.3%      87378 ±  3%      +3.3%      82589 ±  8%  numa-meminfo.node1.SUnreclaim
    399014 ±192%     -84.9%      60246 ±129%     -13.6%     344869 ±239%  numa-meminfo.node1.Unevictable
      2022 ±  3%     +11.6%       2257            +3.6%       2095        vmstat.system.cs
    539357 ±  2%    +187.0%    1547747 ±  8%     +32.9%     716886 ±  4%  vmstat.system.in
      0.00 ±184%      +0.0        0.00 ±  6%      +0.0        0.00 ± 25%  mpstat.cpu.all.iowait%
      2.58            +1.7        4.27 ±  4%      +0.5        3.09 ±  3%  mpstat.cpu.all.irq%
      4.06 ±  3%      +2.3        6.36 ±  3%      +0.3        4.40 ±  3%  mpstat.cpu.all.sys%
     19980 ±  3%      +9.3%      21844 ±  3%      +3.3%      20646 ±  8%  numa-vmstat.node1.nr_slab_unreclaimable
     99752 ±192%     -84.9%      15061 ±129%     -13.6%      86216 ±239%  numa-vmstat.node1.nr_unevictable
     99752 ±192%     -84.9%      15061 ±129%     -13.6%      86216 ±239%  numa-vmstat.node1.nr_zone_unevictable
    205569 ±  7%    +131.1%     475135 ± 99%     +66.5%     342364 ± 91%  turbostat.C1
 1.382e+09 ±  2%    +140.0%  3.317e+09 ±  5%     +30.4%  1.803e+09 ±  3%  turbostat.IRQ
      9095 ± 14%    +446.4%      49695 ±  7%    +149.0%      22643 ± 11%  turbostat.POLL
     86.84            -2.4%      84.76            -1.4%      85.63        turbostat.RAMWatt
    200293 ±  3%      -7.3%     185675 ±  2%      -4.3%     191707 ±  2%  time.involuntary_context_switches
     67.11 ± 56%     -92.3%       5.17 ± 55%     -95.4%       3.11 ± 80%  time.major_page_faults
  67952989 ±  5%     +43.1%   97269013 ±  2%     +35.6%   92147668 ±  3%  time.minor_page_faults
      9006            -1.8%       8844            -0.6%       8956        time.percent_of_cpu_this_job_got
      1178 ±  3%     +57.2%       1852 ±  3%      +8.6%       1278 ±  3%  time.system_time
     26327            -4.5%      25132            -1.0%      26056        time.user_time
     11378 ±  5%    +359.9%      52332 ±  7%    +118.5%      24867 ±  7%  time.voluntary_context_switches
    143480 ±  3%     -20.9%     113504 ± 11%     -12.0%     126262 ±  4%  sched_debug.cfs_rq:/.min_vruntime.stddev
    548123 ±  7%     -49.1%     279239 ± 34%     -20.7%     434543 ±  9%  sched_debug.cfs_rq:/.spread0.avg
    655329 ±  6%     -36.3%     417735 ± 22%     -16.2%     549218 ±  6%  sched_debug.cfs_rq:/.spread0.max
    143388 ±  3%     -20.8%     113612 ± 11%     -11.9%     126295 ±  4%  sched_debug.cfs_rq:/.spread0.stddev
     39.81 ± 28%     +45.0%      57.73 ± 19%     +17.8%      46.89 ± 44%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
    240478 ±  6%     -12.9%     209367 ±  7%     -12.0%     211715 ±  5%  sched_debug.cpu.avg_idle.avg
      1597           +10.4%       1763 ±  3%      +2.3%       1633        sched_debug.cpu.clock_task.stddev
      1938 ±  5%     +29.1%       2503           +11.4%       2160 ±  3%  sched_debug.cpu.nr_switches.min
  39960890 ±  6%     +68.3%   67272793 ±  2%     +54.7%   61837739 ±  4%  proc-vmstat.numa_hint_faults
  19987976 ±  6%     +68.7%   33722069 ±  2%     +55.1%   30996483 ±  4%  proc-vmstat.numa_hint_faults_local
  28840932 ±  3%      +6.9%   30817082 ±  5%      +8.0%   31160418 ±  4%  proc-vmstat.numa_hit
  28753783 ±  3%      +6.9%   30727992 ±  5%      +8.1%   31074486 ±  4%  proc-vmstat.numa_local
  19745743 ±  5%     +10.0%   21720583 ±  7%     +11.8%   22080123 ±  6%  proc-vmstat.numa_pages_migrated
  40107839 ±  6%     +68.1%   67430626 ±  2%     +54.6%   61988683 ±  4%  proc-vmstat.numa_pte_updates
  37158989 ±  2%      +5.3%   39124260 ±  3%      +6.3%   39482935 ±  3%  proc-vmstat.pgalloc_normal
  68856116 ±  5%     +42.6%   98184580 ±  2%     +35.1%   93057570 ±  3%  proc-vmstat.pgfault
  19745743 ±  5%     +10.0%   21720583 ±  7%     +11.8%   22080123 ±  6%  proc-vmstat.pgmigrate_success
  19754280 ±  5%     +10.0%   21735325 ±  7%     +11.8%   22080663 ±  6%  proc-vmstat.pgreuse
      0.17 ±  7%      +0.1        0.23 ±  3%      +0.0        0.18 ±  5%  perf-stat.i.branch-miss-rate%
   8953845 ±  3%     +61.0%   14417578 ±  3%     +13.3%   10142474 ±  2%  perf-stat.i.branch-misses
     66.30            -1.8       64.47            -0.3       65.98        perf-stat.i.cache-miss-rate%
      1904 ±  3%     +12.3%       2139            +3.9%       1979        perf-stat.i.context-switches
    158.09           +11.3%     175.92 ±  3%      +7.5%     170.00 ±  2%  perf-stat.i.cpu-migrations
      0.04 ±  9%      +0.0        0.05 ± 11%      +0.0        0.04 ±  7%  perf-stat.i.dTLB-load-miss-rate%
   4856144 ±  8%     +41.5%    6870029 ±  9%     +12.3%    5455416 ±  7%  perf-stat.i.dTLB-load-misses
      9.10            -0.4        8.71            -0.1        8.97        perf-stat.i.dTLB-store-miss-rate%
  5.33e+08            -4.4%  5.095e+08            -1.8%  5.233e+08        perf-stat.i.dTLB-store-misses
   2454429 ±  2%    +159.7%    6374895 ±  7%     +26.7%    3110501 ±  5%  perf-stat.i.iTLB-load-misses
    116140 ±  2%     +60.9%     186840 ±  7%      -3.6%     111933 ±  4%  perf-stat.i.iTLB-loads
     41691 ±  5%     -23.0%      32083 ± 26%      +1.7%      42380 ± 20%  perf-stat.i.instructions-per-iTLB-miss
      0.31 ± 38%     -59.1%       0.13 ± 27%     -68.9%       0.10 ± 31%  perf-stat.i.major-faults
    224958 ±  5%     +42.4%     320417 ±  2%     +35.4%     304571 ±  3%  perf-stat.i.minor-faults
     50.61            +1.6       52.22            +0.7       51.35        perf-stat.i.node-load-miss-rate%
 1.169e+08            +3.3%  1.208e+08            +0.9%  1.179e+08        perf-stat.i.node-load-misses
 1.132e+08            -3.7%  1.089e+08            -2.1%  1.108e+08        perf-stat.i.node-loads
 2.688e+08            -3.9%  2.582e+08            -1.8%   2.64e+08        perf-stat.i.node-store-misses
 2.664e+08            -4.5%  2.543e+08            -1.7%  2.618e+08        perf-stat.i.node-stores
    224959 ±  5%     +42.4%     320418 ±  2%     +35.4%     304571 ±  3%  perf-stat.i.page-faults
      0.08 ±  4%      +0.0        0.12 ±  4%      +0.0        0.09 ±  3%  perf-stat.overall.branch-miss-rate%
     67.15            -1.9       65.28            -0.5       66.64        perf-stat.overall.cache-miss-rate%
    366.74            +2.9%     377.43            +1.2%     371.26        perf-stat.overall.cycles-between-cache-misses
      0.03 ±  8%      +0.0        0.05 ± 10%      +0.0        0.04 ±  8%  perf-stat.overall.dTLB-load-miss-rate%
      9.38            -0.4        8.97            -0.1        9.25        perf-stat.overall.dTLB-store-miss-rate%
     95.49            +1.7       97.16            +1.0       96.53        perf-stat.overall.iTLB-load-miss-rate%
     20490 ±  3%     -61.8%       7826 ±  7%     -21.5%      16077 ±  6%  perf-stat.overall.instructions-per-iTLB-miss
     50.81            +1.8       52.60            +0.8       51.56        perf-stat.overall.node-load-miss-rate%
      9210            +3.0%       9485            +0.7%       9271        perf-stat.overall.path-length
   8906114 ±  3%     +61.8%   14412101 ±  3%     +13.3%   10090374 ±  2%  perf-stat.ps.branch-misses
      1906 ±  3%     +12.3%       2142            +3.8%       1979        perf-stat.ps.context-switches
    157.57           +11.7%     176.03 ±  3%      +7.6%     169.49 ±  2%  perf-stat.ps.cpu-migrations
   4843373 ±  8%     +41.9%    6871859 ±  9%     +12.3%    5440606 ±  7%  perf-stat.ps.dTLB-load-misses
 5.313e+08            -4.4%  5.077e+08            -1.8%  5.218e+08        perf-stat.ps.dTLB-store-misses
   2444301 ±  2%    +161.3%    6385873 ±  7%     +26.8%    3098710 ±  5%  perf-stat.ps.iTLB-load-misses
    115384 ±  2%     +61.5%     186290 ±  7%      -3.7%     111109 ±  4%  perf-stat.ps.iTLB-loads
      0.31 ± 38%     -59.0%       0.13 ± 27%     -68.8%       0.10 ± 31%  perf-stat.ps.major-faults
    224444 ±  5%     +42.8%     320615 ±  2%     +35.3%     303619 ±  3%  perf-stat.ps.minor-faults
 1.165e+08            +3.4%  1.205e+08            +0.9%  1.176e+08        perf-stat.ps.node-load-misses
 1.128e+08            -3.8%  1.086e+08            -2.1%  1.105e+08        perf-stat.ps.node-loads
  2.68e+08            -4.0%  2.573e+08            -1.8%  2.632e+08        perf-stat.ps.node-store-misses
 2.656e+08            -4.6%  2.534e+08            -1.7%   2.61e+08        perf-stat.ps.node-stores
    224444 ±  5%     +42.8%     320615 ±  2%     +35.3%     303620 ±  3%  perf-stat.ps.page-faults
     19.08 ± 10%      -1.7       17.34 ±  4%      +0.5       19.59        perf-profile.calltrace.cycles-pp.nrand48_r
      1.26 ± 15%      -1.3        0.00            -1.3        0.00        perf-profile.calltrace.cycles-pp.migrate_folio_unmap.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
      1.14 ± 15%      -1.1        0.00            -1.1        0.00        perf-profile.calltrace.cycles-pp.try_to_migrate.migrate_folio_unmap.migrate_pages_batch.migrate_pages.migrate_misplaced_page
      1.12 ± 15%      -1.1        0.00            -1.1        0.00        perf-profile.calltrace.cycles-pp.rmap_walk_anon.try_to_migrate.migrate_folio_unmap.migrate_pages_batch.migrate_pages
      1.08 ± 15%      -1.1        0.00            -1.1        0.00        perf-profile.calltrace.cycles-pp.try_to_migrate_one.rmap_walk_anon.try_to_migrate.migrate_folio_unmap.migrate_pages_batch
      0.92 ± 15%      -0.9        0.00            -0.9        0.00        perf-profile.calltrace.cycles-pp.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon.try_to_migrate.migrate_folio_unmap
      0.91 ± 15%      -0.9        0.00            -0.9        0.00        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon.try_to_migrate
      0.91 ± 15%      -0.9        0.00            -0.9        0.00        perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon
      0.91 ± 15%      -0.9        0.00            -0.9        0.00        perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one
      6.40 ±  9%      -0.5        5.94 ±  4%      +0.1        6.54        perf-profile.calltrace.cycles-pp.lrand48_r
      0.26 ±112%      -0.3        0.00            -0.3        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.19 ±141%      -0.2        0.00            -0.2        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.do_numa_page.__handle_mm_fault.handle_mm_fault
      4.13 ±  3%      -0.1        4.04            -0.0        4.12        perf-profile.calltrace.cycles-pp.do_rw_once
      0.06 ±282%      -0.1        0.00            -0.1        0.00        perf-profile.calltrace.cycles-pp.rmap_walk_anon.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
      0.13 ±188%      +0.1        0.24 ±144%      -0.0        0.11 ±187%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.nrand48_r
      0.00            +0.1        0.10 ±223%      +0.0        0.00        perf-profile.calltrace.cycles-pp.update_load_avg.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle
      0.00            +0.1        0.11 ±223%      +0.0        0.00        perf-profile.calltrace.cycles-pp.update_curr.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle
      0.07 ±282%      +0.1        0.21 ±144%      -0.1        0.00        perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
      0.07 ±282%      +0.1        0.21 ±144%      -0.1        0.00        perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
      0.07 ±282%      +0.1        0.22 ±144%      -0.1        0.00        perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
      0.00            +0.2        0.17 ±141%      +0.0        0.00        perf-profile.calltrace.cycles-pp.__default_send_IPI_dest_field.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush
      0.00            +0.3        0.26 ±100%      +0.0        0.00        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.nrand48_r
      0.00            +0.4        0.36 ± 70%      +0.1        0.06 ±282%  perf-profile.calltrace.cycles-pp.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page
      0.00            +0.4        0.36 ± 70%      +0.1        0.06 ±282%  perf-profile.calltrace.cycles-pp.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
      1.44 ± 28%      +0.5        1.94 ± 61%      +0.1        1.51 ± 25%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
      1.43 ± 29%      +0.5        1.93 ± 61%      +0.1        1.50 ± 25%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
      0.55 ± 69%      +0.5        1.08 ± 69%      +0.0        0.60 ± 56%  perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues
      1.34 ± 39%      +0.6        1.90 ± 69%      +0.0        1.35 ± 25%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      0.17 ±196%      +0.6        0.73 ± 85%      +0.2        0.33 ± 89%  perf-profile.calltrace.cycles-pp.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer
      1.72 ± 25%      +0.6        2.30 ± 48%      +0.1        1.80 ± 22%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.do_access
      1.08 ± 31%      +0.6        1.66 ± 72%      +0.1        1.13 ± 26%  perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
      1.52 ± 28%      +0.6        2.11 ± 52%      +0.1        1.58 ± 25%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
      1.09 ± 31%      +0.6        1.68 ± 72%      +0.1        1.14 ± 26%  perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
      1.18 ± 30%      +0.6        1.78 ± 70%      +0.1        1.24 ± 26%  perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
      0.00            +0.6        0.60 ±  8%      +0.0        0.00        perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush
      0.00            +0.6        0.64 ±  7%      +0.0        0.00        perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
      0.00            +0.9        0.90 ± 10%      +0.0        0.00        perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
     72.48 ±  3%      +1.4       73.88            -0.7       71.79        perf-profile.calltrace.cycles-pp.do_access
      0.00            +1.9        1.86 ±  9%      +0.3        0.26 ±113%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
      0.00            +1.9        1.87 ±  8%      +0.3        0.26 ±113%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
      0.00            +1.9        1.94 ±  8%      +0.3        0.33 ± 91%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.do_access
      0.00            +2.6        2.59 ±  9%      +0.6        0.59 ± 40%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.do_access
      0.00            +2.8        2.80 ±  8%      +0.9        0.90 ± 18%  perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush
      3.30 ± 15%      +6.6        9.88 ±  7%      +0.9        4.18 ± 19%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      3.34 ± 15%      +6.6        9.94 ±  7%      +0.9        4.22 ± 19%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
      3.03 ± 15%      +6.7        9.69 ±  7%      +1.0        4.03 ± 19%  perf-profile.calltrace.cycles-pp.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      3.68 ± 15%      +6.8       10.48 ±  7%      +0.9        4.63 ± 19%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
      3.70 ± 15%      +6.8       10.49 ±  7%      +0.9        4.64 ± 19%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
      3.89 ± 14%      +6.8       10.71 ±  7%      +1.0        4.85 ± 19%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
      2.46 ± 15%      +7.0        9.46 ±  7%      +1.4        3.85 ± 19%  perf-profile.calltrace.cycles-pp.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      2.27 ± 15%      +7.0        9.28 ±  7%      +1.4        3.67 ± 19%  perf-profile.calltrace.cycles-pp.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault
      2.27 ± 15%      +7.0        9.29 ±  7%      +1.4        3.68 ± 19%  perf-profile.calltrace.cycles-pp.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault
      0.00            +7.5        7.50 ±  7%      +2.4        2.38 ± 18%  perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch
      0.00            +7.6        7.56 ±  7%      +2.4        2.40 ± 18%  perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages
      0.00            +7.6        7.57 ±  8%      +2.4        2.40 ± 18%  perf-profile.calltrace.cycles-pp.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page
      0.00            +7.6        7.57 ±  7%      +2.4        2.40 ± 18%  perf-profile.calltrace.cycles-pp.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
     16.69 ± 10%      -1.3       15.43 ±  5%      +0.5       17.16        perf-profile.children.cycles-pp.nrand48_r
      1.51 ± 16%      -1.1        0.42 ±  9%      -1.2        0.31 ± 20%  perf-profile.children.cycles-pp.rmap_walk_anon
      1.25 ± 16%      -1.0        0.30 ±  9%      -1.0        0.29 ± 20%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      0.92 ± 15%      -0.9        0.00            -0.9        0.00        perf-profile.children.cycles-pp.ptep_clear_flush
      0.92 ± 15%      -0.9        0.00            -0.9        0.00        perf-profile.children.cycles-pp.flush_tlb_mm_range
      9.27 ±  8%      -0.9        8.37 ±  4%      +0.2        9.45        perf-profile.children.cycles-pp.lrand48_r
      1.08 ± 15%      -0.9        0.18 ±  6%      -1.0        0.12 ± 21%  perf-profile.children.cycles-pp.try_to_migrate_one
      1.14 ± 15%      -0.9        0.26 ±  8%      -0.9        0.19 ± 19%  perf-profile.children.cycles-pp.try_to_migrate
      1.05 ± 15%      -0.8        0.21 ± 11%      -0.9        0.16 ± 16%  perf-profile.children.cycles-pp._raw_spin_lock
      1.26 ± 15%      -0.8        0.42 ±  8%      -0.9        0.34 ± 21%  perf-profile.children.cycles-pp.migrate_folio_unmap
      0.46 ± 15%      -0.3        0.14 ± 13%      -0.3        0.11 ± 20%  perf-profile.children.cycles-pp.page_vma_mapped_walk
      0.34 ± 15%      -0.2        0.11 ± 11%      -0.3        0.08 ± 18%  perf-profile.children.cycles-pp.remove_migration_pte
      0.14 ± 16%      -0.1        0.00            -0.1        0.00        perf-profile.children.cycles-pp.handle_pte_fault
      4.37 ±  3%      -0.1        4.29            -0.0        4.36        perf-profile.children.cycles-pp.do_rw_once
      0.13 ± 22%      -0.1        0.07 ± 11%      -0.0        0.09 ± 23%  perf-profile.children.cycles-pp.folio_lruvec_lock_irq
      0.13 ± 22%      -0.1        0.08 ± 10%      -0.0        0.09 ± 22%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.33 ±  2%      -0.0        0.30            -0.0        0.32 ±  2%  perf-profile.children.cycles-pp.lrand48_r@plt
      0.17 ± 21%      -0.0        0.14 ±  9%      -0.0        0.15 ± 21%  perf-profile.children.cycles-pp.folio_isolate_lru
      0.02 ±112%      -0.0        0.00            +0.0        0.03 ±111%  perf-profile.children.cycles-pp.timerqueue_del
      0.19 ± 20%      -0.0        0.17 ±  8%      -0.0        0.17 ± 20%  perf-profile.children.cycles-pp.numamigrate_isolate_page
      0.06 ± 13%      -0.0        0.04 ± 45%      -0.0        0.05 ± 37%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.06 ± 13%      -0.0        0.04 ± 45%      -0.0        0.05 ± 37%  perf-profile.children.cycles-pp.do_syscall_64
      0.01 ±193%      -0.0        0.00            -0.0        0.01 ±188%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
      0.09 ± 20%      -0.0        0.08 ± 47%      +0.0        0.09 ± 23%  perf-profile.children.cycles-pp.tick_sched_do_timer
      0.07 ± 39%      -0.0        0.06 ± 45%      +0.0        0.07 ± 28%  perf-profile.children.cycles-pp.ktime_get_update_offsets_now
      0.01 ±282%      -0.0        0.00            -0.0        0.01 ±282%  perf-profile.children.cycles-pp.perf_rotate_context
      0.02 ±111%      -0.0        0.02 ±142%      +0.0        0.03 ±112%  perf-profile.children.cycles-pp.irqtime_account_process_tick
      0.06 ± 39%      -0.0        0.06 ±  8%      +0.0        0.07 ± 21%  perf-profile.children.cycles-pp.rmqueue_bulk
      0.00            +0.0        0.00            +0.0        0.01 ±282%  perf-profile.children.cycles-pp.__free_one_page
      0.00            +0.0        0.00            +0.0        0.01 ±187%  perf-profile.children.cycles-pp.lru_add_fn
      0.07 ± 27%      +0.0        0.07 ± 47%      -0.0        0.06 ± 55%  perf-profile.children.cycles-pp.ktime_get
      0.09 ± 15%      +0.0        0.10 ±  8%      +0.0        0.11 ± 21%  perf-profile.children.cycles-pp.rmqueue
      0.09 ± 39%      +0.0        0.10 ± 50%      -0.0        0.07 ± 75%  perf-profile.children.cycles-pp.cpuacct_account_field
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.children.cycles-pp.run_posix_cpu_timers
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.children.cycles-pp.nohz_balance_exit_idle
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.children.cycles-pp.reweight_entity
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.children.cycles-pp.__hrtimer_next_event_base
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.children.cycles-pp.nohz_balancer_kick
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.children.cycles-pp.trigger_load_balance
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.children.cycles-pp.check_cpu_stall
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.children.cycles-pp.perf_event_task_tick
      0.09 ± 16%      +0.0        0.10 ±  7%      +0.0        0.11 ± 22%  perf-profile.children.cycles-pp.__alloc_pages
      0.09 ± 16%      +0.0        0.10 ± 10%      +0.0        0.11 ± 21%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.children.cycles-pp.acct_account_cputime
      0.09 ± 18%      +0.0        0.10 ±  7%      +0.0        0.11 ± 22%  perf-profile.children.cycles-pp.__folio_alloc
      0.01 ±282%      +0.0        0.02 ±142%      -0.0        0.00        perf-profile.children.cycles-pp.rcu_core
      0.32 ± 19%      +0.0        0.34 ± 45%      +0.0        0.33 ± 32%  perf-profile.children.cycles-pp.account_user_time
      0.12 ± 95%      +0.0        0.14 ±  6%      -0.0        0.11 ± 16%  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
      0.09 ± 18%      +0.0        0.11 ±  9%      +0.0        0.11 ± 22%  perf-profile.children.cycles-pp.alloc_misplaced_dst_page
      0.06 ± 18%      +0.0        0.08 ± 69%      +0.0        0.07 ± 41%  perf-profile.children.cycles-pp.rcu_pending
      0.00            +0.0        0.02 ±141%      +0.0        0.00        perf-profile.children.cycles-pp.set_tlb_ubc_flush_pending
      0.00            +0.0        0.02 ±141%      +0.0        0.00        perf-profile.children.cycles-pp.folio_lock_anon_vma_read
      0.00            +0.0        0.02 ±141%      +0.0        0.01 ±282%  perf-profile.children.cycles-pp.folio_get_anon_vma
      0.06 ± 18%      +0.0        0.08 ±  9%      +0.0        0.06 ± 19%  perf-profile.children.cycles-pp.mt_find
      0.21 ± 17%      +0.0        0.23 ±  8%      -0.0        0.21 ± 18%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.06 ± 16%      +0.0        0.08 ±  8%      +0.0        0.08 ± 21%  perf-profile.children.cycles-pp.free_unref_page
      0.06 ± 18%      +0.0        0.08 ± 11%      +0.0        0.06 ± 20%  perf-profile.children.cycles-pp.find_vma
      0.11 ± 16%      +0.0        0.12 ± 66%      +0.0        0.13 ± 29%  perf-profile.children.cycles-pp.__cgroup_account_cputime_field
      0.01 ±282%      +0.0        0.03 ±102%      -0.0        0.00        perf-profile.children.cycles-pp.lapic_next_deadline
      0.03 ± 71%      +0.0        0.06 ±  8%      +0.0        0.05 ± 39%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.02 ±209%      +0.0        0.04 ±103%      -0.0        0.02 ±142%  perf-profile.children.cycles-pp.update_cfs_group
      0.01 ±282%      +0.0        0.03 ±105%      -0.0        0.00        perf-profile.children.cycles-pp.hrtimer_update_next_event
      0.05 ± 43%      +0.0        0.08 ± 61%      -0.0        0.05 ± 57%  perf-profile.children.cycles-pp.update_irq_load_avg
      0.00            +0.0        0.02 ± 99%      +0.0        0.00        perf-profile.children.cycles-pp.__perf_sw_event
      0.08 ± 15%      +0.0        0.10 ± 10%      +0.0        0.10 ± 21%  perf-profile.children.cycles-pp.__list_del_entry_valid
      0.09 ± 47%      +0.0        0.12 ± 70%      -0.0        0.08 ± 43%  perf-profile.children.cycles-pp.hrtimer_active
      0.01 ±282%      +0.0        0.03 ±106%      -0.0        0.00        perf-profile.children.cycles-pp.update_min_vruntime
      0.08 ± 18%      +0.0        0.11 ± 68%      +0.0        0.09 ± 26%  perf-profile.children.cycles-pp.rcu_sched_clock_irq
      0.07 ± 35%      +0.0        0.10 ± 33%      +0.0        0.08 ± 26%  perf-profile.children.cycles-pp.clockevents_program_event
      0.01 ±282%      +0.0        0.04 ±110%      -0.0        0.00        perf-profile.children.cycles-pp.timerqueue_add
      0.04 ± 91%      +0.0        0.07 ± 50%      +0.0        0.06 ± 38%  perf-profile.children.cycles-pp.arch_scale_freq_tick
      0.02 ±154%      +0.0        0.06 ± 74%      +0.0        0.03 ± 92%  perf-profile.children.cycles-pp.__do_softirq
      0.00            +0.0        0.04 ± 71%      +0.0        0.02 ±142%  perf-profile.children.cycles-pp.can_change_pte_writable
      0.01 ±282%      +0.0        0.04 ±107%      -0.0        0.00        perf-profile.children.cycles-pp.enqueue_hrtimer
      0.00            +0.0        0.04 ± 44%      +0.0        0.00        perf-profile.children.cycles-pp.tlb_is_not_lazy
      0.00            +0.0        0.04 ± 45%      +0.0        0.00        perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      0.15 ± 20%      +0.0        0.20 ±  8%      -0.0        0.15 ± 21%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      0.11 ± 25%      +0.0        0.16 ± 64%      +0.0        0.11 ± 25%  perf-profile.children.cycles-pp.update_rq_clock
      0.03 ±118%      +0.1        0.08 ± 58%      +0.0        0.05 ± 59%  perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
      0.03 ±127%      +0.1        0.09 ± 84%      +0.0        0.04 ± 72%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.00            +0.1        0.06 ±  9%      +0.0        0.02 ±142%  perf-profile.children.cycles-pp.folio_migrate_flags
      0.03 ±152%      +0.1        0.09 ± 68%      +0.0        0.04 ± 72%  perf-profile.children.cycles-pp.__update_load_avg_se
      0.00            +0.1        0.07 ±  8%      +0.0        0.00        perf-profile.children.cycles-pp.native_sched_clock
      0.05 ± 36%      +0.1        0.12 ±  8%      +0.1        0.10 ± 18%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
      0.06 ± 13%      +0.1        0.13 ±  8%      +0.1        0.11 ± 16%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.16 ± 13%      +0.1        0.24 ± 10%      +0.0        0.18 ± 19%  perf-profile.children.cycles-pp.up_read
      0.00            +0.1        0.08 ± 10%      +0.0        0.00        perf-profile.children.cycles-pp.sched_clock_cpu
      0.02 ±141%      +0.1        0.10 ±  8%      +0.0        0.05 ± 42%  perf-profile.children.cycles-pp.uncharge_batch
      0.01 ±282%      +0.1        0.09 ± 12%      +0.0        0.04 ± 75%  perf-profile.children.cycles-pp.page_counter_uncharge
      0.04 ± 71%      +0.1        0.12 ±  8%      +0.1        0.10 ± 18%  perf-profile.children.cycles-pp.task_work_run
      0.00            +0.1        0.09 ± 10%      +0.0        0.01 ±282%  perf-profile.children.cycles-pp._find_next_bit
      0.02 ±141%      +0.1        0.10 ± 10%      +0.0        0.06 ± 44%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge
      0.02 ±141%      +0.1        0.10 ± 10%      +0.0        0.06 ± 44%  perf-profile.children.cycles-pp.__folio_put
      0.19 ± 17%      +0.1        0.28 ± 11%      +0.0        0.21 ± 18%  perf-profile.children.cycles-pp.down_read_trylock
      0.03 ± 90%      +0.1        0.12 ±  8%      +0.1        0.10 ± 16%  perf-profile.children.cycles-pp.change_pte_range
      0.03 ± 90%      +0.1        0.12 ±  8%      +0.1        0.10 ± 18%  perf-profile.children.cycles-pp.task_numa_work
      0.03 ± 90%      +0.1        0.12 ±  8%      +0.1        0.10 ± 18%  perf-profile.children.cycles-pp.change_prot_numa
      0.03 ± 90%      +0.1        0.12 ±  8%      +0.1        0.10 ± 18%  perf-profile.children.cycles-pp.change_protection_range
      0.03 ± 90%      +0.1        0.12 ±  8%      +0.1        0.10 ± 18%  perf-profile.children.cycles-pp.change_pmd_range
      0.21 ± 19%      +0.1        0.31 ±  8%      +0.0        0.22 ± 21%  perf-profile.children.cycles-pp.folio_batch_move_lru
      0.02 ±142%      +0.1        0.12 ±  6%      +0.0        0.04 ± 72%  perf-profile.children.cycles-pp.irqtime_account_irq
      0.08 ± 36%      +0.1        0.18 ± 24%      +0.0        0.09 ± 24%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.21 ± 19%      +0.1        0.31 ±  8%      +0.0        0.22 ± 20%  perf-profile.children.cycles-pp.lru_add_drain
      0.21 ± 19%      +0.1        0.31 ±  8%      +0.0        0.22 ± 20%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      0.03 ± 71%      +0.1        0.14 ±  8%      +0.0        0.08 ± 25%  perf-profile.children.cycles-pp.mem_cgroup_migrate
      0.01 ±187%      +0.1        0.13 ±  6%      +0.1        0.07 ± 26%  perf-profile.children.cycles-pp.page_counter_charge
      0.17 ± 13%      +0.1        0.30 ±  9%      +0.1        0.24 ± 19%  perf-profile.children.cycles-pp.folio_copy
      0.17 ± 14%      +0.1        0.30 ±  9%      +0.1        0.23 ± 20%  perf-profile.children.cycles-pp.copy_page
      0.09 ±  7%      +0.2        0.24 ±  9%      +0.0        0.11 ± 14%  perf-profile.children.cycles-pp.sync_regs
      0.21 ± 48%      +0.2        0.39 ± 65%      +0.0        0.22 ± 28%  perf-profile.children.cycles-pp.update_load_avg
      0.25 ± 39%      +0.2        0.43 ± 61%      +0.0        0.27 ± 25%  perf-profile.children.cycles-pp.update_curr
      0.25 ± 12%      +0.3        0.51 ±  8%      +0.1        0.36 ± 20%  perf-profile.children.cycles-pp.migrate_folio_extra
      0.25 ± 12%      +0.3        0.51 ±  8%      +0.1        0.36 ± 20%  perf-profile.children.cycles-pp.move_to_new_folio
      0.11 ± 20%      +0.3        0.40 ±  7%      +0.0        0.16 ± 15%  perf-profile.children.cycles-pp.native_irq_return_iret
      0.06 ± 40%      +0.4        0.47 ±  9%      +0.1        0.13 ± 23%  perf-profile.children.cycles-pp.__default_send_IPI_dest_field
      0.00            +0.4        0.44 ±  9%      +0.1        0.12 ± 22%  perf-profile.children.cycles-pp.native_flush_tlb_local
      0.68 ± 45%      +0.5        1.16 ± 62%      +0.0        0.71 ± 28%  perf-profile.children.cycles-pp.task_tick_fair
      0.08 ± 16%      +0.5        0.62 ±  9%      +0.1        0.17 ± 21%  perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
      0.96 ± 40%      +0.6        1.57 ± 60%      +0.0        1.00 ± 27%  perf-profile.children.cycles-pp.scheduler_tick
      1.56 ± 32%      +0.7        2.26 ± 55%      +0.1        1.64 ± 25%  perf-profile.children.cycles-pp.update_process_times
      1.58 ± 32%      +0.7        2.29 ± 55%      +0.1        1.65 ± 25%  perf-profile.children.cycles-pp.tick_sched_handle
      1.71 ± 31%      +0.7        2.42 ± 54%      +0.1        1.79 ± 25%  perf-profile.children.cycles-pp.tick_sched_timer
      1.85 ± 30%      +0.7        2.60 ± 52%      +0.1        1.94 ± 25%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      2.09 ± 29%      +0.8        2.86 ± 50%      +0.1        2.18 ± 24%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      2.06 ± 29%      +0.8        2.85 ± 50%      +0.1        2.16 ± 24%  perf-profile.children.cycles-pp.hrtimer_interrupt
      2.48 ± 26%      +0.8        3.28 ± 45%      +0.1        2.60 ± 22%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      2.19 ± 29%      +0.8        2.99 ± 49%      +0.1        2.29 ± 24%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.09 ± 17%      +1.2        1.32 ±  7%      +0.4        0.45 ± 21%  perf-profile.children.cycles-pp.flush_tlb_func
      0.25 ± 14%      +1.6        1.85 ±  9%      +0.3        0.55 ± 18%  perf-profile.children.cycles-pp.llist_reverse_order
     72.83 ±  3%      +1.9       74.77            -0.6       72.25        perf-profile.children.cycles-pp.do_access
      0.40 ± 15%      +2.5        2.86 ±  8%      +0.5        0.93 ± 18%  perf-profile.children.cycles-pp.llist_add_batch
      0.41 ± 14%      +3.3        3.76 ±  8%      +0.7        1.14 ± 19%  perf-profile.children.cycles-pp.__sysvec_call_function
      0.41 ± 14%      +3.4        3.76 ±  8%      +0.7        1.14 ± 19%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.43 ± 14%      +3.5        3.90 ±  8%      +0.7        1.17 ± 19%  perf-profile.children.cycles-pp.sysvec_call_function
      0.55 ± 12%      +4.4        4.95 ±  8%      +0.9        1.40 ± 19%  perf-profile.children.cycles-pp.asm_sysvec_call_function
      3.31 ± 15%      +6.6        9.89 ±  7%      +0.9        4.19 ± 19%  perf-profile.children.cycles-pp.__handle_mm_fault
      3.34 ± 15%      +6.6        9.95 ±  7%      +0.9        4.23 ± 19%  perf-profile.children.cycles-pp.handle_mm_fault
      3.03 ± 15%      +6.7        9.69 ±  7%      +1.0        4.03 ± 19%  perf-profile.children.cycles-pp.do_numa_page
      0.91 ± 15%      +6.7        7.59 ±  7%      +1.5        2.42 ± 18%  perf-profile.children.cycles-pp.smp_call_function_many_cond
      0.91 ± 15%      +6.7        7.59 ±  7%      +1.5        2.42 ± 18%  perf-profile.children.cycles-pp.on_each_cpu_cond_mask
      3.70 ± 15%      +6.8       10.49 ±  7%      +0.9        4.64 ± 19%  perf-profile.children.cycles-pp.do_user_addr_fault
      3.70 ± 15%      +6.8       10.50 ±  7%      +0.9        4.64 ± 19%  perf-profile.children.cycles-pp.exc_page_fault
      3.91 ± 14%      +6.8       10.76 ±  7%      +1.0        4.88 ± 19%  perf-profile.children.cycles-pp.asm_exc_page_fault
      2.46 ± 15%      +7.0        9.46 ±  7%      +1.4        3.85 ± 19%  perf-profile.children.cycles-pp.migrate_misplaced_page
      2.27 ± 15%      +7.0        9.28 ±  7%      +1.4        3.67 ± 19%  perf-profile.children.cycles-pp.migrate_pages_batch
      2.27 ± 15%      +7.0        9.29 ±  7%      +1.4        3.68 ± 19%  perf-profile.children.cycles-pp.migrate_pages
      0.00            +7.6        7.57 ±  7%      +2.4        2.40 ± 18%  perf-profile.children.cycles-pp.try_to_unmap_flush
      0.00            +7.6        7.57 ±  7%      +2.4        2.40 ± 18%  perf-profile.children.cycles-pp.arch_tlbbatch_flush
     66.95 ±  3%      -7.7       59.28 ±  2%      -2.0       64.95        perf-profile.self.cycles-pp.do_access
     13.38 ± 11%      -1.4       12.02 ±  4%      +0.3       13.71        perf-profile.self.cycles-pp.nrand48_r
      8.81 ±  9%      -1.1        7.70 ±  3%      +0.1        8.94 ±  2%  perf-profile.self.cycles-pp.lrand48_r
      1.14 ± 16%      -0.9        0.28 ±  9%      -0.9        0.28 ± 21%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      4.08 ±  3%      -0.3        3.77            -0.0        4.03        perf-profile.self.cycles-pp.do_rw_once
      0.06 ±187%      -0.1        0.00            -0.1        0.00        perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
      0.29 ±  4%      -0.0        0.26            -0.0        0.28 ±  2%  perf-profile.self.cycles-pp.lrand48_r@plt
      0.12 ± 27%      -0.0        0.10 ± 53%      +0.0        0.13 ± 36%  perf-profile.self.cycles-pp.account_user_time
      0.02 ±141%      -0.0        0.00            +0.0        0.02 ±112%  perf-profile.self.cycles-pp.hrtimer_interrupt
      0.07 ± 16%      -0.0        0.07 ± 47%      +0.0        0.08 ± 25%  perf-profile.self.cycles-pp.tick_sched_do_timer
      0.06 ± 55%      -0.0        0.05 ± 46%      +0.0        0.06 ± 42%  perf-profile.self.cycles-pp.ktime_get_update_offsets_now
      0.02 ±111%      -0.0        0.02 ±142%      +0.0        0.03 ±112%  perf-profile.self.cycles-pp.irqtime_account_process_tick
      0.01 ±188%      -0.0        0.01 ±223%      -0.0        0.01 ±282%  perf-profile.self.cycles-pp.rmap_walk_anon
      0.00            +0.0        0.00            +0.0        0.01 ±282%  perf-profile.self.cycles-pp.__free_one_page
      0.06 ± 42%      +0.0        0.07 ± 46%      +0.0        0.07 ± 43%  perf-profile.self.cycles-pp.update_process_times
      0.09 ± 39%      +0.0        0.10 ± 50%      -0.0        0.07 ± 75%  perf-profile.self.cycles-pp.cpuacct_account_field
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.self.cycles-pp.set_tlb_ubc_flush_pending
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.self.cycles-pp.__irq_exit_rcu
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.self.cycles-pp.perf_event_task_tick
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.self.cycles-pp.run_posix_cpu_timers
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.self.cycles-pp.nohz_balance_exit_idle
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.self.cycles-pp.reweight_entity
      0.00            +0.0        0.01 ±223%      +0.0        0.01 ±187%  perf-profile.self.cycles-pp.can_change_pte_writable
      0.06 ± 14%      +0.0        0.07 ± 11%      -0.0        0.04 ± 72%  perf-profile.self.cycles-pp.mt_find
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.self.cycles-pp.trigger_load_balance
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.self.cycles-pp.check_cpu_stall
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.self.cycles-pp.timerqueue_add
      0.00            +0.0        0.01 ±223%      +0.0        0.00        perf-profile.self.cycles-pp.acct_account_cputime
      0.08 ± 17%      +0.0        0.09 ± 13%      +0.0        0.08 ± 21%  perf-profile.self.cycles-pp.page_vma_mapped_walk
      0.11 ± 17%      +0.0        0.13 ± 15%      +0.0        0.12 ± 20%  perf-profile.self.cycles-pp.__handle_mm_fault
      0.01 ±282%      +0.0        0.02 ± 99%      +0.0        0.02 ±112%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.10 ± 16%      +0.0        0.12 ± 65%      +0.0        0.12 ± 29%  perf-profile.self.cycles-pp.__cgroup_account_cputime_field
      0.01 ±282%      +0.0        0.03 ±102%      -0.0        0.00        perf-profile.self.cycles-pp.lapic_next_deadline
      0.01 ±282%      +0.0        0.03 ±150%      +0.0        0.02 ±112%  perf-profile.self.cycles-pp.rcu_pending
      0.02 ±209%      +0.0        0.04 ±103%      -0.0        0.02 ±142%  perf-profile.self.cycles-pp.update_cfs_group
      0.08 ± 47%      +0.0        0.10 ± 68%      -0.0        0.07 ± 45%  perf-profile.self.cycles-pp.hrtimer_active
      0.05 ± 43%      +0.0        0.08 ± 61%      -0.0        0.05 ± 57%  perf-profile.self.cycles-pp.update_irq_load_avg
      0.04 ± 94%      +0.0        0.06 ± 48%      +0.0        0.05 ± 56%  perf-profile.self.cycles-pp.ktime_get
      0.07 ± 16%      +0.0        0.10 ± 10%      +0.0        0.10 ± 21%  perf-profile.self.cycles-pp.__list_del_entry_valid
      0.01 ±282%      +0.0        0.03 ±106%      -0.0        0.00        perf-profile.self.cycles-pp.update_min_vruntime
      0.04 ± 91%      +0.0        0.07 ± 50%      +0.0        0.06 ± 38%  perf-profile.self.cycles-pp.arch_scale_freq_tick
      0.00            +0.0        0.03 ± 70%      +0.0        0.00        perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
      0.01 ±282%      +0.0        0.04 ± 75%      +0.0        0.02 ±112%  perf-profile.self.cycles-pp.__hrtimer_run_queues
      0.06 ± 49%      +0.0        0.10 ± 65%      -0.0        0.06 ± 56%  perf-profile.self.cycles-pp.scheduler_tick
      0.03 ±113%      +0.0        0.07 ± 83%      +0.0        0.04 ± 71%  perf-profile.self.cycles-pp.update_rq_clock
      0.00            +0.0        0.04 ± 44%      +0.0        0.01 ±187%  perf-profile.self.cycles-pp.folio_migrate_flags
      0.09 ± 14%      +0.0        0.14 ± 20%      +0.0        0.10 ± 16%  perf-profile.self.cycles-pp._raw_spin_lock
      0.02 ±191%      +0.0        0.06 ± 86%      +0.0        0.03 ± 90%  perf-profile.self.cycles-pp.__update_load_avg_se
      0.03 ±118%      +0.0        0.08 ± 57%      +0.0        0.05 ± 59%  perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
      0.02 ±111%      +0.1        0.08 ± 10%      +0.0        0.06 ± 15%  perf-profile.self.cycles-pp.change_pte_range
      0.15 ± 14%      +0.1        0.20 ± 10%      +0.0        0.17 ± 21%  perf-profile.self.cycles-pp.up_read
      0.00            +0.1        0.05 ±  8%      +0.0        0.01 ±188%  perf-profile.self.cycles-pp.try_to_migrate_one
      0.19 ± 16%      +0.1        0.24 ± 11%      +0.0        0.20 ± 19%  perf-profile.self.cycles-pp.down_read_trylock
      0.03 ±151%      +0.1        0.09 ± 84%      +0.0        0.04 ± 72%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
      0.00            +0.1        0.07 ±  8%      +0.0        0.00        perf-profile.self.cycles-pp._find_next_bit
      0.00            +0.1        0.07 ±  8%      +0.0        0.00        perf-profile.self.cycles-pp.native_sched_clock
      0.00            +0.1        0.07 ± 12%      +0.0        0.03 ±113%  perf-profile.self.cycles-pp.page_counter_uncharge
      0.09 ± 41%      +0.1        0.16 ± 69%      +0.0        0.09 ± 42%  perf-profile.self.cycles-pp.task_tick_fair
      0.11 ± 49%      +0.1        0.19 ± 74%      -0.0        0.11 ± 29%  perf-profile.self.cycles-pp.update_load_avg
      0.01 ±282%      +0.1        0.11 ±  8%      +0.1        0.06 ± 43%  perf-profile.self.cycles-pp.page_counter_charge
      0.16 ± 15%      +0.1        0.27 ±  9%      +0.1        0.22 ± 21%  perf-profile.self.cycles-pp.copy_page
      0.16 ± 41%      +0.1        0.28 ± 65%      +0.0        0.18 ± 25%  perf-profile.self.cycles-pp.update_curr
      0.09 ±  7%      +0.2        0.24 ±  9%      +0.0        0.11 ± 14%  perf-profile.self.cycles-pp.sync_regs
      0.11 ± 20%      +0.3        0.39 ±  8%      +0.0        0.15 ± 15%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.06 ± 40%      +0.4        0.47 ±  9%      +0.1        0.13 ± 23%  perf-profile.self.cycles-pp.__default_send_IPI_dest_field
      0.00            +0.4        0.44 ± 10%      +0.1        0.11 ± 19%  perf-profile.self.cycles-pp.native_flush_tlb_local
      0.07 ± 15%      +0.5        0.62 ±  7%      +0.1        0.16 ± 18%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
      0.06 ± 16%      +0.8        0.88 ±  7%      +0.3        0.33 ± 21%  perf-profile.self.cycles-pp.flush_tlb_func
      0.25 ± 14%      +1.6        1.85 ±  9%      +0.3        0.55 ± 18%  perf-profile.self.cycles-pp.llist_reverse_order
      0.35 ± 15%      +2.1        2.40 ±  8%      +0.4        0.76 ± 18%  perf-profile.self.cycles-pp.llist_add_batch
      0.37 ± 17%      +3.1        3.49 ±  7%      +0.7        1.10 ± 18%  perf-profile.self.cycles-pp.smp_call_function_many_cond


> Best Regards,
> Huang, Ying
> 
> -------------------------------------8<------------------------------------
> From 1ac61967b54bbdc1ca20af16f9dfb2507a4d4811 Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@xxxxxxxxx>
> Date: Mon, 20 Mar 2023 15:48:39 +0800
> Subject: [PATCH] dbg, rmap: avoid flushing TLB in batch if PTE is inaccessible
> 
> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
> ---
>  mm/rmap.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 8632e02661ac..3c7c43642d7c 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1582,7 +1582,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>                                  */
>                                 pteval = ptep_get_and_clear(mm, address, pvmw.pte);
>  
> -                               set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
> +                               if (pte_accessible(mm, pteval))
> +                                       set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
>                         } else {
>                                 pteval = ptep_clear_flush(vma, address, pvmw.pte);
>                         }
> @@ -1963,7 +1964,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>                                  */
>                                 pteval = ptep_get_and_clear(mm, address, pvmw.pte);
>  
> -                               set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
> +                               if (pte_accessible(mm, pteval))
> +                                       set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
>                         } else {
>                                 pteval = ptep_clear_flush(vma, address, pvmw.pte);
>                         }
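
For readers following the debug patch quoted above, here is a rough userspace
sketch of the idea: a batched TLB flush is queued only when the cleared PTE
was accessible. Everything prefixed with toy_ is a hypothetical stand-in and
not kernel API. In particular, toy_pte_accessible() checks only a "present"
bit, which is an assumption made for illustration; the real pte_accessible()
is architecture-specific and considers more state.

```c
#include <stdbool.h>

/* Toy stand-in for a page table entry; not the kernel's pte_t. */
struct toy_pte {
	bool present;	/* assumed: the hardware could have cached this entry */
	bool dirty;
};

/* Toy stand-in for the per-task batched-flush bookkeeping. */
struct toy_batch {
	bool flush_required;
	bool writable;
	unsigned long pending;	/* number of flush requests queued */
};

/* Simplified accessibility check (assumption: present bit only). */
static bool toy_pte_accessible(struct toy_pte pte)
{
	return pte.present;
}

/* Models queueing a deferred flush, as set_tlb_ubc_flush_pending() is used in the patch. */
static void toy_set_flush_pending(struct toy_batch *batch, bool dirty)
{
	batch->flush_required = true;
	if (dirty)
		batch->writable = true;
	batch->pending++;
}

/* Models the patched branch: skip the deferred flush for inaccessible PTEs. */
static void toy_unmap_one(struct toy_batch *batch, struct toy_pte pte)
{
	if (toy_pte_accessible(pte))
		toy_set_flush_pending(batch, pte.dirty);
}
```

In this model, calling toy_unmap_one() with a non-present entry leaves the
batch untouched, so no flush would be issued later for it. Calling it with a
present entry marks the batch as needing a flush.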




