[linus:master] [madvise] 2f406263e3: stress-ng.mremap.ops_per_sec 6.7% regression

Hello,

kernel test robot noticed a 6.7% regression of stress-ng.mremap.ops_per_sec on:


commit: 2f406263e3e954aa24c1248edcfa9be0c1bb30fa ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[regression still present on fix commit cc864ebba5f612ce2960e7e09322a193e8fda0d7]

testcase: stress-ng
config: x86_64-rhel-8.3
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: mremap
	cpufreq_governor: performance
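
As a rough illustration of the parameters above, the sketch below builds an approximate stress-ng command line. It assumes that "nr_threads: 100%" means one mremap worker per online CPU (64 on the test machine) and that "testtime: 60s" maps to stress-ng's --timeout option; the helper name build_stress_ng_argv is hypothetical, and the exact invocation used by the lkp-tests harness may differ. The materials linked further down in this report are the authoritative way to reproduce.

```python
import os


def build_stress_ng_argv(cpus, nr_threads_pct=100, testtime="60s"):
    """Return an approximate stress-ng argv for the mremap stressor.

    Assumption: nr_threads is a percentage of online CPUs, so 100% on a
    64-thread machine means 64 workers. This is not taken from lkp-tests.
    """
    workers = max(1, cpus * nr_threads_pct // 100)
    return [
        "stress-ng",
        "--mremap", str(workers),   # number of mremap stressor instances
        "--timeout", testtime,      # stop after this much run time
        "--metrics-brief",          # print bogo-ops summary at the end
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is stressed by accident.
    print(" ".join(build_stress_ng_argv(os.cpu_count() or 1)))
```

The sketch only prints the command so it can be inspected first; the cpufreq governor setting is a separate host configuration step and is not handled here.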




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202411291513.ad55672a-lkp@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241129/202411291513.ad55672a-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/mremap/stress-ng/60s

commit: 
  6867c7a332 ("mm: multi-gen LRU: don't spin during memcg release")
  2f406263e3 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")

6867c7a3320669cb 2f406263e3e954aa24c1248edcf 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     36.80 ±  7%      +4.1       40.91        mpstat.cpu.all.sys%
    325.67 ± 44%    +119.1%     713.67 ± 13%  perf-c2c.HITM.local
     63.83 ± 67%    +175.7%     176.00 ± 20%  perf-c2c.HITM.remote
      9.59 ± 19%     -36.7%       6.07 ± 31%  perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      0.02 ±  9%     +48.0%       0.03 ± 30%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.01 ±  3%     +73.9%       0.03 ± 20%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    936.50 ± 27%     +49.9%       1403 ±  9%  perf-sched.wait_and_delay.count.__cond_resched.shrink_folio_list.reclaim_folio_list.reclaim_pages.madvise_cold_or_pageout_pte_range
    374720 ±  2%      -6.7%     349433        stress-ng.mremap.ops
      6245 ±  2%      -6.7%       5823        stress-ng.mremap.ops_per_sec
 2.353e+08 ±  2%      -6.8%  2.194e+08        stress-ng.time.minor_page_faults
      2213 ±  4%      -7.0%       2057        stress-ng.time.user_time
  2.22e+08 ±  2%      -6.8%  2.069e+08        proc-vmstat.numa_hit
 2.219e+08 ±  2%      -6.8%  2.067e+08        proc-vmstat.numa_local
 4.117e+08 ±  2%      -6.7%  3.842e+08        proc-vmstat.pgalloc_normal
 2.357e+08 ±  2%      -6.7%  2.198e+08        proc-vmstat.pgfault
 4.115e+08 ±  2%      -6.7%   3.84e+08        proc-vmstat.pgfree
    350460 ±  2%      -6.8%     326755        proc-vmstat.thp_deferred_split_page
    374783 ±  2%      -6.7%     349496        proc-vmstat.thp_fault_alloc
     24286 ±  2%    +278.1%      91836 ± 39%  proc-vmstat.thp_split_page
    374810 ±  2%      -6.7%     349527        proc-vmstat.thp_split_pmd
     24286 ±  2%      -6.5%      22708        proc-vmstat.thp_swpout_fallback
  1.69e+09 ±  2%      -6.1%  1.587e+09        perf-stat.i.cache-references
      4.37            +1.7%       4.44        perf-stat.i.cpi
    203.06 ±  3%     -12.2%     178.34 ±  4%  perf-stat.i.cpu-migrations
 4.438e+10            -1.5%  4.372e+10        perf-stat.i.instructions
      0.23            -1.6%       0.23        perf-stat.i.ipc
      4.38            +1.7%       4.46        perf-stat.overall.cpi
    171.29 ±  4%      +5.7%     180.97        perf-stat.overall.cycles-between-cache-misses
      0.23            -1.7%       0.22        perf-stat.overall.ipc
 1.664e+09 ±  2%      -6.2%  1.562e+09        perf-stat.ps.cache-references
    199.46 ±  3%     -12.3%     174.85 ±  5%  perf-stat.ps.cpu-migrations
 4.368e+10            -1.5%    4.3e+10        perf-stat.ps.instructions
 2.688e+12            -1.9%  2.637e+12        perf-stat.total.instructions
      7.77 ±  2%      -0.3        7.46 ±  3%  perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.__get_user_pages.populate_vma_page_range
      7.63 ±  2%      -0.3        7.32 ±  3%  perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.__get_user_pages
      7.26 ±  2%      -0.3        6.98 ±  3%  perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
      0.26 ±100%      +0.7        0.92 ± 20%  perf-profile.calltrace.cycles-pp.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range.walk_p4d_range
      0.00            +0.8        0.78 ± 22%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range
      0.00            +0.8        0.81 ± 22%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range
      0.00            +0.8        0.82 ± 21%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range
      7.70 ±  2%      -0.3        7.38 ±  3%  perf-profile.children.cycles-pp.clear_huge_page
      7.77 ±  2%      -0.3        7.46 ±  3%  perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
      0.10 ±  4%      -0.0        0.08        perf-profile.children.cycles-pp.__call_rcu_common
      0.12 ±  4%      -0.0        0.10 ±  3%  perf-profile.children.cycles-pp.vm_normal_page
      0.24 ±  9%      +0.1        0.29 ±  5%  perf-profile.children.cycles-pp.folio_add_lru
      0.16 ±  3%      +0.1        0.22 ± 13%  perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
      0.06 ± 17%      +0.1        0.18 ± 35%  perf-profile.children.cycles-pp.__free_one_page
      0.07 ± 10%      +0.1        0.19 ± 33%  perf-profile.children.cycles-pp.page_counter_uncharge
      0.77 ±  6%      +0.1        0.89 ± 11%  perf-profile.children.cycles-pp._raw_spin_lock
      0.26 ±  5%      +0.1        0.40 ± 13%  perf-profile.children.cycles-pp.free_unref_page_list
      0.08 ±  8%      +0.1        0.23 ± 30%  perf-profile.children.cycles-pp.uncharge_batch
      0.34 ± 20%      +0.1        0.49 ± 19%  perf-profile.children.cycles-pp.get_swap_pages
      0.08 ± 11%      +0.2        0.24 ± 40%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.00            +0.2        0.16 ± 50%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge
      0.43 ±  7%      +0.4        0.82 ± 21%  perf-profile.children.cycles-pp.folio_lruvec_lock_irq
      0.42 ±  7%      +0.4        0.82 ± 22%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.49 ±  6%      +0.4        0.92 ± 20%  perf-profile.children.cycles-pp.folio_isolate_lru
      0.11 ±  7%      +0.6        0.73 ± 52%  perf-profile.children.cycles-pp.madvise_cold
      0.00            +0.8        0.76 ± 56%  perf-profile.children.cycles-pp.__page_cache_release
      0.00            +0.9        0.89 ± 57%  perf-profile.children.cycles-pp.__folio_put
      1.23 ± 10%      +1.0        2.22 ± 26%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      0.11 ± 12%      +1.1        1.17 ± 53%  perf-profile.children.cycles-pp.__split_huge_page
      0.12 ± 11%      +1.2        1.31 ± 56%  perf-profile.children.cycles-pp.split_huge_page_to_list
      0.11 ±  4%      -0.0        0.09 ±  5%  perf-profile.self.cycles-pp.do_vmi_align_munmap
      0.12 ±  3%      +0.0        0.14 ±  5%  perf-profile.self.cycles-pp.madvise_cold_or_pageout_pte_range
      0.15 ±  5%      +0.1        0.21 ± 12%  perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
      0.39 ± 15%      +0.1        0.49 ±  4%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.06 ± 13%      +0.1        0.18 ± 34%  perf-profile.self.cycles-pp.__free_one_page
      0.06 ± 11%      +0.1        0.18 ± 33%  perf-profile.self.cycles-pp.page_counter_uncharge
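
For readers checking the table by hand, the sketch below shows how the %change column appears to be derived. Rows with a percent sign look like relative changes against the base commit, while rows without one (mpstat.cpu.all.sys% and the perf-profile entries) look like absolute differences in percentage points. That reading is inferred from the printed numbers rather than from the robot's source, and the helper names are hypothetical. Because the table shows rounded means, recomputing a row from the printed values can differ from the printed %change in the last digit.

```python
def pct_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def point_delta(base, new):
    """Absolute difference, used for values that are already percentages."""
    return new - base


if __name__ == "__main__":
    # stress-ng.mremap.ops: 374720 -> 349433
    print(f"ops:            {pct_change(374720, 349433):+.1f}%")
    # proc-vmstat.thp_split_page: 24286 -> 91836
    print(f"thp_split_page: {pct_change(24286, 91836):+.1f}%")
    # mpstat.cpu.all.sys%: 36.80 -> 40.91 (percentage points)
    print(f"sys%:           {point_delta(36.80, 40.91):+.1f}")
```

The thp_split_page row is the one that stands out: the count of split huge pages rises several-fold while throughput drops, which lines up with the extra time in split_huge_page_to_list and the LRU lock paths in the profile.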




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




