[linus:master] [filemap] 9aac777aaf: phoronix-test-suite.iozone.1MB.512MB.WritePerformance.mb_s -14.0% regression

Hello,

kernel test robot noticed a -14.0% regression of phoronix-test-suite.iozone.1MB.512MB.WritePerformance.mb_s on:


commit: 9aac777aaf9459786bc8463e6cbfc7e7e1abd1f9 ("filemap: Convert generic_perform_write() to support large folios")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: phoronix-test-suite
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
parameters:

	test: iozone-1.9.6
	option_a: 1MB
	option_b: 512MB
	option_c: Write Performance
	cpufreq_governor: performance

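For context, the iozone "Write Performance" case above measures sequential write throughput: a file of the given size (512MB) is written in fixed-size records (1MB) and the result is reported in MB/s. A minimal Python sketch of that style of measurement (illustrative only, not iozone itself; function name and sizes are made up here) looks like:

```python
import os
import tempfile
import time

def sequential_write_mb_s(path, record_bytes, total_bytes):
    """Write total_bytes to path sequentially in record_bytes chunks
    and return throughput in MB/s. A rough approximation of iozone's
    sequential write test, not a replacement for it."""
    buf = b"\0" * record_bytes
    start = time.perf_counter()
    with open(path, "wb") as f:
        written = 0
        while written < total_bytes:
            f.write(buf)
            written += record_bytes
        f.flush()
        # Include flush-to-disk time so we are not just timing the page cache.
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return (total_bytes / (1024 * 1024)) / elapsed

# Small sizes so this runs quickly; the report uses 1MB records / a 512MB file.
with tempfile.TemporaryDirectory() as d:
    mb_s = sequential_write_mb_s(os.path.join(d, "testfile"), 1 << 20, 16 << 20)
    print(f"{mb_s:.1f} MB/s")
```

The 9aac777aaf commit changes how generic_perform_write() carves up each write into per-folio copies, which is exactly the path a sequential buffered write like this exercises.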



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202407242232.9109947e-oliver.sang@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240724/202407242232.9109947e-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/option_c/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/1MB/512MB/Write Performance/debian-12-x86_64-phoronix/lkp-csl-2sp7/iozone-1.9.6/phoronix-test-suite

commit: 
  146a99aefe ("xprtrdma: removed asm-generic headers from verbs.c")
  9aac777aaf ("filemap: Convert generic_perform_write() to support large folios")

146a99aefe4a45f6 9aac777aaf9459786bc8463e6cb 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      3043           -14.0%       2618        phoronix-test-suite.iozone.1MB.512MB.WritePerformance.mb_s
      6003 ±  6%     +21.0%       7262 ± 21%  proc-vmstat.nr_active_anon
      6003 ±  6%     +21.0%       7262 ± 21%  proc-vmstat.nr_zone_active_anon
      0.62 ± 43%     +90.5%       1.19 ± 43%  sched_debug.cfs_rq:/system.slice/containerd.service.load_avg.avg
      0.62 ± 43%     +94.9%       1.21 ± 40%  sched_debug.cfs_rq:/system.slice/containerd.service.runnable_avg.avg
      0.59 ± 36%     +99.4%       1.19 ± 41%  sched_debug.cfs_rq:/system.slice/containerd.service.se->avg.runnable_avg.avg
      0.59 ± 36%     +99.4%       1.19 ± 41%  sched_debug.cfs_rq:/system.slice/containerd.service.se->avg.util_avg.avg
      0.62 ± 43%     +85.1%       1.15 ± 39%  sched_debug.cfs_rq:/system.slice/containerd.service.tg_load_avg_contrib.avg
      0.62 ± 43%     +94.9%       1.21 ± 40%  sched_debug.cfs_rq:/system.slice/containerd.service.util_avg.avg
     60.61            -2.1       58.48        perf-stat.i.iTLB-load-miss-rate%
    910966            -3.4%     879846        perf-stat.i.iTLB-load-misses
      5100 ±  2%      +4.8%       5346 ±  2%  perf-stat.i.instructions-per-iTLB-miss
     57.76 ±  2%      +3.0       60.79 ±  3%  perf-stat.i.node-load-miss-rate%
     38.99 ±  2%      +3.9       42.85 ±  4%  perf-stat.i.node-store-miss-rate%
     61.51            -2.1       59.37        perf-stat.overall.iTLB-load-miss-rate%
      4574            +3.3%       4727        perf-stat.overall.instructions-per-iTLB-miss
    885569            -3.3%     856059        perf-stat.ps.iTLB-load-misses
      0.02 ± 58%     -72.5%       0.01 ±119%  perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
      0.00 ±103%   +1162.5%       0.02 ±112%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm
      0.03 ± 75%     -87.1%       0.00 ±106%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.__flush_work.fsnotify_destroy_group
      0.10 ± 27%     -64.5%       0.03 ±105%  perf-sched.sch_delay.max.ms.do_task_dead.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
      0.06 ±  4%     +89.3%       0.11 ± 27%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.do_epoll_pwait.part
      0.00 ±103%   +1487.5%       0.02 ±111%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm
      0.04 ± 79%     -90.8%       0.00 ±104%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.__flush_work.fsnotify_destroy_group
      3.89 ± 36%     -31.4%       2.66 ±  8%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.do_epoll_pwait.part
      1097 ± 14%     +28.8%       1413 ±  6%  perf-sched.wait_and_delay.count.futex_wait_queue.__futex_wait.futex_wait.do_futex
      0.02 ± 18%     -56.5%       0.01 ± 52%  perf-sched.wait_time.avg.ms.__cond_resched.mmput.do_task_stat.proc_single_show.seq_read_iter
      3.87 ± 37%     -31.6%       2.65 ±  8%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.do_epoll_pwait.part
    425.39           +13.0%     480.82        perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
     15.00 ± 80%      -6.8        8.16 ±147%  perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15.00 ± 80%      -6.8        8.16 ±147%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode
     15.00 ± 80%      -6.8        8.16 ±147%  perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
     15.00 ± 80%      -6.8        8.16 ±147%  perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15.00 ± 80%      -6.8        8.16 ±147%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.09 ±102%      -3.4        0.72 ±223%  perf-profile.calltrace.cycles-pp._compound_head.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
      5.98 ± 87%      -3.0        2.96 ±176%  perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write
     15.00 ± 80%      -6.8        8.16 ±147%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      5.27 ± 61%      -4.1        1.15 ±223%  perf-profile.children.cycles-pp.sched_balance_newidle
      5.27 ± 61%      -4.1        1.15 ±223%  perf-profile.children.cycles-pp.sched_balance_rq
      4.09 ±102%      -3.4        0.72 ±223%  perf-profile.children.cycles-pp._compound_head
      5.98 ± 87%      -3.0        2.96 ±176%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      4.09 ±102%      -3.4        0.72 ±223%  perf-profile.self.cycles-pp._compound_head




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




