Re: [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual

Hello,

kernel test robot noticed a 10.2% regression of fsmark.files_per_sec on:


commit: d7f6562adeebe62458eb11437b260d3f470849cd ("[PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual")
url: https://github.com/intel-lab-lkp/linux/commits/NeilBrown/SUNRPC-move-nrthreads-counting-to-start-stop-threads/20241023-104539
base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
patch link: https://lore.kernel.org/all/20241023024222.691745-7-neilb@xxxxxxx/
patch subject: [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual

testcase: fsmark
config: x86_64-rhel-8.3
compiler: gcc-12
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
parameters:

	iterations: 1x
	nr_threads: 32t
	disk: 1SSD
	fs: btrfs
	fs2: nfsv4
	filesize: 8K
	test_size: 400M
	sync_method: fsyncBeforeClose
	nr_directories: 16d
	nr_files_per_directory: 256fpd
	cpufreq_governor: performance
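
For orientation, the sketch below shows roughly how these job parameters could
map onto an fs_mark invocation. The flag mapping and the derivation of the
per-thread file count from test_size are assumptions made here for
illustration; the lkp-tests harness builds the actual command itself.

#!/usr/bin/env python3
# Rough illustration only: maps the LKP job parameters above onto fs_mark
# flags.  Both the flag mapping and the file-count derivation are
# assumptions, not the exact command the lkp-tests harness runs.

nr_threads = 32                  # nr_threads=32t
filesize = 8 * 1024              # filesize=8K (bytes)
test_size = 400 * 1024 ** 2      # test_size=400M (bytes)
nr_directories = 16              # nr_directories=16d
files_per_directory = 256        # nr_files_per_directory=256fpd

# Assumption: test_size is realized as nr_threads * files_per_thread * filesize.
files_per_thread = test_size // (nr_threads * filesize)   # 1600

cmd = [
    "fs_mark",
    "-d", "/mnt/nfs/fsmark",          # hypothetical NFSv4-over-btrfs mount point
    "-t", str(nr_threads),            # worker threads
    "-s", str(filesize),              # file size in bytes
    "-n", str(files_per_thread),      # files written per thread
    "-D", str(nr_directories),        # subdirectories
    "-N", str(files_per_directory),   # files per subdirectory
    "-S", "1",                        # sync method 1 = fsyncBeforeClose
]
print(" ".join(cmd))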


In addition to that, the commit also has a significant impact on the following test:

+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | fsmark: fsmark.files_per_sec 10.2% regression                                                  |
| test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
| test parameters  | cpufreq_governor=performance                                                                   |
|                  | disk=1SSD                                                                                      |
|                  | filesize=9B                                                                                    |
|                  | fs2=nfsv4                                                                                      |
|                  | fs=btrfs                                                                                       |
|                  | iterations=1x                                                                                  |
|                  | nr_directories=16d                                                                             |
|                  | nr_files_per_directory=256fpd                                                                  |
|                  | nr_threads=32t                                                                                 |
|                  | sync_method=fsyncBeforeClose                                                                   |
|                  | test_size=400M                                                                                 |
+------------------+------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202410301321.d8aebe67-oliver.sang@xxxxxxxxx


Details are as follows:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241030/202410301321.d8aebe67-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-12/performance/1SSD/8K/nfsv4/btrfs/1x/x86_64-rhel-8.3/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-ivb-2ep2/400M/fsmark

commit: 
  4e9c43765c ("sunrpc: remove all connection limit configuration")
  d7f6562ade ("sunrpc: introduce possibility that requested number of threads is different from actual")

4e9c43765c3fd361 d7f6562adeebe62458eb11437b2 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.09 ± 78%    +173.9%       0.24 ± 29%  perf-stat.i.major-faults
      0.07 ± 66%     -61.1%       0.03 ± 23%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.02 ± 88%   +1806.2%       0.46 ±196%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
    -17.50           -31.4%     -12.00        sched_debug.cpu.nr_uninterruptible.min
      5.32 ± 10%     -11.9%       4.69 ±  8%  sched_debug.cpu.nr_uninterruptible.stddev
  76584407 ±  5%     +40.6%  1.077e+08 ±  4%  fsmark.app_overhead
      3016 ±  2%     -10.2%       2707        fsmark.files_per_sec
     36.83            -7.7%      34.00        fsmark.time.percent_of_cpu_this_job_got
    285398 ±  4%     +10.5%     315320        meminfo.Active
    282671 ±  4%     +10.6%     312596        meminfo.Active(file)
     15709 ± 11%     -28.8%      11189 ±  2%  meminfo.Dirty
     70764 ±  4%     +10.6%      78241        proc-vmstat.nr_active_file
    468998            +5.4%     494463        proc-vmstat.nr_dirtied
      3931 ± 11%     -28.8%       2797 ±  2%  proc-vmstat.nr_dirty
    462247            +6.7%     493402        proc-vmstat.nr_written
     70764 ±  4%     +10.6%      78241        proc-vmstat.nr_zone_active_file
      3620 ± 10%     -30.7%       2507 ±  5%  proc-vmstat.nr_zone_write_pending
   1135721            +2.6%    1165013        proc-vmstat.numa_hit
   1086049            +2.7%    1115353        proc-vmstat.numa_local
     56063 ±  2%     +18.0%      66177        proc-vmstat.pgactivate
   1521044            +2.3%    1555275        proc-vmstat.pgalloc_normal
   2522626           +10.2%    2778923        proc-vmstat.pgpgout
      3.75 ± 26%      -1.9        1.80 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.75 ± 26%      -1.9        1.80 ±  7%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.54 ± 39%      -1.7        0.89 ±100%  perf-profile.calltrace.cycles-pp.event_function_call.perf_event_release_kernel.perf_release.__fput.task_work_run
      2.54 ± 39%      -1.7        0.89 ±100%  perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_release_kernel.perf_release.__fput
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn.perf_mmap__push
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.51 ±141%      +1.6        2.09 ± 29%  perf-profile.calltrace.cycles-pp.evlist_cpu_iterator__next.__evlist__disable.__cmd_record.cmd_record.run_builtin
      0.81 ±100%      +2.2        3.01 ± 44%  perf-profile.calltrace.cycles-pp.__evlist__disable.__cmd_record.cmd_record.run_builtin.handle_internal_command
      3.11 ± 27%      +2.6        5.74 ± 24%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.run_builtin.handle_internal_command.main
      3.11 ± 27%      +2.6        5.74 ± 24%  perf-profile.calltrace.cycles-pp.cmd_record.run_builtin.handle_internal_command.main
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.children.cycles-pp.generic_perform_write
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.children.cycles-pp.ksys_write
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.children.cycles-pp.vfs_write
      0.81 ±100%      +2.2        3.01 ± 44%  perf-profile.children.cycles-pp.__evlist__disable
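
The %change column above is the relative difference between the base
(4e9c43765c) and patched (d7f6562ade) means. A minimal sketch of that
arithmetic, using the fsmark.files_per_sec and fsmark.app_overhead figures
from the table (pct_change() is an illustrative helper, not part of
lkp-tests):

# Minimal sketch of the %change arithmetic; pct_change() is an illustrative
# helper, not an lkp-tests function.
def pct_change(base, patched):
    return (patched - base) / base * 100.0

# fsmark.files_per_sec: 3016 (base) -> 2707 (patched)
print(f"{pct_change(3016, 2707):+.1f}%")          # -10.2%
# fsmark.app_overhead: 76584407 (base) -> 1.077e+08 (patched)
print(f"{pct_change(76584407, 1.077e8):+.1f}%")   # +40.6%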


***************************************************************************************************
lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-12/performance/1SSD/9B/nfsv4/btrfs/1x/x86_64-rhel-8.3/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-ivb-2ep2/400M/fsmark

commit: 
  4e9c43765c ("sunrpc: remove all connection limit configuration")
  d7f6562ade ("sunrpc: introduce possibility that requested number of threads is different from actual")

4e9c43765c3fd361 d7f6562adeebe62458eb11437b2 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2.35            -2.0%       2.30        iostat.cpu.user
    137839 ± 27%     +41.1%     194529 ± 17%  numa-meminfo.node0.Active
    136284 ± 27%     +41.7%     193101 ± 17%  numa-meminfo.node0.Active(file)
      2.80 ± 32%      -1.6        1.16 ± 46%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      2.80 ± 32%      -1.6        1.16 ± 46%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
 1.397e+08 ±  2%     +14.5%  1.599e+08        fsmark.app_overhead
      4135           -10.2%       3712        fsmark.files_per_sec
     48.67            -8.2%      44.67        fsmark.time.percent_of_cpu_this_job_got
     34164 ± 27%     +41.6%      48381 ± 17%  numa-vmstat.node0.nr_active_file
    276516 ± 17%     +26.7%     350296 ± 10%  numa-vmstat.node0.nr_dirtied
    276215 ± 17%     +26.5%     349400 ± 10%  numa-vmstat.node0.nr_written
     34164 ± 27%     +41.6%      48381 ± 17%  numa-vmstat.node0.nr_zone_active_file
     80.71 ± 30%     +38.5%     111.75 ± 20%  sched_debug.cfs_rq:/.removed.load_avg.avg
    270.60 ± 12%     +16.0%     313.94 ±  9%  sched_debug.cfs_rq:/.removed.load_avg.stddev
     23.59 ± 44%     +49.9%      35.35 ± 24%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
     23.58 ± 44%     +49.8%      35.33 ± 24%  sched_debug.cfs_rq:/.removed.util_avg.avg
      3281 ± 11%   +1047.0%      37635 ±193%  sched_debug.cpu.avg_idle.min
    598666            +7.0%     640842        proc-vmstat.nr_dirtied
   1074454            +1.1%    1086656        proc-vmstat.nr_file_pages
    197321            +3.7%     204605        proc-vmstat.nr_inactive_file
    598082            +6.8%     638786        proc-vmstat.nr_written
    197321            +3.7%     204605        proc-vmstat.nr_zone_inactive_file
   1716397            +2.3%    1755672        proc-vmstat.numa_hit
   1666723            +2.4%    1705951        proc-vmstat.numa_local
    145474            +7.4%     156311        proc-vmstat.pgactivate
   2183965            +2.0%    2226921        proc-vmstat.pgalloc_normal
   3250818           +10.5%    3591588        proc-vmstat.pgpgout
      0.00 ±223%    +633.3%       0.02 ± 53%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.01 ±223%    +729.3%       0.06 ± 29%  perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ±223%    +466.7%       0.03 ± 22%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep.do_syscall_64
      0.00 ±223%    +973.3%       0.03 ± 61%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.02 ±223%    +480.6%       0.09 ± 25%  perf-sched.sch_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.67 ±223%    +506.5%       4.04        perf-sched.wait_and_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      3.74 ± 52%    +190.1%      10.85 ± 36%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.68 ±223%    +526.0%       4.25 ±  8%  perf-sched.wait_and_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
     18.67 ±173%    +375.2%      88.72 ± 23%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ±143%    +212.5%       0.03 ± 47%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.66 ±223%    +504.3%       3.99        perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      3.69 ± 53%    +192.6%      10.79 ± 36%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.67 ±223%    +525.6%       4.17 ±  8%  perf-sched.wait_time.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
     18.65 ±173%    +375.4%      88.66 ± 23%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




