[jlayton:delstid] [nfsd] 48ff66155e: fsmark.app_overhead 56.0% regression

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Hello,

kernel test robot noticed a 56.0% regression of fsmark.app_overhead on:
(but no obvious diff for fsmark.files_per_sec)

commit: 48ff66155e25a3db70cb44b340e1e990349825b8 ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
https://git.kernel.org/cgit/linux/kernel/git/jlayton/linux.git delstid

testcase: fsmark
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

	iterations: 1x
	nr_threads: 1t
	disk: 1HDD
	fs: btrfs
	fs2: nfsv4
	filesize: 4K
	test_size: 40M
	sync_method: fsyncBeforeClose
	nr_files_per_directory: 1fpd
	cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | fsmark: fsmark.app_overhead 35.9% regression                                                   |
| test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
| test parameters  | cpufreq_governor=performance                                                                   |
|                  | disk=1SSD                                                                                      |
|                  | filesize=16MB                                                                                  |
|                  | fs2=nfsv4                                                                                      |
|                  | fs=ext4                                                                                        |
|                  | iterations=1x                                                                                  |
|                  | nr_directories=16d                                                                             |
|                  | nr_files_per_directory=256fpd                                                                  |
|                  | nr_threads=32t                                                                                 |
|                  | sync_method=fsyncBeforeClose                                                                   |
|                  | test_size=20G                                                                                  |
+------------------+------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202412242245.dbf6ea4b-lkp@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241224/202412242245.dbf6ea4b-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-9.4/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark

commit: 
  1436c81cbe ("nfsd: handle delegated timestamps in SETATTR")
  48ff66155e ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")

1436c81cbe9bef84 48ff66155e25a3db70cb44b340e 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2837 ± 10%     +21.1%       3436 ±  9%  numa-meminfo.node1.PageTables
    709.56 ± 10%     +21.1%     859.52 ±  9%  numa-vmstat.node1.nr_page_table_pages
   1551575           +17.7%    1826031 ± 28%  proc-vmstat.pgfault
     33.29            +2.9%      34.26        boot-time.boot
     24.88            +2.0%      25.38        boot-time.dhcp
   3081774           +56.0%    4807945        fsmark.app_overhead
     18.53            +0.2%      18.57        fsmark.files_per_sec
     53434           -17.3%      44176        fsmark.time.voluntary_context_switches
      2827           -11.7%       2496 ± 10%  perf-stat.i.context-switches
      2822           -11.7%       2492 ± 10%  perf-stat.ps.context-switches
      2882           -12.0%       2537 ± 11%  vmstat.system.cs
      4080           -10.7%       3643 ± 13%  vmstat.system.in
      5.82 ±  4%      -0.5        5.30 ±  5%  perf-profile.calltrace.cycles-pp.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
      6.83 ±  4%      -0.5        6.33 ±  4%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      1.67 ± 10%      -0.4        1.28 ± 11%  perf-profile.calltrace.cycles-pp.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread.kthread
      1.67 ± 10%      -0.4        1.28 ± 11%  perf-profile.calltrace.cycles-pp.rpc_async_schedule.process_one_work.worker_thread.kthread.ret_from_fork
      0.64 ±  4%      -0.4        0.28 ±100%  perf-profile.calltrace.cycles-pp.sched_balance_find_src_group.sched_balance_rq.sched_balance_domains.handle_softirqs.__irq_exit_rcu
      1.16 ±  8%      -0.3        0.88 ± 18%  perf-profile.calltrace.cycles-pp.sched_balance_domains.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      0.95 ±  9%      -0.2        0.70 ± 20%  perf-profile.calltrace.cycles-pp.sched_balance_rq.sched_balance_domains.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt
      0.96 ±  9%      +0.1        1.09 ±  9%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.94 ±  9%      +0.2        1.09 ±  9%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.79 ±  8%      +0.2        0.96 ± 13%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__vfork
      0.79 ±  8%      +0.2        0.96 ± 13%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__vfork
      0.78 ±  6%      +0.2        0.95 ± 13%  perf-profile.calltrace.cycles-pp.__x64_sys_vfork.do_syscall_64.entry_SYSCALL_64_after_hwframe.__vfork
      1.92 ± 24%      -0.7        1.20 ± 32%  perf-profile.children.cycles-pp.do_softirq
      7.32 ±  5%      -0.5        6.82 ±  3%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      1.85 ± 14%      -0.5        1.35 ± 18%  perf-profile.children.cycles-pp.ip6_xmit
      1.92 ± 12%      -0.5        1.44 ± 14%  perf-profile.children.cycles-pp.inet6_csk_xmit
      1.75 ± 14%      -0.5        1.28 ± 20%  perf-profile.children.cycles-pp.ip6_finish_output2
      2.64 ±  4%      -0.5        2.18        perf-profile.children.cycles-pp.__rpc_execute
      1.45 ± 12%      -0.4        1.00 ± 19%  perf-profile.children.cycles-pp.__local_bh_enable_ip
      1.58 ± 14%      -0.4        1.16 ±  4%  perf-profile.children.cycles-pp.xprt_request_transmit
      1.59 ± 14%      -0.4        1.16 ±  4%  perf-profile.children.cycles-pp.xprt_transmit
      1.30 ± 15%      -0.4        0.88 ± 16%  perf-profile.children.cycles-pp.__netif_receive_skb_one_core
      1.33 ± 14%      -0.4        0.92 ± 13%  perf-profile.children.cycles-pp.__napi_poll
      1.50 ± 11%      -0.4        1.09 ± 14%  perf-profile.children.cycles-pp.net_rx_action
      1.32 ± 14%      -0.4        0.92 ± 13%  perf-profile.children.cycles-pp.process_backlog
      1.19 ± 14%      -0.4        0.79 ± 13%  perf-profile.children.cycles-pp.ip6_input_finish
      1.18 ± 14%      -0.4        0.78 ± 13%  perf-profile.children.cycles-pp.ip6_protocol_deliver_rcu
      1.59 ± 13%      -0.4        1.20 ±  5%  perf-profile.children.cycles-pp.call_transmit
      1.54 ± 14%      -0.4        1.15 ±  4%  perf-profile.children.cycles-pp.xs_tcp_send_request
      1.67 ± 10%      -0.4        1.28 ± 11%  perf-profile.children.cycles-pp.rpc_async_schedule
      1.10 ± 15%      -0.4        0.74 ± 14%  perf-profile.children.cycles-pp.tcp_v6_rcv
      0.86 ± 22%      -0.3        0.54 ± 12%  perf-profile.children.cycles-pp.tcp_v6_do_rcv
      0.81 ± 26%      -0.3        0.52 ± 10%  perf-profile.children.cycles-pp.tcp_rcv_established
      0.42 ± 19%      -0.2        0.24 ± 17%  perf-profile.children.cycles-pp.___perf_sw_event
      0.20 ± 35%      -0.2        0.03 ±100%  perf-profile.children.cycles-pp.__put_nfs_open_context
      0.34 ± 12%      -0.1        0.23 ± 22%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
      0.35 ± 10%      -0.1        0.24 ± 18%  perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
      0.05 ± 76%      +0.1        0.13        perf-profile.children.cycles-pp.ct_nmi_enter
      0.12 ± 36%      +0.1        0.21 ± 28%  perf-profile.children.cycles-pp.autoremove_wake_function
      0.78 ±  6%      +0.2        0.95 ± 13%  perf-profile.children.cycles-pp.__x64_sys_vfork
      1.08 ± 16%      +0.2        1.32 ± 10%  perf-profile.children.cycles-pp.tick_nohz_stop_tick
      1.09 ± 15%      +0.2        1.34 ± 10%  perf-profile.children.cycles-pp.tick_nohz_idle_stop_tick
      1.26 ± 12%      +0.3        1.53 ±  7%  perf-profile.children.cycles-pp.kernel_clone
      2.33 ±  8%      +0.3        2.66 ±  7%  perf-profile.children.cycles-pp.do_filp_open
      2.30 ±  7%      +0.3        2.64 ±  7%  perf-profile.children.cycles-pp.path_openat
      0.30 ± 28%      -0.1        0.16 ± 46%  perf-profile.self.cycles-pp.tick_nohz_handler
      0.12 ± 32%      +0.1        0.20 ± 26%  perf-profile.self.cycles-pp.__memcg_slab_free_hook
      0.05 ± 76%      +0.1        0.13        perf-profile.self.cycles-pp.ct_nmi_enter
      0.36 ± 33%      +0.2        0.54 ± 27%  perf-profile.self.cycles-pp.next_uptodate_folio


***************************************************************************************************
lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-12/performance/1SSD/16MB/nfsv4/ext4/1x/x86_64-rhel-9.4/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-ivb-2ep2/20G/fsmark

commit: 
  1436c81cbe ("nfsd: handle delegated timestamps in SETATTR")
  48ff66155e ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")

1436c81cbe9bef84 48ff66155e25a3db70cb44b340e 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 2.472e+08 ±  4%     +35.9%  3.361e+08 ±  4%  fsmark.app_overhead
     16.00            +0.0%      16.00        fsmark.files_per_sec
     65306 ±121%    +168.8%     175532 ±  4%  numa-meminfo.node1.AnonHugePages
    449.72 ±  8%     +24.4%     559.64 ± 16%  sched_debug.cpu.curr->pid.avg
      0.02 ± 37%     -75.3%       0.00 ±101%  perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.61 ± 47%     -69.2%       0.19 ± 52%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3.57 ± 35%      -1.6        1.98 ± 24%  perf-profile.calltrace.cycles-pp.read
      3.35 ± 38%      -1.4        1.98 ± 24%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      3.35 ± 38%      -1.4        1.98 ± 24%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
      1.10 ± 72%      +2.0        3.11 ± 14%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      1.37 ± 83%      +2.3        3.66 ± 38%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.57 ± 35%      -1.6        1.98 ± 24%  perf-profile.children.cycles-pp.read
      1.90 ± 60%      +2.4        4.28 ± 15%  perf-profile.children.cycles-pp.__mmap_region
      1.37 ± 83%      +2.5        3.92 ± 35%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
      2.17 ± 57%      +2.7        4.84 ± 34%  perf-profile.children.cycles-pp.do_mmap
      2.17 ± 57%      +2.9        5.08 ± 33%  perf-profile.children.cycles-pp.vm_mmap_pgoff





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux