Hello, kernel test robot noticed a 56.0% regression of fsmark.app_overhead on: (but no obvious diff for fsmark.files_per_sec) commit: 48ff66155e25a3db70cb44b340e1e990349825b8 ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION") https://git.kernel.org/cgit/linux/kernel/git/jlayton/linux.git delstid testcase: fsmark config: x86_64-rhel-9.4 compiler: gcc-12 test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory parameters: iterations: 1x nr_threads: 1t disk: 1HDD fs: btrfs fs2: nfsv4 filesize: 4K test_size: 40M sync_method: fsyncBeforeClose nr_files_per_directory: 1fpd cpufreq_governor: performance In addition to that, the commit also has significant impact on the following tests: +------------------+------------------------------------------------------------------------------------------------+ | testcase: change | fsmark: fsmark.app_overhead 35.9% regression | | test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory | | test parameters | cpufreq_governor=performance | | | disk=1SSD | | | filesize=16MB | | | fs2=nfsv4 | | | fs=ext4 | | | iterations=1x | | | nr_directories=16d | | | nr_files_per_directory=256fpd | | | nr_threads=32t | | | sync_method=fsyncBeforeClose | | | test_size=20G | +------------------+------------------------------------------------------------------------------------------------+ If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx> | Closes: https://lore.kernel.org/oe-lkp/202412242245.dbf6ea4b-lkp@xxxxxxxxx Details are as below: --------------------------------------------------------------------------------------------------> The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20241224/202412242245.dbf6ea4b-lkp@xxxxxxxxx ========================================================================================= compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase: gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-9.4/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark commit: 1436c81cbe ("nfsd: handle delegated timestamps in SETATTR") 48ff66155e ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION") 1436c81cbe9bef84 48ff66155e25a3db70cb44b340e ---------------- --------------------------- %stddev %change %stddev \ | \ 2837 ± 10% +21.1% 3436 ± 9% numa-meminfo.node1.PageTables 709.56 ± 10% +21.1% 859.52 ± 9% numa-vmstat.node1.nr_page_table_pages 1551575 +17.7% 1826031 ± 28% proc-vmstat.pgfault 33.29 +2.9% 34.26 boot-time.boot 24.88 +2.0% 25.38 boot-time.dhcp 3081774 +56.0% 4807945 fsmark.app_overhead 18.53 +0.2% 18.57 fsmark.files_per_sec 53434 -17.3% 44176 fsmark.time.voluntary_context_switches 2827 -11.7% 2496 ± 10% perf-stat.i.context-switches 2822 -11.7% 2492 ± 10% perf-stat.ps.context-switches 2882 -12.0% 2537 ± 11% vmstat.system.cs 4080 -10.7% 3643 ± 13% vmstat.system.in 5.82 ± 4% -0.5 5.30 ± 5% perf-profile.calltrace.cycles-pp.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt 6.83 ± 4% -0.5 6.33 ± 4% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt 1.67 ± 10% -0.4 1.28 ± 11% perf-profile.calltrace.cycles-pp.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread.kthread 1.67 ± 10% -0.4 1.28 ± 11% perf-profile.calltrace.cycles-pp.rpc_async_schedule.process_one_work.worker_thread.kthread.ret_from_fork 0.64 ± 4% -0.4 0.28 ±100% perf-profile.calltrace.cycles-pp.sched_balance_find_src_group.sched_balance_rq.sched_balance_domains.handle_softirqs.__irq_exit_rcu 1.16 ± 8% -0.3 0.88 ± 18% perf-profile.calltrace.cycles-pp.sched_balance_domains.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt 0.95 ± 9% -0.2 0.70 ± 20% perf-profile.calltrace.cycles-pp.sched_balance_rq.sched_balance_domains.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt 0.96 ± 9% +0.1 1.09 ± 9% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.94 ± 9% +0.2 1.09 ± 9% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.79 ± 8% +0.2 0.96 ± 13% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__vfork 0.79 ± 8% +0.2 0.96 ± 13% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__vfork 0.78 ± 6% +0.2 0.95 ± 13% perf-profile.calltrace.cycles-pp.__x64_sys_vfork.do_syscall_64.entry_SYSCALL_64_after_hwframe.__vfork 1.92 ± 24% -0.7 1.20 ± 32% perf-profile.children.cycles-pp.do_softirq 7.32 ± 5% -0.5 6.82 ± 3% perf-profile.children.cycles-pp.__hrtimer_run_queues 1.85 ± 14% -0.5 1.35 ± 18% perf-profile.children.cycles-pp.ip6_xmit 1.92 ± 12% -0.5 1.44 ± 14% perf-profile.children.cycles-pp.inet6_csk_xmit 1.75 ± 14% -0.5 1.28 ± 20% perf-profile.children.cycles-pp.ip6_finish_output2 2.64 ± 4% -0.5 2.18 perf-profile.children.cycles-pp.__rpc_execute 1.45 ± 12% -0.4 1.00 ± 19% perf-profile.children.cycles-pp.__local_bh_enable_ip 1.58 ± 14% -0.4 1.16 ± 4% perf-profile.children.cycles-pp.xprt_request_transmit 1.59 ± 14% -0.4 1.16 ± 4% perf-profile.children.cycles-pp.xprt_transmit 1.30 ± 15% -0.4 0.88 ± 16% perf-profile.children.cycles-pp.__netif_receive_skb_one_core 1.33 ± 14% -0.4 0.92 ± 13% perf-profile.children.cycles-pp.__napi_poll 1.50 ± 11% -0.4 1.09 ± 14% perf-profile.children.cycles-pp.net_rx_action 1.32 ± 14% -0.4 0.92 ± 13% perf-profile.children.cycles-pp.process_backlog 1.19 ± 14% -0.4 0.79 ± 13% perf-profile.children.cycles-pp.ip6_input_finish 1.18 ± 14% -0.4 0.78 ± 13% perf-profile.children.cycles-pp.ip6_protocol_deliver_rcu 1.59 ± 13% -0.4 1.20 ± 5% perf-profile.children.cycles-pp.call_transmit 1.54 ± 14% -0.4 1.15 ± 4% perf-profile.children.cycles-pp.xs_tcp_send_request 1.67 ± 10% -0.4 1.28 ± 11% perf-profile.children.cycles-pp.rpc_async_schedule 1.10 ± 15% -0.4 0.74 ± 14% perf-profile.children.cycles-pp.tcp_v6_rcv 0.86 ± 22% -0.3 0.54 ± 12% perf-profile.children.cycles-pp.tcp_v6_do_rcv 0.81 ± 26% -0.3 0.52 ± 10% perf-profile.children.cycles-pp.tcp_rcv_established 0.42 ± 19% -0.2 0.24 ± 17% perf-profile.children.cycles-pp.___perf_sw_event 0.20 ± 35% -0.2 0.03 ±100% perf-profile.children.cycles-pp.__put_nfs_open_context 0.34 ± 12% -0.1 0.23 ± 22% perf-profile.children.cycles-pp.free_pages_and_swap_cache 0.35 ± 10% -0.1 0.24 ± 18% perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages 0.05 ± 76% +0.1 0.13 perf-profile.children.cycles-pp.ct_nmi_enter 0.12 ± 36% +0.1 0.21 ± 28% perf-profile.children.cycles-pp.autoremove_wake_function 0.78 ± 6% +0.2 0.95 ± 13% perf-profile.children.cycles-pp.__x64_sys_vfork 1.08 ± 16% +0.2 1.32 ± 10% perf-profile.children.cycles-pp.tick_nohz_stop_tick 1.09 ± 15% +0.2 1.34 ± 10% perf-profile.children.cycles-pp.tick_nohz_idle_stop_tick 1.26 ± 12% +0.3 1.53 ± 7% perf-profile.children.cycles-pp.kernel_clone 2.33 ± 8% +0.3 2.66 ± 7% perf-profile.children.cycles-pp.do_filp_open 2.30 ± 7% +0.3 2.64 ± 7% perf-profile.children.cycles-pp.path_openat 0.30 ± 28% -0.1 0.16 ± 46% perf-profile.self.cycles-pp.tick_nohz_handler 0.12 ± 32% +0.1 0.20 ± 26% perf-profile.self.cycles-pp.__memcg_slab_free_hook 0.05 ± 76% +0.1 0.13 perf-profile.self.cycles-pp.ct_nmi_enter 0.36 ± 33% +0.2 0.54 ± 27% perf-profile.self.cycles-pp.next_uptodate_folio *************************************************************************************************** lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory ========================================================================================= compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase: gcc-12/performance/1SSD/16MB/nfsv4/ext4/1x/x86_64-rhel-9.4/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-ivb-2ep2/20G/fsmark commit: 1436c81cbe ("nfsd: handle delegated timestamps in SETATTR") 48ff66155e ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION") 1436c81cbe9bef84 48ff66155e25a3db70cb44b340e ---------------- --------------------------- %stddev %change %stddev \ | \ 2.472e+08 ± 4% +35.9% 3.361e+08 ± 4% fsmark.app_overhead 16.00 +0.0% 16.00 fsmark.files_per_sec 65306 ±121% +168.8% 175532 ± 4% numa-meminfo.node1.AnonHugePages 449.72 ± 8% +24.4% 559.64 ± 16% sched_debug.cpu.curr->pid.avg 0.02 ± 37% -75.3% 0.00 ±101% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 0.61 ± 47% -69.2% 0.19 ± 52% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 3.57 ± 35% -1.6 1.98 ± 24% perf-profile.calltrace.cycles-pp.read 3.35 ± 38% -1.4 1.98 ± 24% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read 3.35 ± 38% -1.4 1.98 ± 24% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read 1.10 ± 72% +2.0 3.11 ± 14% perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64 1.37 ± 83% +2.3 3.66 ± 38% perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe 3.57 ± 35% -1.6 1.98 ± 24% perf-profile.children.cycles-pp.read 1.90 ± 60% +2.4 4.28 ± 15% perf-profile.children.cycles-pp.__mmap_region 1.37 ± 83% +2.5 3.92 ± 35% perf-profile.children.cycles-pp.ksys_mmap_pgoff 2.17 ± 57% +2.7 4.84 ± 34% perf-profile.children.cycles-pp.do_mmap 2.17 ± 57% +2.9 5.08 ± 33% perf-profile.children.cycles-pp.vm_mmap_pgoff Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki