Hello, kernel test robot noticed a 81.6% regression of fsmark.app_overhead on: commit: 8cb33389f66441dc4e54b28fe0d9bd4bcd9b796d ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION") https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master testcase: fsmark test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory parameters: iterations: 1x nr_threads: 1t disk: 1HDD fs: btrfs fs2: nfsv4 filesize: 4K test_size: 40M sync_method: fsyncBeforeClose nr_files_per_directory: 1fpd cpufreq_governor: performance If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx> | Closes: https://lore.kernel.org/oe-lkp/202409161645.d44bced5-oliver.sang@xxxxxxxxx Details are as below: --------------------------------------------------------------------------------------------------> The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20240916/202409161645.d44bced5-oliver.sang@xxxxxxxxx ========================================================================================= compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase: gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark commit: e29c78a693 ("nfsd: add support for FATTR4_OPEN_ARGUMENTS") 8cb33389f6 ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION") e29c78a6936e7422 8cb33389f66441dc4e54b28fe0d ---------------- --------------------------- %stddev %change %stddev \ | \ 24388 ± 20% -32.8% 16400 ± 18% numa-vmstat.node0.nr_slab_reclaimable 61.50 ± 4% -10.6% 55.00 ± 6% perf-c2c.HITM.local 0.20 ± 3% +23.0% 0.24 ± 13% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity 2977 -6.1% 2796 vmstat.system.cs 2132466 ± 2% +81.6% 3871852 fsmark.app_overhead 53442 -17.3% 44172 fsmark.time.voluntary_context_switches 2907 -5.7% 2742 perf-stat.i.context-switches 2902 -5.7% 2737 perf-stat.ps.context-switches 1724787 -1.0% 1706808 proc-vmstat.numa_hit 1592345 -1.1% 1574310 proc-vmstat.numa_local 24.87 ± 33% -38.9% 15.20 ± 12% sched_debug.cpu.nr_uninterruptible.max 4.36 ± 9% -17.1% 3.61 ± 10% sched_debug.cpu.nr_uninterruptible.stddev 97541 ± 20% -32.7% 65610 ± 18% numa-meminfo.node0.KReclaimable 97541 ± 20% -32.7% 65610 ± 18% numa-meminfo.node0.SReclaimable 256796 ± 9% -18.7% 208805 ± 13% numa-meminfo.node0.Slab 2307911 ± 52% +68.5% 3888971 ± 5% numa-meminfo.node1.MemUsed 193326 ± 12% +24.7% 241049 ± 12% numa-meminfo.node1.Slab 0.90 ± 27% -0.5 0.36 ±103% perf-profile.calltrace.cycles-pp.evsel__read_counter.read_counters.process_interval.dispatch_events.cmd_stat 0.36 ± 70% +0.2 0.58 ± 3% perf-profile.calltrace.cycles-pp.btrfs_commit_transaction.btrfs_sync_file.btrfs_do_write_iter.do_iter_readv_writev.vfs_iter_write 0.52 ± 47% +0.3 0.78 ± 8% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 1.62 ± 12% +0.3 1.93 ± 9% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 1.22 ± 21% -0.3 0.89 ± 10% perf-profile.children.cycles-pp.readn 0.46 ± 32% -0.2 0.24 ± 34% perf-profile.children.cycles-pp.__close 0.45 ± 32% -0.2 0.22 ± 15% perf-profile.children.cycles-pp.__x64_sys_close 0.40 ± 29% -0.2 0.18 ± 38% perf-profile.children.cycles-pp.__fput 0.31 ± 23% -0.2 0.16 ± 33% perf-profile.children.cycles-pp.irq_work_tick 0.17 ± 51% -0.1 0.03 ±111% perf-profile.children.cycles-pp.nfs_file_release 0.16 ± 43% -0.1 0.03 ±111% perf-profile.children.cycles-pp.__put_nfs_open_context 0.26 ± 18% -0.1 0.15 ± 34% perf-profile.children.cycles-pp.perf_event_task_tick 0.15 ± 41% -0.1 0.03 ±108% perf-profile.children.cycles-pp.get_free_pages_noprof 0.18 ± 55% -0.1 0.06 ± 32% perf-profile.children.cycles-pp.native_apic_mem_eoi 0.18 ± 32% -0.1 0.07 ± 81% perf-profile.children.cycles-pp.flush_end_io 0.17 ± 41% -0.1 0.07 ± 93% perf-profile.children.cycles-pp.mas_store_gfp 0.52 ± 5% +0.1 0.58 ± 3% perf-profile.children.cycles-pp.btrfs_commit_transaction 0.02 ±141% +0.1 0.08 ± 42% perf-profile.children.cycles-pp.uptime_proc_show 0.02 ±141% +0.1 0.08 ± 44% perf-profile.children.cycles-pp.get_zeroed_page_noprof 0.02 ±141% +0.1 0.09 ± 35% perf-profile.children.cycles-pp.__rmqueue_pcplist 0.14 ± 12% +0.1 0.28 ± 29% perf-profile.children.cycles-pp.hrtimer_next_event_without 0.47 ± 27% +0.2 0.67 ± 19% perf-profile.children.cycles-pp.__mmap 0.70 ± 21% +0.2 0.91 ± 7% perf-profile.children.cycles-pp.vfs_write 0.74 ± 20% +0.2 0.96 ± 9% perf-profile.children.cycles-pp.ksys_write 0.73 ± 21% +0.3 1.00 ± 7% perf-profile.children.cycles-pp.copy_process 1.05 ± 13% +0.3 1.38 ± 10% perf-profile.children.cycles-pp.kernel_clone 0.28 ± 22% -0.1 0.13 ± 35% perf-profile.self.cycles-pp.irq_work_tick 0.18 ± 55% -0.1 0.06 ± 32% perf-profile.self.cycles-pp.native_apic_mem_eoi Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki