On Mon, 2024-09-16 at 16:41 +0800, kernel test robot wrote:
> Hello,
> 
> kernel test robot noticed a 81.6% regression of fsmark.app_overhead on:
> 
> commit: 8cb33389f66441dc4e54b28fe0d9bd4bcd9b796d ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> testcase: fsmark
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
> parameters:
> 
> 	iterations: 1x
> 	nr_threads: 1t
> 	disk: 1HDD
> 	fs: btrfs
> 	fs2: nfsv4
> 	filesize: 4K
> 	test_size: 40M
> 	sync_method: fsyncBeforeClose
> 	nr_files_per_directory: 1fpd
> 	cpufreq_governor: performance
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> 
> Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> Closes: https://lore.kernel.org/oe-lkp/202409161645.d44bced5-oliver.sang@xxxxxxxxx
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240916/202409161645.d44bced5-oliver.sang@xxxxxxxxx
> 
> =========================================================================================
> compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
>   gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark
> 
> commit:
>   e29c78a693 ("nfsd: add support for FATTR4_OPEN_ARGUMENTS")
>   8cb33389f6 ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> 
> e29c78a6936e7422 8cb33389f66441dc4e54b28fe0d
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>      24388 ± 20%     -32.8%      16400 ± 18%  numa-vmstat.node0.nr_slab_reclaimable
>      61.50 ±  4%     -10.6%      55.00 ±  6%  perf-c2c.HITM.local
>       0.20 ±  3%     +23.0%       0.24 ± 13%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>       2977            -6.1%       2796        vmstat.system.cs
>    2132466 ±  2%     +81.6%    3871852        fsmark.app_overhead

I have been unable to reproduce this result with fs_mark. I've run a
number of repeated tests, and I can create files just as fast with or
without this patch (roughly 46 files/s on my test machine).

I'm particularly suspicious of the fsmark.app_overhead value above. The
fsmark output says:

# App overhead is time in microseconds spent in the test not doing file writing related system calls.

That seems outside the purview of anything we're altering here, so I
have to wonder if something else is going on.

Oliver, can you rerun this test and see if this regression is
reproducible?
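To spell out how I'm reading that "app overhead" definition, here's a
minimal, purely illustrative sketch (not fs_mark's actual code): time
the whole create/write/fsync/close loop, time just the file-writing
syscalls, and call the difference "app overhead" in microseconds. The
file names and the file count below are made up for the example; only
the 4K size and fsync-before-close come from the test parameters above.

/*
 * Purely illustrative sketch -- not fs_mark's actual code. "App
 * overhead" here is total wall-clock time for the run minus the time
 * spent inside the file-writing syscalls, reported in microseconds.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define NFILES		100	/* made-up count for the example */
#define FILESIZE	4096	/* 4K, as in the test parameters */

static long long delta_us(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) * 1000000LL +
	       (b->tv_nsec - a->tv_nsec) / 1000;
}

int main(void)
{
	struct timespec run_start, run_end, t0, t1;
	long long io_us = 0;
	char buf[FILESIZE];
	char name[64];
	int i, fd;

	memset(buf, 'x', sizeof(buf));
	clock_gettime(CLOCK_MONOTONIC, &run_start);

	for (i = 0; i < NFILES; i++) {
		/* name generation, bookkeeping, etc. count as overhead */
		snprintf(name, sizeof(name), "file.%d", i);

		/* only the file-writing syscalls are timed separately */
		clock_gettime(CLOCK_MONOTONIC, &t0);
		fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0644);
		if (fd < 0) {
			perror("open");
			exit(1);
		}
		if (write(fd, buf, sizeof(buf)) < 0)
			perror("write");
		fsync(fd);	/* sync_method: fsyncBeforeClose */
		close(fd);
		clock_gettime(CLOCK_MONOTONIC, &t1);
		io_us += delta_us(&t0, &t1);
	}

	clock_gettime(CLOCK_MONOTONIC, &run_end);
	printf("app overhead: %lld usec\n",
	       delta_us(&run_start, &run_end) - io_us);
	return 0;
}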
Thanks,

>      53442           -17.3%      44172        fsmark.time.voluntary_context_switches
>       2907            -5.7%       2742        perf-stat.i.context-switches
>       2902            -5.7%       2737        perf-stat.ps.context-switches
>    1724787            -1.0%    1706808        proc-vmstat.numa_hit
>    1592345            -1.1%    1574310        proc-vmstat.numa_local
>      24.87 ± 33%     -38.9%      15.20 ± 12%  sched_debug.cpu.nr_uninterruptible.max
>       4.36 ±  9%     -17.1%       3.61 ± 10%  sched_debug.cpu.nr_uninterruptible.stddev
>      97541 ± 20%     -32.7%      65610 ± 18%  numa-meminfo.node0.KReclaimable
>      97541 ± 20%     -32.7%      65610 ± 18%  numa-meminfo.node0.SReclaimable
>     256796 ±  9%     -18.7%     208805 ± 13%  numa-meminfo.node0.Slab
>    2307911 ± 52%     +68.5%    3888971 ±  5%  numa-meminfo.node1.MemUsed
>     193326 ± 12%     +24.7%     241049 ± 12%  numa-meminfo.node1.Slab
>       0.90 ± 27%      -0.5        0.36 ±103%  perf-profile.calltrace.cycles-pp.evsel__read_counter.read_counters.process_interval.dispatch_events.cmd_stat
>       0.36 ± 70%      +0.2        0.58 ±  3%  perf-profile.calltrace.cycles-pp.btrfs_commit_transaction.btrfs_sync_file.btrfs_do_write_iter.do_iter_readv_writev.vfs_iter_write
>       0.52 ± 47%      +0.3        0.78 ±  8%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>       1.62 ± 12%      +0.3        1.93 ±  9%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>       1.22 ± 21%      -0.3        0.89 ± 10%  perf-profile.children.cycles-pp.readn
>       0.46 ± 32%      -0.2        0.24 ± 34%  perf-profile.children.cycles-pp.__close
>       0.45 ± 32%      -0.2        0.22 ± 15%  perf-profile.children.cycles-pp.__x64_sys_close
>       0.40 ± 29%      -0.2        0.18 ± 38%  perf-profile.children.cycles-pp.__fput
>       0.31 ± 23%      -0.2        0.16 ± 33%  perf-profile.children.cycles-pp.irq_work_tick
>       0.17 ± 51%      -0.1        0.03 ±111%  perf-profile.children.cycles-pp.nfs_file_release
>       0.16 ± 43%      -0.1        0.03 ±111%  perf-profile.children.cycles-pp.__put_nfs_open_context
>       0.26 ± 18%      -0.1        0.15 ± 34%  perf-profile.children.cycles-pp.perf_event_task_tick
>       0.15 ± 41%      -0.1        0.03 ±108%  perf-profile.children.cycles-pp.get_free_pages_noprof
>       0.18 ± 55%      -0.1        0.06 ± 32%  perf-profile.children.cycles-pp.native_apic_mem_eoi
>       0.18 ± 32%      -0.1        0.07 ± 81%  perf-profile.children.cycles-pp.flush_end_io
>       0.17 ± 41%      -0.1        0.07 ± 93%  perf-profile.children.cycles-pp.mas_store_gfp
>       0.52 ±  5%      +0.1        0.58 ±  3%  perf-profile.children.cycles-pp.btrfs_commit_transaction
>       0.02 ±141%      +0.1        0.08 ± 42%  perf-profile.children.cycles-pp.uptime_proc_show
>       0.02 ±141%      +0.1        0.08 ± 44%  perf-profile.children.cycles-pp.get_zeroed_page_noprof
>       0.02 ±141%      +0.1        0.09 ± 35%  perf-profile.children.cycles-pp.__rmqueue_pcplist
>       0.14 ± 12%      +0.1        0.28 ± 29%  perf-profile.children.cycles-pp.hrtimer_next_event_without
>       0.47 ± 27%      +0.2        0.67 ± 19%  perf-profile.children.cycles-pp.__mmap
>       0.70 ± 21%      +0.2        0.91 ±  7%  perf-profile.children.cycles-pp.vfs_write
>       0.74 ± 20%      +0.2        0.96 ±  9%  perf-profile.children.cycles-pp.ksys_write
>       0.73 ± 21%      +0.3        1.00 ±  7%  perf-profile.children.cycles-pp.copy_process
>       1.05 ± 13%      +0.3        1.38 ± 10%  perf-profile.children.cycles-pp.kernel_clone
>       0.28 ± 22%      -0.1        0.13 ± 35%  perf-profile.self.cycles-pp.irq_work_tick
>       0.18 ± 55%      -0.1        0.06 ± 32%  perf-profile.self.cycles-pp.native_apic_mem_eoi
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>