Hi Chuck Lever,

On Sat, Dec 28, 2024 at 12:32:55PM -0500, Chuck Lever wrote:
> On 12/27/24 4:13 AM, kernel test robot wrote:
> > 
> > 
> > Hello,
> > 
> > kernel test robot noticed a 186.4% regression of fsmark.app_overhead on:
> > (but no diff for fsmark.files_per_sec as below (a))
> 
> Hello Oliver -
> 
> I have questions about this test result.
> 
> Is this https://github.com/josefbacik/fs_mark ?

Yes, we use the fsmark test suite from this URL.

> I don't understand what "app_overhead" is measuring. Is this "think
> time"?

Sorry, we just follow the general guidance to enable the test. As in
https://download.01.org/0day-ci/archive/20241227/202412271641.cfba5666-lkp@xxxxxxxxx/repro-script
we run the test as:

fs_mark -d /nfs/sdb1/1 -d /nfs/sdb1/2 -d /nfs/sdb1/3 -d /nfs/sdb1/4 -d /nfs/sdb1/5 -d /nfs/sdb1/6 -d /nfs/sdb1/7 -d /nfs/sdb1/8 -d /nfs/sdb1/9 -d /nfs/sdb1/10 -d /nfs/sdb1/11 -d /nfs/sdb1/12 -d /nfs/sdb1/13 -d /nfs/sdb1/14 -d /nfs/sdb1/15 -d /nfs/sdb1/16 -d /nfs/sdb1/17 -d /nfs/sdb1/18 -d /nfs/sdb1/19 -d /nfs/sdb1/20 -d /nfs/sdb1/21 -d /nfs/sdb1/22 -d /nfs/sdb1/23 -d /nfs/sdb1/24 -d /nfs/sdb1/25 -d /nfs/sdb1/26 -d /nfs/sdb1/27 -d /nfs/sdb1/28 -d /nfs/sdb1/29 -d /nfs/sdb1/30 -d /nfs/sdb1/31 -d /nfs/sdb1/32 -D 16 -N 256 -n 30 -L 1 -S 0 -s 16777216

Below is an example of the output; there is one line explaining App
Overhead, marked (1):

2024-12-27 00:12:33 fs_mark -d /nfs/sdb1/1 -d /nfs/sdb1/2 -d /nfs/sdb1/3 -d /nfs/sdb1/4 -d /nfs/sdb1/5 -d /nfs/sdb1/6 -d /nfs/sdb1/7 -d /nfs/sdb1/8 -d /nfs/sdb1/9 -d /nfs/sdb1/10 -d /nfs/sdb1/11 -d /nfs/sdb1/12 -d /nfs/sdb1/13 -d /nfs/sdb1/14 -d /nfs/sdb1/15 -d /nfs/sdb1/16 -d /nfs/sdb1/17 -d /nfs/sdb1/18 -d /nfs/sdb1/19 -d /nfs/sdb1/20 -d /nfs/sdb1/21 -d /nfs/sdb1/22 -d /nfs/sdb1/23 -d /nfs/sdb1/24 -d /nfs/sdb1/25 -d /nfs/sdb1/26 -d /nfs/sdb1/27 -d /nfs/sdb1/28 -d /nfs/sdb1/29 -d /nfs/sdb1/30 -d /nfs/sdb1/31 -d /nfs/sdb1/32 -D 16 -N 256 -n 30 -L 1 -S 0 -s 16777216

#  fs_mark -d /nfs/sdb1/1 -d /nfs/sdb1/2 -d /nfs/sdb1/3 -d /nfs/sdb1/4 -d /nfs/sdb1/5 -d /nfs/sdb1/6 -d /nfs/sdb1/7 -d /nfs/sdb1/8 -d /nfs/sdb1/9 -d /nfs/sdb1/10 -d /nfs/sdb1/11 -d /nfs/sdb1/12 -d /nfs/sdb1/13 -d /nfs/sdb1/14 -d /nfs/sdb1/15 -d /nfs/sdb1/16 -d /nfs/sdb1/17 -d /nfs/sdb1/18 -d /nfs/sdb1/19 -d /nfs/sdb1/20 -d /nfs/sdb1/21 -d /nfs/sdb1/22 -d /nfs/sdb1/23 -d /nfs/sdb1/24 -d /nfs/sdb1/25 -d /nfs/sdb1/26 -d /nfs/sdb1/27 -d /nfs/sdb1/28 -d /nfs/sdb1/29 -d /nfs/sdb1/30 -d /nfs/sdb1/31 -d /nfs/sdb1/32 -D 16 -N 256 -n 30 -L 1 -S 0 -s 16777216
#	Version 3.3, 32 thread(s) starting at Fri Dec 27 00:12:33 2024
#	Sync method: NO SYNC: Test does not issue sync() or fsync() calls.
#	Directories:  Round Robin between directories across 16 subdirectories with 256 files per subdirectory.
#	File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name)
#	Files info: size 16777216 bytes, written with an IO size of 16384 bytes per write
#	App overhead is time in microseconds spent in the test not doing file writing related system calls.
                                                              <---- (1)

FSUse%        Count         Size    Files/sec     App Overhead
     7          960     16777216          6.4        920331167

Average Files/sec: 6.0
p50 Files/sec: 6
p90 Files/sec: 6
p99 Files/sec: 6
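
To make that one-liner more concrete: conceptually (this is only an
illustrative sketch in C, NOT the actual fs_mark source; the helper
names are invented), App Overhead is the elapsed wall-clock time of the
per-file test loop minus the time spent inside the write-related system
calls:

/*
 * Illustrative sketch only -- not the real fs_mark code.  "App
 * overhead" is total elapsed loop time minus time spent in the
 * file-writing-related system calls, in microseconds.
 */
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

static uint64_t now_usec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Write one file; return microseconds spent inside write() itself. */
static uint64_t timed_writes(int fd, const char *buf, size_t iosize,
			     size_t filesize)
{
	uint64_t in_syscalls = 0;
	size_t done;

	for (done = 0; done < filesize; done += iosize) {
		uint64_t t0 = now_usec();

		(void)write(fd, buf, iosize);	/* error handling omitted */
		in_syscalls += now_usec() - t0;
	}
	return in_syscalls;
}

/*
 * Per file, the harness would then accumulate something like:
 *
 *	start = now_usec();
 *	io_usec = timed_writes(fd, buf, 16384, 16777216);
 *	app_overhead += (now_usec() - start) - io_usec;
 *
 * so app_overhead counts name generation, directory selection,
 * bookkeeping, etc., but not the writes themselves.
 */

In other words, a larger app_overhead with an unchanged files_per_sec
points at time spent outside the write syscalls, not at lower write
throughput.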

> A more concerning regression might be:
> 
> >     13.03 ±170%    +566.0%      86.78 ± 77%
> > perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.svc_tcp_sendto
> 
> But these metrics look like they improved:
> 
> >      0.03 ± 56%     -73.4%       0.01 ±149%
> > perf-sched.sch_delay.avg.ms.btrfs_commit_transaction.btrfs_sync_file.nfsd_commit.nfsd4_commit
> >      0.05 ± 60%     -72.1%       0.02 ±165%
> > perf-sched.sch_delay.max.ms.btrfs_commit_transaction.btrfs_sync_file.nfsd_commit.nfsd4_commit
> 
> This is a quite mixed result, IMO -- I'm not convinced it's actionable.
> Can someone help explain/analyze the metrics?
> 
> > commit: ab627262022ed8c6a68e619ed03a14e47acf2e39 ("nfsd: allocate new session-based DRC slots on demand.")
> > https://git.kernel.org/cgit/linux/kernel/git/cel/linux nfsd-testing
> > 
> > testcase: fsmark
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> > parameters:
> > 
> > 	iterations: 1x
> > 	nr_threads: 32t
> > 	disk: 1HDD
> > 	fs: btrfs
> > 	fs2: nfsv4
> > 	filesize: 16MB
> > 	test_size: 15G
> > 	sync_method: NoSync
> > 	nr_directories: 16d
> > 	nr_files_per_directory: 256fpd
> > 	cpufreq_governor: performance
> > 
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > | Closes: https://lore.kernel.org/oe-lkp/202412271641.cfba5666-lkp@xxxxxxxxx
> > 
> > 
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> > 
> > 
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20241227/202412271641.cfba5666-lkp@xxxxxxxxx
> > 
> > =========================================================================================
> > compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
> >   gcc-12/performance/1HDD/16MB/nfsv4/btrfs/1x/x86_64-rhel-9.4/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/NoSync/lkp-icl-2sp7/15G/fsmark
> > 
> > commit: 
> >   ccd01c7601 ("nfsd: add session slot count to /proc/fs/nfsd/clients/*/info")
> >   ab62726202 ("nfsd: allocate new session-based DRC slots on demand.")
> > 
> > ccd01c76017847d1 ab627262022ed8c6a68e619ed03 
> > ---------------- --------------------------- 
> >          %stddev     %change         %stddev
> >              \          |                \  
> >       5.48 ±  9%     +24.9%       6.85 ± 14%  sched_debug.cpu.nr_uninterruptible.stddev
> >      12489           +11.1%      13876        uptime.idle
> >  3.393e+08 ± 16%    +186.4%  9.717e+08 ±  9%  fsmark.app_overhead
> >       6.40            +0.0%       6.40        fsmark.files_per_sec    <-------- (a)
> >       6.00           +27.8%       7.67 ±  6%  fsmark.time.percent_of_cpu_this_job_got
> >      72.33           +15.8%      83.79        iostat.cpu.idle
> >      25.91 ±  3%     -44.3%      14.42 ± 11%  iostat.cpu.iowait
> >      72.08           +11.6       83.64        mpstat.cpu.all.idle%
> >      26.18 ±  3%     -11.6       14.58 ± 11%  mpstat.cpu.all.iowait%
> >     153772 ±  5%     +19.1%     183126 ±  8%  meminfo.DirectMap4k
> >     156099           +19.5%     186594        meminfo.Dirty
> >     467358           -12.9%     406910 ±  2%  meminfo.Writeback
> >      72.35           +15.8%      83.79        vmstat.cpu.id
> >      25.90 ±  3%     -44.3%      14.41 ± 11%  vmstat.cpu.wa
> >      17.61 ±  3%     -45.8%       9.55 ± 10%  vmstat.procs.b
> >       5909 ±  2%      -6.2%       5545        vmstat.system.in
> >       0.03 ± 56%     -73.4%       0.01 ±149%  perf-sched.sch_delay.avg.ms.btrfs_commit_transaction.btrfs_sync_file.nfsd_commit.nfsd4_commit
> >       0.05 ± 60%     -72.1%       0.02 ±165%  perf-sched.sch_delay.max.ms.btrfs_commit_transaction.btrfs_sync_file.nfsd_commit.nfsd4_commit
> >       0.07 ± 41%     +36.1%       0.10 ±  8%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
> >      13.03 ±170%    +566.0%      86.78 ± 77%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.svc_tcp_sendto
> >     206.83 ± 14%     -31.5%     141.67 ±  6%  perf-sched.wait_and_delay.count.rpc_wait_bit_killable.__wait_on_bit.out_of_line_wait_on_bit.__rpc_execute
> >       0.30 ± 62%     -82.1%       0.05 ±110%  perf-sched.wait_time.avg.ms.handle_reserve_ticket.__reserve_bytes.btrfs_reserve_data_bytes.btrfs_check_data_free_space
> >       7.37 ±  4%     -15.8%       6.20 ±  4%  perf-stat.i.MPKI
> >      44.13 ±  2%      -2.9       41.25 ±  2%  perf-stat.i.cache-miss-rate%
> >     103.65 ±  2%     +17.9%     122.17 ±  8%  perf-stat.i.cpu-migrations
> >     627.67 ±  3%     +25.4%     787.18 ±  6%  perf-stat.i.cycles-between-cache-misses
> >       0.67            +3.7%       0.70        perf-stat.i.ipc
> >       1.35            +2.2%       1.38        perf-stat.overall.cpi
> >     373.39            +4.1%     388.79        perf-stat.overall.cycles-between-cache-misses
> >       0.74            -2.1%       0.73        perf-stat.overall.ipc
> >     102.89 ±  2%     +17.9%     121.32 ±  8%  perf-stat.ps.cpu-migrations
> >      39054           +19.0%      46460 ±  2%  proc-vmstat.nr_dirty
> >      15139            +2.2%      15476        proc-vmstat.nr_kernel_stack
> >      45710            +1.9%      46570        proc-vmstat.nr_slab_unreclaimable
> >     116900           -13.5%     101162        proc-vmstat.nr_writeback
> >      87038           -18.2%      71185 ±  2%  proc-vmstat.nr_zone_write_pending
> >    6949807            -3.8%    6688660        proc-vmstat.numa_hit
> >    6882153            -3.8%    6622312        proc-vmstat.numa_local
> >   13471776            -2.0%   13204489        proc-vmstat.pgalloc_normal
> >     584292            +3.8%     606391 ±  3%  proc-vmstat.pgfault
> >      25859            +9.8%      28392 ±  9%  proc-vmstat.pgreuse
> >       2.02 ±  8%      -0.3        1.71 ±  5%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> >       1.86 ±  8%      -0.3        1.58 ±  6%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
> >       3.42 ±  5%      -0.6        2.87 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> >       2.96 ±  4%      -0.4        2.55 ±  5%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> >       0.35 ± 45%      -0.2        0.14 ± 71%  perf-profile.children.cycles-pp.khugepaged
> >       0.34 ± 46%      -0.2        0.14 ± 71%  perf-profile.children.cycles-pp.hpage_collapse_scan_pmd
> >       0.34 ± 46%      -0.2        0.14 ± 71%  perf-profile.children.cycles-pp.khugepaged_scan_mm_slot
> >       0.34 ± 47%      -0.2        0.14 ± 72%  perf-profile.children.cycles-pp.collapse_huge_page
> >       1.21 ± 10%      -0.2        1.01 ±  8%  perf-profile.children.cycles-pp.__hrtimer_run_queues
> >       0.82 ±  9%      -0.1        0.68 ± 10%  perf-profile.children.cycles-pp.update_process_times
> >       0.41 ±  8%      -0.1        0.29 ± 22%  perf-profile.children.cycles-pp.btrfs_check_data_free_space
> >       0.21 ±  7%      -0.1        0.11 ± 73%  perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
> >       0.55 ± 11%      -0.1        0.46 ± 14%  perf-profile.children.cycles-pp.__set_extent_bit
> >       0.33 ±  9%      -0.1        0.28 ±  8%  perf-profile.children.cycles-pp.nfs_request_add_commit_list
> >       0.17 ±  9%      -0.0        0.13 ± 16%  perf-profile.children.cycles-pp.readn
> >       0.08 ± 13%      -0.0        0.06 ± 14%  perf-profile.children.cycles-pp.load_elf_interp
> >       1.00 ± 16%      +1.2        2.18 ± 53%  perf-profile.children.cycles-pp.folio_batch_move_lru
> >       0.21 ±  8%      -0.1        0.11 ± 73%  perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
> >       0.05 ± 49%      +0.1        0.15 ± 61%  perf-profile.self.cycles-pp.nfs_update_folio
> >       0.94 ±  5%      +0.2        1.11 ±  4%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> >       0.25 ± 17%      +0.4        0.63 ± 61%  perf-profile.self.cycles-pp.nfs_page_async_flush
> > 
> > 
> > 
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> > 
> 
> 
> -- 
> Chuck Lever