On Sat, 2024-12-28 at 12:32 -0500, Chuck Lever wrote:
> On 12/27/24 4:13 AM, kernel test robot wrote:
> > 
> > 
> > Hello,
> > 
> > kernel test robot noticed a 186.4% regression of fsmark.app_overhead on:
> > (but no diff for fsmark.files_per_sec as below (a))
> 
> Hello Oliver -
> 
> I have questions about this test result.
> 
> Is this https://github.com/josefbacik/fs_mark ?
> 
> I don't understand what "app_overhead" is measuring. Is this "think
> time"?
> 
> A more concerning regression might be:
> 
>      13.03 ±170%    +566.0%      86.78 ± 77%
> perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.svc_tcp_sendto
> 
> But these metrics look like they improved:
> 
>       0.03 ± 56%     -73.4%       0.01 ±149%
> perf-sched.sch_delay.avg.ms.btrfs_commit_transaction.btrfs_sync_file.nfsd_commit.nfsd4_commit
>       0.05 ± 60%     -72.1%       0.02 ±165%
> perf-sched.sch_delay.max.ms.btrfs_commit_transaction.btrfs_sync_file.nfsd_commit.nfsd4_commit
> 
> This is quite a mixed result, IMO -- I'm not convinced it's actionable.
> Can someone help explain/analyze the metrics?
> 

I've spent some time staring at fs_mark [1] because of the similar regression
that was reported against the OPEN_XOR_DELEGATION patches. It forks a number
of processes that do create/write/fsync operations, etc. in a directory
hierarchy, based on the command-line options.

"App overhead" in that program consists of this:

    total_file_ops = creat_usec + total_write_usec + fsync_usec +
                     avg_sync_usec + close_usec;

    app_overhead_usec = loop_usecs - total_file_ops;

IOW, it times the various calls into the kernel and tallies up how long they
took. It then subtracts that total from the total time spent in the loop, and
the remainder is the "App overhead".
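To make that concrete, here's a minimal, compilable sketch of the shape of
that accounting. This is my own illustration, not fs_mark's actual code; the
file name, loop count, and the subset of operations that get timed here are
made up:

    /*
     * Rough sketch of fs_mark-style accounting: time each file
     * operation individually, time the whole loop, and report
     * whatever is left over as "app overhead".
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    static unsigned long long now_usec(void)
    {
            struct timeval tv;

            gettimeofday(&tv, NULL);
            return tv.tv_sec * 1000000ULL + tv.tv_usec;
    }

    int main(void)
    {
            unsigned long long creat_usec = 0, write_usec = 0, close_usec = 0;
            unsigned long long loop_start, loop_usecs, t, total_file_ops;
            char buf[4096] = "";
            int i, fd;

            loop_start = now_usec();
            for (i = 0; i < 100; i++) {
                    t = now_usec();
                    fd = open("testfile", O_CREAT | O_TRUNC | O_WRONLY, 0644);
                    creat_usec += now_usec() - t;
                    if (fd < 0)
                            return 1;

                    t = now_usec();
                    write(fd, buf, sizeof(buf));
                    write_usec += now_usec() - t;

                    t = now_usec();
                    close(fd);
                    close_usec += now_usec() - t;
            }
            loop_usecs = now_usec() - loop_start;

            /* Time not attributed to an individual op is "app overhead". */
            total_file_ops = creat_usec + write_usec + close_usec;
            printf("app overhead: %llu usec\n", loop_usecs - total_file_ops);
            return 0;
    }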
The problem I had was that the "App overhead" covers quite a bit of the code
(some of which does kernel syscalls too!). My effort at bracketing where the
increase came from didn't give me reliable results. I meant to get back to it,
but haven't had the time recently.

[1]: https://github.com/josefbacik/fs_mark

> > commit: ab627262022ed8c6a68e619ed03a14e47acf2e39 ("nfsd: allocate new session-based DRC slots on demand.")
> > https://git.kernel.org/cgit/linux/kernel/git/cel/linux nfsd-testing
> > 
> > testcase: fsmark
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> > parameters:
> > 
> > 	iterations: 1x
> > 	nr_threads: 32t
> > 	disk: 1HDD
> > 	fs: btrfs
> > 	fs2: nfsv4
> > 	filesize: 16MB
> > 	test_size: 15G
> > 	sync_method: NoSync
> > 	nr_directories: 16d
> > 	nr_files_per_directory: 256fpd
> > 	cpufreq_governor: performance
> > 
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > Closes: https://lore.kernel.org/oe-lkp/202412271641.cfba5666-lkp@xxxxxxxxx
> > 
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> > 
> > 
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20241227/202412271641.cfba5666-lkp@xxxxxxxxx
> > 
> > =========================================================================================
> > compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
> >   gcc-12/performance/1HDD/16MB/nfsv4/btrfs/1x/x86_64-rhel-9.4/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/NoSync/lkp-icl-2sp7/15G/fsmark
> > 
> > commit:
> >   ccd01c7601 ("nfsd: add session slot count to /proc/fs/nfsd/clients/*/info")
> >   ab62726202 ("nfsd: allocate new session-based DRC slots on demand.")
> > 
> > ccd01c76017847d1 ab627262022ed8c6a68e619ed03
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >       5.48 ±  9%     +24.9%       6.85 ± 14%  sched_debug.cpu.nr_uninterruptible.stddev
> >      12489           +11.1%      13876        uptime.idle
> >  3.393e+08 ± 16%    +186.4%  9.717e+08 ±  9%  fsmark.app_overhead
> >       6.40            +0.0%       6.40        fsmark.files_per_sec   <-------- (a)
> >       6.00           +27.8%       7.67 ±  6%  fsmark.time.percent_of_cpu_this_job_got
> >      72.33           +15.8%      83.79        iostat.cpu.idle
> >      25.91 ±  3%     -44.3%      14.42 ± 11%  iostat.cpu.iowait
> >      72.08           +11.6       83.64        mpstat.cpu.all.idle%
> >      26.18 ±  3%     -11.6       14.58 ± 11%  mpstat.cpu.all.iowait%
> >     153772 ±  5%     +19.1%     183126 ±  8%  meminfo.DirectMap4k
> >     156099           +19.5%     186594        meminfo.Dirty
> >     467358           -12.9%     406910 ±  2%  meminfo.Writeback
> >      72.35           +15.8%      83.79        vmstat.cpu.id
> >      25.90 ±  3%     -44.3%      14.41 ± 11%  vmstat.cpu.wa
> >      17.61 ±  3%     -45.8%       9.55 ± 10%  vmstat.procs.b
> >       5909 ±  2%      -6.2%       5545        vmstat.system.in
> >       0.03 ± 56%     -73.4%       0.01 ±149%  perf-sched.sch_delay.avg.ms.btrfs_commit_transaction.btrfs_sync_file.nfsd_commit.nfsd4_commit
> >       0.05 ± 60%     -72.1%       0.02 ±165%  perf-sched.sch_delay.max.ms.btrfs_commit_transaction.btrfs_sync_file.nfsd_commit.nfsd4_commit
> >       0.07 ± 41%     +36.1%       0.10 ±  8%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
> >      13.03 ±170%    +566.0%      86.78 ± 77%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.svc_tcp_sendto
> >     206.83 ± 14%     -31.5%     141.67 ±  6%  perf-sched.wait_and_delay.count.rpc_wait_bit_killable.__wait_on_bit.out_of_line_wait_on_bit.__rpc_execute
> >       0.30 ± 62%     -82.1%       0.05 ±110%  perf-sched.wait_time.avg.ms.handle_reserve_ticket.__reserve_bytes.btrfs_reserve_data_bytes.btrfs_check_data_free_space
> >       7.37 ±  4%     -15.8%       6.20 ±  4%  perf-stat.i.MPKI
> >      44.13 ±  2%      -2.9       41.25 ±  2%  perf-stat.i.cache-miss-rate%
> >     103.65 ±  2%     +17.9%     122.17 ±  8%  perf-stat.i.cpu-migrations
> >     627.67 ±  3%     +25.4%     787.18 ±  6%  perf-stat.i.cycles-between-cache-misses
> >       0.67            +3.7%       0.70        perf-stat.i.ipc
> >       1.35            +2.2%       1.38        perf-stat.overall.cpi
> >     373.39            +4.1%     388.79        perf-stat.overall.cycles-between-cache-misses
> >       0.74            -2.1%       0.73        perf-stat.overall.ipc
> >     102.89 ±  2%     +17.9%     121.32 ±  8%  perf-stat.ps.cpu-migrations
> >      39054           +19.0%      46460 ±  2%  proc-vmstat.nr_dirty
> >      15139            +2.2%      15476        proc-vmstat.nr_kernel_stack
> >      45710            +1.9%      46570        proc-vmstat.nr_slab_unreclaimable
> >     116900           -13.5%     101162        proc-vmstat.nr_writeback
> >      87038           -18.2%      71185 ±  2%  proc-vmstat.nr_zone_write_pending
> >    6949807            -3.8%    6688660        proc-vmstat.numa_hit
> >    6882153            -3.8%    6622312        proc-vmstat.numa_local
> >   13471776            -2.0%   13204489        proc-vmstat.pgalloc_normal
> >     584292            +3.8%     606391 ±  3%  proc-vmstat.pgfault
> >      25859            +9.8%      28392 ±  9%  proc-vmstat.pgreuse
> >       2.02 ±  8%      -0.3        1.71 ±  5%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> >       1.86 ±  8%      -0.3        1.58 ±  6%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
> >       3.42 ±  5%      -0.6        2.87 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> >       2.96 ±  4%      -0.4        2.55 ±  5%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> >       0.35 ± 45%      -0.2        0.14 ± 71%  perf-profile.children.cycles-pp.khugepaged
> >       0.34 ± 46%      -0.2        0.14 ± 71%  perf-profile.children.cycles-pp.hpage_collapse_scan_pmd
> >       0.34 ± 46%      -0.2        0.14 ± 71%  perf-profile.children.cycles-pp.khugepaged_scan_mm_slot
> >       0.34 ± 47%      -0.2        0.14 ± 72%  perf-profile.children.cycles-pp.collapse_huge_page
> >       1.21 ± 10%      -0.2        1.01 ±  8%  perf-profile.children.cycles-pp.__hrtimer_run_queues
> >       0.82 ±  9%      -0.1        0.68 ± 10%  perf-profile.children.cycles-pp.update_process_times
> >       0.41 ±  8%      -0.1        0.29 ± 22%  perf-profile.children.cycles-pp.btrfs_check_data_free_space
> >       0.21 ±  7%      -0.1        0.11 ± 73%  perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
> >       0.55 ± 11%      -0.1        0.46 ± 14%  perf-profile.children.cycles-pp.__set_extent_bit
> >       0.33 ±  9%      -0.1        0.28 ±  8%  perf-profile.children.cycles-pp.nfs_request_add_commit_list
> >       0.17 ±  9%      -0.0        0.13 ± 16%  perf-profile.children.cycles-pp.readn
> >       0.08 ± 13%      -0.0        0.06 ± 14%  perf-profile.children.cycles-pp.load_elf_interp
> >       1.00 ± 16%      +1.2        2.18 ± 53%  perf-profile.children.cycles-pp.folio_batch_move_lru
> >       0.21 ±  8%      -0.1        0.11 ± 73%  perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
> >       0.05 ± 49%      +0.1        0.15 ± 61%  perf-profile.self.cycles-pp.nfs_update_folio
> >       0.94 ±  5%      +0.2        1.11 ±  4%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> >       0.25 ± 17%      +0.4        0.63 ± 61%  perf-profile.self.cycles-pp.nfs_page_async_flush
> > 
> > 
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> > 

-- 
Jeff Layton <jlayton@xxxxxxxxxx>