Re: [cel:nfsd-testing] [nfsd] ab62726202: fsmark.app_overhead 186.4% regression

Hi, Chuck Lever,

On Sat, Dec 28, 2024 at 12:32:55PM -0500, Chuck Lever wrote:
> On 12/27/24 4:13 AM, kernel test robot wrote:
> > 
> > 
> > Hello,
> > 
> > kernel test robot noticed a 186.4% regression of fsmark.app_overhead on:
> > (but no diff for fsmark.files_per_sec as below (a))
> 
> Hello Oliver -
> 
> I have questions about this test result.
> 
> Is this https://github.com/josefbacik/fs_mark ?

Yes, we use the fs_mark test suite from that URL.

> 
> I don't understand what "app_overhead" is measuring. Is this "think
> time"?

Sorry, we just follow the test suite's guidance to enable the test.

As in https://download.01.org/0day-ci/archive/20241227/202412271641.cfba5666-lkp@xxxxxxxxx/repro-script
we run the test as follows:

fs_mark -d /nfs/sdb1/1 -d /nfs/sdb1/2 -d /nfs/sdb1/3 -d /nfs/sdb1/4 -d /nfs/sdb1/5 -d /nfs/sdb1/6 -d /nfs/sdb1/7 -d /nfs/sdb1/8 -d /nfs/sdb1/9 -d /nfs/sdb1/10 -d /nfs/sdb1/11 -d /nfs/sdb1/12 -d /nfs/sdb1/13 -d /nfs/sdb1/14 -d /nfs/sdb1/15 -d /nfs/sdb1/16 -d /nfs/sdb1/17 -d /nfs/sdb1/18 -d /nfs/sdb1/19 -d /nfs/sdb1/20 -d /nfs/sdb1/21 -d /nfs/sdb1/22 -d /nfs/sdb1/23 -d /nfs/sdb1/24 -d /nfs/sdb1/25 -d /nfs/sdb1/26 -d /nfs/sdb1/27 -d /nfs/sdb1/28 -d /nfs/sdb1/29 -d /nfs/sdb1/30 -d /nfs/sdb1/31 -d /nfs/sdb1/32 -D 16 -N 256 -n 30 -L 1 -S 0 -s 16777216
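
(For reference: as the output below shows, this runs 32 threads, one per
-d directory, each writing 30 files of 16 MiB, i.e. 32 * 30 = 960 files
and 960 * 16 MiB = 15 GiB in total, which matches Count = 960 in the
output and the test_size: 15G job parameter.)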



Below is an example of the output (there is one line, marked (1), that explains App overhead):

2024-12-27 00:12:33 fs_mark -d /nfs/sdb1/1 -d /nfs/sdb1/2 -d /nfs/sdb1/3 -d /nfs/sdb1/4 -d /nfs/sdb1/5 -d /nfs/sdb1/6 -d /nfs/sdb1/7 -d /nfs/sdb1/8 -d /nfs/sdb1/9 -d /nfs/sdb1/10 -d /nfs/sdb1/11 -d /nfs/sdb1/12 -d /nfs/sdb1/13 -d /nfs/sdb1/14 -d /nfs/sdb1/15 -d /nfs/sdb1/16 -d /nfs/sdb1/17 -d /nfs/sdb1/18 -d /nfs/sdb1/19 -d /nfs/sdb1/20 -d /nfs/sdb1/21 -d /nfs/sdb1/22 -d /nfs/sdb1/23 -d /nfs/sdb1/24 -d /nfs/sdb1/25 -d /nfs/sdb1/26 -d /nfs/sdb1/27 -d /nfs/sdb1/28 -d /nfs/sdb1/29 -d /nfs/sdb1/30 -d /nfs/sdb1/31 -d /nfs/sdb1/32 -D 16 -N 256 -n 30 -L 1 -S 0 -s 16777216

#  fs_mark  -d  /nfs/sdb1/1  -d  /nfs/sdb1/2  -d  /nfs/sdb1/3  -d  /nfs/sdb1/4  -d  /nfs/sdb1/5  -d  /nfs/sdb1/6  -d  /nfs/sdb1/7  -d  /nfs/sdb1/8  -d  /nfs/sdb1/9  -d  /nfs/sdb1/10  -d  /nfs/sdb1/11  -d  /nfs/sdb1/12  -d  /nfs/sdb1/13  -d  /nfs/sdb1/14  -d  /nfs/sdb1/15  -d  /nfs/sdb1/16  -d  /nfs/sdb1/17  -d  /nfs/sdb1/18  -d  /nfs/sdb1/19  -d  /nfs/sdb1/20  -d  /nfs/sdb1/21  -d  /nfs/sdb1/22  -d  /nfs/sdb1/23  -d  /nfs/sdb1/24  -d  /nfs/sdb1/25  -d  /nfs/sdb1/26  -d  /nfs/sdb1/27  -d  /nfs/sdb1/28  -d  /nfs/sdb1/29  -d  /nfs/sdb1/30  -d  /nfs/sdb1/31  -d  /nfs/sdb1/32  -D  16  -N  256  -n  30  -L  1  -S  0  -s  16777216
#       Version 3.3, 32 thread(s) starting at Fri Dec 27 00:12:33 2024
#       Sync method: NO SYNC: Test does not issue sync() or fsync() calls.
#       Directories:  Round Robin between directories across 16 subdirectories with 256 files per subdirectory.
#       File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name)
#       Files info: size 16777216 bytes, written with an IO size of 16384 bytes per write
#       App overhead is time in microseconds spent in the test not doing file writing related system calls.   <---- (1)

FSUse%        Count         Size    Files/sec     App Overhead
     7          960     16777216          6.4        920331167
Average Files/sec:          6.0
p50 Files/sec: 6
p90 Files/sec: 6
p99 Files/sec: 6
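
To illustrate how such a number can be obtained, here is a minimal sketch
(not fs_mark's actual code; it only subtracts write() time, while fs_mark
accounts for all file-writing related system calls): time the whole run,
time each syscall, and report the difference as "app overhead".

    /*
     * Rough sketch of an "app overhead" measurement: wall-clock time
     * of the run minus the time spent inside the write() calls.
     * Hypothetical example, not fs_mark's implementation.
     */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
    #include <fcntl.h>

    static long long now_us(void)
    {
            struct timespec ts;

            clock_gettime(CLOCK_MONOTONIC, &ts);
            return ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
    }

    int main(void)
    {
            char buf[16384];                 /* 16 KiB per write, as in the test */
            long long start, syscall_us = 0, t;
            int fd, i;

            memset(buf, 'x', sizeof(buf));
            start = now_us();

            fd = open("testfile", O_CREAT | O_WRONLY | O_TRUNC, 0644);
            if (fd < 0)
                    return 1;

            for (i = 0; i < 1024; i++) {     /* 1024 x 16 KiB = 16 MiB file */
                    t = now_us();
                    if (write(fd, buf, sizeof(buf)) != sizeof(buf))
                            return 1;
                    syscall_us += now_us() - t;   /* time spent in write() */
            }
            close(fd);

            /* Everything that was not time spent in the write() calls. */
            printf("App overhead: %lld us\n", (now_us() - start) - syscall_us);
            return 0;
    }

So a higher app_overhead with unchanged files_per_sec means the test
process spent more time outside those syscalls, not that the writes
themselves got slower.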


> 
> A more concerning regression might be:
> 
>        13.03 ±170%    +566.0%      86.78 ± 77%
> 
> perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.svc_tcp_sendto
> 
> But these metrics look like they improved:
> 
>         0.03 ± 56%     -73.4%       0.01 ±149%
> perf-sched.sch_delay.avg.ms.btrfs_commit_transaction.btrfs_sync_file.nfsd_commit.nfsd4_commit
>         0.05 ± 60%     -72.1%       0.02 ±165%
> perf-sched.sch_delay.max.ms.btrfs_commit_transaction.btrfs_sync_file.nfsd_commit.nfsd4_commit
> 
> This is a quite mixed result, IMO -- I'm not convinced it's actionable.
> Can someone help explain/analyze the metrics?
> 
> 
> > commit: ab627262022ed8c6a68e619ed03a14e47acf2e39 ("nfsd: allocate new session-based DRC slots on demand.")
> > https://git.kernel.org/cgit/linux/kernel/git/cel/linux nfsd-testing
> > 
> > testcase: fsmark
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> > parameters:
> > 
> > 	iterations: 1x
> > 	nr_threads: 32t
> > 	disk: 1HDD
> > 	fs: btrfs
> > 	fs2: nfsv4
> > 	filesize: 16MB
> > 	test_size: 15G
> > 	sync_method: NoSync
> > 	nr_directories: 16d
> > 	nr_files_per_directory: 256fpd
> > 	cpufreq_governor: performance
> > 
> > 
> > 
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > | Closes: https://lore.kernel.org/oe-lkp/202412271641.cfba5666-lkp@xxxxxxxxx
> > 
> > 
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> > 
> > 
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20241227/202412271641.cfba5666-lkp@xxxxxxxxx
> > 
> > =========================================================================================
> > compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
> >    gcc-12/performance/1HDD/16MB/nfsv4/btrfs/1x/x86_64-rhel-9.4/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/NoSync/lkp-icl-2sp7/15G/fsmark
> > 
> > commit:
> >    ccd01c7601 ("nfsd: add session slot count to /proc/fs/nfsd/clients/*/info")
> >    ab62726202 ("nfsd: allocate new session-based DRC slots on demand.")
> > 
> > ccd01c76017847d1 ab627262022ed8c6a68e619ed03
> > ---------------- ---------------------------
> >           %stddev     %change         %stddev
> >               \          |                \
> >        5.48 ±  9%     +24.9%       6.85 ± 14%  sched_debug.cpu.nr_uninterruptible.stddev
> >       12489           +11.1%      13876        uptime.idle
> >   3.393e+08 ± 16%    +186.4%  9.717e+08 ±  9%  fsmark.app_overhead
> >        6.40            +0.0%       6.40        fsmark.files_per_sec     <-------- (a)
> >        6.00           +27.8%       7.67 ±  6%  fsmark.time.percent_of_cpu_this_job_got
> >       72.33           +15.8%      83.79        iostat.cpu.idle
> >       25.91 ±  3%     -44.3%      14.42 ± 11%  iostat.cpu.iowait
> >       72.08           +11.6       83.64        mpstat.cpu.all.idle%
> >       26.18 ±  3%     -11.6       14.58 ± 11%  mpstat.cpu.all.iowait%
> >      153772 ±  5%     +19.1%     183126 ±  8%  meminfo.DirectMap4k
> >      156099           +19.5%     186594        meminfo.Dirty
> >      467358           -12.9%     406910 ±  2%  meminfo.Writeback
> >       72.35           +15.8%      83.79        vmstat.cpu.id
> >       25.90 ±  3%     -44.3%      14.41 ± 11%  vmstat.cpu.wa
> >       17.61 ±  3%     -45.8%       9.55 ± 10%  vmstat.procs.b
> >        5909 ±  2%      -6.2%       5545        vmstat.system.in
> >        0.03 ± 56%     -73.4%       0.01 ±149%  perf-sched.sch_delay.avg.ms.btrfs_commit_transaction.btrfs_sync_file.nfsd_commit.nfsd4_commit
> >        0.05 ± 60%     -72.1%       0.02 ±165%  perf-sched.sch_delay.max.ms.btrfs_commit_transaction.btrfs_sync_file.nfsd_commit.nfsd4_commit
> >        0.07 ± 41%     +36.1%       0.10 ±  8%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.btrfs_tree_read_lock_nested
> >       13.03 ±170%    +566.0%      86.78 ± 77%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.svc_tcp_sendto
> >      206.83 ± 14%     -31.5%     141.67 ±  6%  perf-sched.wait_and_delay.count.rpc_wait_bit_killable.__wait_on_bit.out_of_line_wait_on_bit.__rpc_execute
> >        0.30 ± 62%     -82.1%       0.05 ±110%  perf-sched.wait_time.avg.ms.handle_reserve_ticket.__reserve_bytes.btrfs_reserve_data_bytes.btrfs_check_data_free_space
> >        7.37 ±  4%     -15.8%       6.20 ±  4%  perf-stat.i.MPKI
> >       44.13 ±  2%      -2.9       41.25 ±  2%  perf-stat.i.cache-miss-rate%
> >      103.65 ±  2%     +17.9%     122.17 ±  8%  perf-stat.i.cpu-migrations
> >      627.67 ±  3%     +25.4%     787.18 ±  6%  perf-stat.i.cycles-between-cache-misses
> >        0.67            +3.7%       0.70        perf-stat.i.ipc
> >        1.35            +2.2%       1.38        perf-stat.overall.cpi
> >      373.39            +4.1%     388.79        perf-stat.overall.cycles-between-cache-misses
> >        0.74            -2.1%       0.73        perf-stat.overall.ipc
> >      102.89 ±  2%     +17.9%     121.32 ±  8%  perf-stat.ps.cpu-migrations
> >       39054           +19.0%      46460 ±  2%  proc-vmstat.nr_dirty
> >       15139            +2.2%      15476        proc-vmstat.nr_kernel_stack
> >       45710            +1.9%      46570        proc-vmstat.nr_slab_unreclaimable
> >      116900           -13.5%     101162        proc-vmstat.nr_writeback
> >       87038           -18.2%      71185 ±  2%  proc-vmstat.nr_zone_write_pending
> >     6949807            -3.8%    6688660        proc-vmstat.numa_hit
> >     6882153            -3.8%    6622312        proc-vmstat.numa_local
> >    13471776            -2.0%   13204489        proc-vmstat.pgalloc_normal
> >      584292            +3.8%     606391 ±  3%  proc-vmstat.pgfault
> >       25859            +9.8%      28392 ±  9%  proc-vmstat.pgreuse
> >        2.02 ±  8%      -0.3        1.71 ±  5%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> >        1.86 ±  8%      -0.3        1.58 ±  6%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
> >        3.42 ±  5%      -0.6        2.87 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> >        2.96 ±  4%      -0.4        2.55 ±  5%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> >        0.35 ± 45%      -0.2        0.14 ± 71%  perf-profile.children.cycles-pp.khugepaged
> >        0.34 ± 46%      -0.2        0.14 ± 71%  perf-profile.children.cycles-pp.hpage_collapse_scan_pmd
> >        0.34 ± 46%      -0.2        0.14 ± 71%  perf-profile.children.cycles-pp.khugepaged_scan_mm_slot
> >        0.34 ± 47%      -0.2        0.14 ± 72%  perf-profile.children.cycles-pp.collapse_huge_page
> >        1.21 ± 10%      -0.2        1.01 ±  8%  perf-profile.children.cycles-pp.__hrtimer_run_queues
> >        0.82 ±  9%      -0.1        0.68 ± 10%  perf-profile.children.cycles-pp.update_process_times
> >        0.41 ±  8%      -0.1        0.29 ± 22%  perf-profile.children.cycles-pp.btrfs_check_data_free_space
> >        0.21 ±  7%      -0.1        0.11 ± 73%  perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
> >        0.55 ± 11%      -0.1        0.46 ± 14%  perf-profile.children.cycles-pp.__set_extent_bit
> >        0.33 ±  9%      -0.1        0.28 ±  8%  perf-profile.children.cycles-pp.nfs_request_add_commit_list
> >        0.17 ±  9%      -0.0        0.13 ± 16%  perf-profile.children.cycles-pp.readn
> >        0.08 ± 13%      -0.0        0.06 ± 14%  perf-profile.children.cycles-pp.load_elf_interp
> >        1.00 ± 16%      +1.2        2.18 ± 53%  perf-profile.children.cycles-pp.folio_batch_move_lru
> >        0.21 ±  8%      -0.1        0.11 ± 73%  perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
> >        0.05 ± 49%      +0.1        0.15 ± 61%  perf-profile.self.cycles-pp.nfs_update_folio
> >        0.94 ±  5%      +0.2        1.11 ±  4%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> >        0.25 ± 17%      +0.4        0.63 ± 61%  perf-profile.self.cycles-pp.nfs_page_async_flush
> > 
> > 
> > 
> > 
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> > 
> > 
> 
> 
> -- 
> Chuck Lever
