Re: [snitzer:cel-nfsd-next-6.12-rc5] [nfsd] b4be3ccf1c: fsmark.app_overhead 92.0% regression

On Wed, Nov 13, 2024 at 11:10:35AM -0500, Jeff Layton wrote:
> On Wed, 2024-11-13 at 10:48 -0500, Chuck Lever wrote:
> > On Thu, Nov 07, 2024 at 06:35:11AM -0500, Jeff Layton wrote:
> > > On Thu, 2024-11-07 at 12:55 +0800, kernel test robot wrote:
> > > > Hi, Jeff Layton,
> > > > 
> > > > In the commit message, it is mentioned that the change is expected to solve
> > > > the "App Overhead" regression on the fs_mark test we reported in
> > > > https://lore.kernel.org/oe-lkp/202409161645.d44bced5-oliver.sang@xxxxxxxxx/
> > > > 
> > > > however, in our tests, there is still a similar regression. At the same
> > > > time, there is still no performance difference for fsmark.files_per_sec:
> > > > 
> > > >    2015880 ±  3%     +92.0%    3870164        fsmark.app_overhead
> > > >      18.57            +0.0%      18.57        fsmark.files_per_sec
> > > > 
> > > > 
> > > > Another thing: our bot bisected to this commit in the repo/branch detailed
> > > > below. If there is a more proper repo/branch against which to test the
> > > > patch, could you let us know? Thanks a lot!
> > > > 
> > > > 
> > > > 
> > > > Hello,
> > > > 
> > > > kernel test robot noticed a 92.0% regression of fsmark.app_overhead on:
> > > > 
> > > > 
> > > > commit: b4be3ccf1c251cbd3a3cf5391a80fe3a5f6f075e ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> > > > https://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git cel-nfsd-next-6.12-rc5
> > > > 
> > > > 
> > > > testcase: fsmark
> > > > config: x86_64-rhel-8.3
> > > > compiler: gcc-12
> > > > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
> > > > parameters:
> > > > 
> > > > 	iterations: 1x
> > > > 	nr_threads: 1t
> > > > 	disk: 1HDD
> > > > 	fs: btrfs
> > > > 	fs2: nfsv4
> > > > 	filesize: 4K
> > > > 	test_size: 40M
> > > > 	sync_method: fsyncBeforeClose
> > > > 	nr_files_per_directory: 1fpd
> > > > 	cpufreq_governor: performance
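
For reference, this configuration (4K files, one file per directory, fsync
before close) boils down to roughly the per-file loop below; a minimal C
sketch with a hypothetical directory layout and file count, not fs_mark's
actual code:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	char path[64], buf[4096];	/* filesize: 4K */
	int i;

	memset(buf, 'x', sizeof(buf));
	for (i = 0; i < 1000; i++) {	/* file count arbitrary here */
		/* nr_files_per_directory: 1fpd -- one file per directory */
		snprintf(path, sizeof(path), "dir.%d", i);
		mkdir(path, 0755);
		snprintf(path, sizeof(path), "dir.%d/file", i);

		int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

		if (fd < 0)
			return 1;
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
			return 1;
		fsync(fd);	/* sync_method: fsyncBeforeClose */
		close(fd);	/* on NFS, CLOSE/DELEGRETURN traffic lands here */
	}
	return 0;
}

Each iteration's open/fsync/close sequence is where delegation handout and
any subsequent recall would land, which is why this workload is sensitive
to the OPEN_XOR_DELEGATION change.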
> > > > 
> > > > 
> > > > 
> > > > 
> > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > the same patch/commit), kindly add the following tags:
> > > > > Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > > > > Closes: https://lore.kernel.org/oe-lkp/202411071017.ddd9e9e2-oliver.sang@xxxxxxxxx
> > > > 
> > > > 
> > > > Details are as below:
> > > > -------------------------------------------------------------------------------------------------->
> > > > 
> > > > 
> > > > The kernel config and materials to reproduce are available at:
> > > > https://download.01.org/0day-ci/archive/20241107/202411071017.ddd9e9e2-oliver.sang@xxxxxxxxx
> > > > 
> > > > =========================================================================================
> > > > compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
> > > >   gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark
> > > > 
> > > > commit: 
> > > >   37f27b20cd ("nfsd: add support for FATTR4_OPEN_ARGUMENTS")
> > > >   b4be3ccf1c ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> > > > 
> > > > 37f27b20cd64e2e0 b4be3ccf1c251cbd3a3cf5391a8 
> > > > ---------------- --------------------------- 
> > > >          %stddev     %change         %stddev
> > > >              \          |                \  
> > > >      97.33 ±  9%     -16.3%      81.50 ±  9%  perf-c2c.HITM.local
> > > >       3788 ±101%    +147.5%       9377 ±  6%  sched_debug.cfs_rq:/.load_avg.max
> > > >       2936            -6.2%       2755        vmstat.system.cs
> > > >    2015880 ±  3%     +92.0%    3870164        fsmark.app_overhead
> > > >      18.57            +0.0%      18.57        fsmark.files_per_sec
> > > >      53420           -17.3%      44185        fsmark.time.voluntary_context_switches
> > > >       1.50 ±  7%     +13.4%       1.70 ±  3%  perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> > > >       3.00 ±  7%     +13.4%       3.40 ±  3%  perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> > > >    1756957            -2.1%    1720536        proc-vmstat.numa_hit
> > > >    1624496            -2.2%    1588039        proc-vmstat.numa_local
> > > >       1.28 ±  4%      -8.2%       1.17 ±  3%  perf-stat.i.MPKI
> > > >       2916            -6.2%       2735        perf-stat.i.context-switches
> > > >       1529 ±  4%      +8.2%       1655 ±  3%  perf-stat.i.cycles-between-cache-misses
> > > >       2910            -6.2%       2729        perf-stat.ps.context-switches
> > > >       0.67 ± 15%      -0.4        0.27 ±100%  perf-profile.calltrace.cycles-pp._Fork
> > > >       0.95 ± 15%      +0.3        1.26 ± 11%  perf-profile.calltrace.cycles-pp.__sched_setaffinity.sched_setaffinity.__x64_sys_sched_setaffinity.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > > >       0.70 ± 47%      +0.3        1.04 ±  9%  perf-profile.calltrace.cycles-pp._nohz_idle_balance.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> > > >       0.52 ± 45%      +0.3        0.86 ± 15%  perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > > >       0.72 ± 50%      +0.4        1.12 ± 18%  perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
> > > >       1.22 ± 26%      +0.4        1.67 ± 12%  perf-profile.calltrace.cycles-pp.sched_setaffinity.evlist_cpu_iterator__next.read_counters.process_interval.dispatch_events
> > > >       2.20 ± 11%      +0.6        2.78 ± 13%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
> > > >       2.20 ± 11%      +0.6        2.82 ± 12%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
> > > >       2.03 ± 13%      +0.6        2.67 ± 13%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
> > > >       0.68 ± 15%      -0.2        0.47 ± 19%  perf-profile.children.cycles-pp._Fork
> > > >       0.56 ± 15%      -0.1        0.42 ± 16%  perf-profile.children.cycles-pp.tcp_v6_do_rcv
> > > >       0.46 ± 13%      -0.1        0.35 ± 16%  perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
> > > >       0.10 ± 75%      +0.2        0.29 ± 19%  perf-profile.children.cycles-pp.refresh_cpu_vm_stats
> > > >       0.28 ± 33%      +0.2        0.47 ± 16%  perf-profile.children.cycles-pp.show_stat
> > > >       0.34 ± 32%      +0.2        0.54 ± 16%  perf-profile.children.cycles-pp.fold_vm_numa_events
> > > >       0.37 ± 11%      +0.3        0.63 ± 23%  perf-profile.children.cycles-pp.setup_items_for_insert
> > > >       0.88 ± 15%      +0.3        1.16 ± 12%  perf-profile.children.cycles-pp.__set_cpus_allowed_ptr
> > > >       0.37 ± 14%      +0.3        0.67 ± 61%  perf-profile.children.cycles-pp.btrfs_writepages
> > > >       0.37 ± 14%      +0.3        0.67 ± 61%  perf-profile.children.cycles-pp.extent_write_cache_pages
> > > >       0.64 ± 19%      +0.3        0.94 ± 15%  perf-profile.children.cycles-pp.btrfs_insert_empty_items
> > > >       0.38 ± 12%      +0.3        0.68 ± 58%  perf-profile.children.cycles-pp.btrfs_fdatawrite_range
> > > >       0.32 ± 23%      +0.3        0.63 ± 64%  perf-profile.children.cycles-pp.extent_writepage
> > > >       0.97 ± 14%      +0.3        1.31 ± 10%  perf-profile.children.cycles-pp.__sched_setaffinity
> > > >       1.07 ± 16%      +0.4        1.44 ± 10%  perf-profile.children.cycles-pp.__x64_sys_sched_setaffinity
> > > >       1.39 ± 18%      +0.5        1.90 ± 12%  perf-profile.children.cycles-pp.seq_read_iter
> > > >       0.34 ± 30%      +0.2        0.52 ± 16%  perf-profile.self.cycles-pp.fold_vm_numa_events
> > > > 
> > > > 
> > > > 
> > > > 
> > > > Disclaimer:
> > > > Results have been estimated based on internal Intel analysis and are provided
> > > > for informational purposes only. Any difference in system hardware or software
> > > > design or configuration may affect actual performance.
> > > > 
> > > > 
> > > 
> > > This patch (b4be3ccf1c) is exceedingly simple, so I doubt it's causing
> > > a performance regression in the server. The only thing I can figure
> > > here is that this test is causing the server to recall the delegation
> > > that it hands out, and then the client has to go and establish a new
> > > open stateid in order to return it. That would likely be slower than
> > > just handing out both an open and delegation stateid in the first
> > > place.
> > > 
> > > I don't think there is anything we can do about that though, since the
> > > feature seems to be working as designed.
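
The suspected extra round trip can be sketched as follows. This is
illustrative pseudocode only: the helper names are hypothetical stand-ins,
and the real traffic is NFSv4 compounds driven by the client:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the NFSv4 compounds involved. */
static void nfs4_open(bool want_xor)
{
	printf("OPEN%s\n", want_xor ? " (WANT_OPEN_XOR_DELEGATION)" : "");
}

static void nfs4_delegreturn(void)
{
	printf("DELEGRETURN\n");
}

static void nfs4_close(void)
{
	printf("CLOSE\n");
}

int main(void)
{
	/* Pre-patch behavior: OPEN yields an open stateid plus a delegation. */
	nfs4_open(false);
	nfs4_delegreturn();	/* delegation returned when recalled */
	nfs4_close();		/* sent asynchronously; the app is not held up */

	/*
	 * With the XOR flag honored, OPEN yields only a delegation
	 * stateid, so a recall forces an extra OPEN round trip to
	 * establish an open stateid before the delegation can be
	 * returned.
	 */
	nfs4_open(true);
	nfs4_open(false);	/* re-establish an open stateid */
	nfs4_delegreturn();
	nfs4_close();
	return 0;
}

If that extra OPEN is what fsmark's "App Overhead" picks up, it would
explain why files_per_sec is unchanged while the inter-operation time grows.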
> > 
> > We seem to have hit this problem before. That makes me wonder
> > whether it is actually worth supporting the XOR flag at all. After
> > all, the client sends the CLOSE asynchronously; applications are not
> > being held up while the unneeded stateid is returned.
> > 
> > Can XOR support be disabled for now? I don't want to add an
> > administrative interface for that, but also, "no regressions" is
> > ringing in my ears, and 92% is a mighty noticeable one.
> 
> To be clear, this increase is in the "App Overhead", which covers all of
> the operations between the ones actually being measured in this test. I
> did run this test for a bit and got similar results, but was never able
> to nail down where the overhead came from. My speculation is that it's
> the recall and reestablishment of an open stateid that slows things
> down, but I never could fully confirm it.
> 
> My issue with disabling this is that the decision of whether to set
> OPEN_XOR_DELEGATION is up to the client. The server in this case is
> just doing what the client asks. ISTM that if we were going to disable
> (or throttle) this anywhere, that should be done by the client.

If the client sets the XOR flag, NFSD could simply return only an
OPEN stateid, for instance.
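
Concretely, that fallback might look something like the sketch below. This
is not the actual nfsd code: the flag value and structures are hypothetical
stand-ins for what lives in fs/nfsd's NFSv4 OPEN processing.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag value and reply shape -- illustration only. */
#define WANT_OPEN_XOR_DELEGATION	0x1u

struct open_reply {
	bool has_open_stateid;
	bool has_delegation;
};

static struct open_reply process_open(unsigned int want_flags,
				      bool deleg_available)
{
	struct open_reply r = { .has_open_stateid = true, .has_delegation = false };

	/*
	 * Fallback: if the client asked for OPEN_XOR_DELEGATION,
	 * ignore the request and return only the open stateid,
	 * as servers replied before commit b4be3ccf1c.
	 */
	if (want_flags & WANT_OPEN_XOR_DELEGATION)
		return r;

	if (deleg_available)
		r.has_delegation = true;
	return r;
}

int main(void)
{
	struct open_reply r = process_open(WANT_OPEN_XOR_DELEGATION, true);

	printf("open stateid: %d, delegation: %d\n",
	       r.has_open_stateid, r.has_delegation);
	return 0;
}

The trade-off is that clients asking for XOR semantics silently get
pre-patch behavior, which sidesteps the recall churn without adding a new
administrative interface.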


-- 
Chuck Lever



