On Wed, 2024-11-13 at 16:37 +0000, Chuck Lever III wrote:
> 
> 
> On Nov 13, 2024, at 11:20 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> 
> > On Wed, 2024-11-13 at 11:14 -0500, Chuck Lever wrote:
> > > On Wed, Nov 13, 2024 at 11:10:35AM -0500, Jeff Layton wrote:
> > > > On Wed, 2024-11-13 at 10:48 -0500, Chuck Lever wrote:
> > > > > On Thu, Nov 07, 2024 at 06:35:11AM -0500, Jeff Layton wrote:
> > > > > > On Thu, 2024-11-07 at 12:55 +0800, kernel test robot wrote:
> > > > > > > hi, Jeff Layton,
> > > > > > > 
> > > > > > > in commit message, it is mentioned the change is expected to solve the
> > > > > > > "App Overhead" on the fs_mark test we reported in
> > > > > > > https://lore.kernel.org/oe-lkp/202409161645.d44bced5-oliver.sang@xxxxxxxxx/
> > > > > > > 
> > > > > > > however, in our tests, there is still a similar regression. at the same
> > > > > > > time, there is still no performance difference for fsmark.files_per_sec
> > > > > > > 
> > > > > > > 2015880 ± 3% +92.0% 3870164 fsmark.app_overhead
> > > > > > > 18.57 +0.0% 18.57 fsmark.files_per_sec
> > > > > > > 
> > > > > > > another thing is our bot bisected to this commit in the repo/branch given in the
> > > > > > > detailed information below. if there is a more proper repo/branch to test the
> > > > > > > patch, could you let us know? thanks a lot!
> > > > > > > 
> > > > > > > 
> > > > > > > Hello,
> > > > > > > 
> > > > > > > kernel test robot noticed a 92.0% regression of fsmark.app_overhead on:
> > > > > > > 
> > > > > > > commit: b4be3ccf1c251cbd3a3cf5391a80fe3a5f6f075e ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> > > > > > > https://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git cel-nfsd-next-6.12-rc5
> > > > > > > 
> > > > > > > testcase: fsmark
> > > > > > > config: x86_64-rhel-8.3
> > > > > > > compiler: gcc-12
> > > > > > > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
> > > > > > > parameters:
> > > > > > > 
> > > > > > > iterations: 1x
> > > > > > > nr_threads: 1t
> > > > > > > disk: 1HDD
> > > > > > > fs: btrfs
> > > > > > > fs2: nfsv4
> > > > > > > filesize: 4K
> > > > > > > test_size: 40M
> > > > > > > sync_method: fsyncBeforeClose
> > > > > > > nr_files_per_directory: 1fpd
> > > > > > > cpufreq_governor: performance
> > > > > > > 
> > > > > > > 
> > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > > > the same patch/commit), kindly add the following tags
> > > > > > > > Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > > > > > > > Closes: https://lore.kernel.org/oe-lkp/202411071017.ddd9e9e2-oliver.sang@xxxxxxxxx
> > > > > > > 
> > > > > > > 
> > > > > > > Details are as below:
> > > > > > > -------------------------------------------------------------------------------------------------->
> > > > > > > 
> > > > > > > The kernel config and materials to reproduce are available at:
> > > > > > > https://download.01.org/0day-ci/archive/20241107/202411071017.ddd9e9e2-oliver.sang@xxxxxxxxx
> > > > > > > 
> > > > > > > =========================================================================================
> > > > > > > compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
> > > > > > > gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark
> > > > > > > 
> > > > > > > commit:
> > > > > > > 37f27b20cd ("nfsd: add support for FATTR4_OPEN_ARGUMENTS")
> > > > > > > b4be3ccf1c ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> > > > > > > 
> > > > > > > 37f27b20cd64e2e0 b4be3ccf1c251cbd3a3cf5391a8
> > > > > > > ---------------- ---------------------------
> > > > > > > %stddev %change %stddev
> > > > > > > \ | \
> > > > > > > 97.33 ± 9% -16.3% 81.50 ± 9% perf-c2c.HITM.local
> > > > > > > 3788 ±101% +147.5% 9377 ± 6% sched_debug.cfs_rq:/.load_avg.max
> > > > > > > 2936 -6.2% 2755 vmstat.system.cs
> > > > > > > 2015880 ± 3% +92.0% 3870164 fsmark.app_overhead
> > > > > > > 18.57 +0.0% 18.57 fsmark.files_per_sec
> > > > > > > 53420 -17.3% 44185 fsmark.time.voluntary_context_switches
> > > > > > > 1.50 ± 7% +13.4% 1.70 ± 3% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> > > > > > > 3.00 ± 7% +13.4% 3.40 ± 3% perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> > > > > > > 1756957 -2.1% 1720536 proc-vmstat.numa_hit
> > > > > > > 1624496 -2.2% 1588039 proc-vmstat.numa_local
> > > > > > > 1.28 ± 4% -8.2% 1.17 ± 3% perf-stat.i.MPKI
> > > > > > > 2916 -6.2% 2735 perf-stat.i.context-switches
> > > > > > > 1529 ± 4% +8.2% 1655 ± 3% perf-stat.i.cycles-between-cache-misses
> > > > > > > 2910 -6.2% 2729 perf-stat.ps.context-switches
> > > > > > > 0.67 ± 15% -0.4 0.27 ±100% perf-profile.calltrace.cycles-pp._Fork
> > > > > > > 0.95 ± 15% +0.3 1.26 ± 11% perf-profile.calltrace.cycles-pp.__sched_setaffinity.sched_setaffinity.__x64_sys_sched_setaffinity.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > > > > > > 0.70 ± 47% +0.3 1.04 ± 9% perf-profile.calltrace.cycles-pp._nohz_idle_balance.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> > > > > > > 0.52 ± 45% +0.3 0.86 ± 15% perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > > > > > > 0.72 ± 50% +0.4 1.12 ± 18% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
> > > > > > > 1.22 ± 26% +0.4 1.67 ± 12% perf-profile.calltrace.cycles-pp.sched_setaffinity.evlist_cpu_iterator__next.read_counters.process_interval.dispatch_events
> > > > > > > 2.20 ± 11% +0.6 2.78 ± 13% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
> > > > > > > 2.20 ± 11% +0.6 2.82 ± 12% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
> > > > > > > 2.03 ± 13% +0.6 2.67 ± 13% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
> > > > > > > 0.68 ± 15% -0.2 0.47 ± 19% perf-profile.children.cycles-pp._Fork
> > > > > > > 0.56 ± 15% -0.1 0.42 ± 16% perf-profile.children.cycles-pp.tcp_v6_do_rcv
> > > > > > > 0.46 ± 13% -0.1 0.35 ± 16% perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
> > > > > > > 0.10 ± 75% +0.2 0.29 ± 19% perf-profile.children.cycles-pp.refresh_cpu_vm_stats
> > > > > > > 0.28 ± 33% +0.2 0.47 ± 16% perf-profile.children.cycles-pp.show_stat
> > > > > > > 0.34 ± 32% +0.2 0.54 ± 16% perf-profile.children.cycles-pp.fold_vm_numa_events
> > > > > > > 0.37 ± 11% +0.3 0.63 ± 23% perf-profile.children.cycles-pp.setup_items_for_insert
> > > > > > > 0.88 ± 15% +0.3 1.16 ± 12% perf-profile.children.cycles-pp.__set_cpus_allowed_ptr
> > > > > > > 0.37 ± 14% +0.3 0.67 ± 61% perf-profile.children.cycles-pp.btrfs_writepages
> > > > > > > 0.37 ± 14% +0.3 0.67 ± 61% perf-profile.children.cycles-pp.extent_write_cache_pages
> > > > > > > 0.64 ± 19% +0.3 0.94 ± 15% perf-profile.children.cycles-pp.btrfs_insert_empty_items
> > > > > > > 0.38 ± 12% +0.3 0.68 ± 58% perf-profile.children.cycles-pp.btrfs_fdatawrite_range
> > > > > > > 0.32 ± 23% +0.3 0.63 ± 64% perf-profile.children.cycles-pp.extent_writepage
> > > > > > > 0.97 ± 14% +0.3 1.31 ± 10% perf-profile.children.cycles-pp.__sched_setaffinity
> > > > > > > 1.07 ± 16% +0.4 1.44 ± 10% perf-profile.children.cycles-pp.__x64_sys_sched_setaffinity
> > > > > > > 1.39 ± 18% +0.5 1.90 ± 12% perf-profile.children.cycles-pp.seq_read_iter
> > > > > > > 0.34 ± 30% +0.2 0.52 ± 16% perf-profile.self.cycles-pp.fold_vm_numa_events
> > > > > > > 
> > > > > > > 
> > > > > > > Disclaimer:
> > > > > > > Results have been estimated based on internal Intel analysis and are provided
> > > > > > > for informational purposes only. Any difference in system hardware or software
> > > > > > > design or configuration may affect actual performance.
> > > > > > 
> > > > > > This patch (b4be3ccf1c) is exceedingly simple, so I doubt it's causing
> > > > > > a performance regression in the server. The only thing I can figure
> > > > > > here is that this test is causing the server to recall the delegation
> > > > > > that it hands out, and then the client has to go and establish a new
> > > > > > open stateid in order to return it. That would likely be slower than
> > > > > > just handing out both an open and delegation stateid in the first
> > > > > > place.
> > > > > > 
> > > > > > I don't think there is anything we can do about that though, since the
> > > > > > feature seems to be working as designed.
> > > > > 
> > > > > We seem to have hit this problem before. That makes me wonder
> > > > > whether it is actually worth supporting the XOR flag at all. After
> > > > > all, the client sends the CLOSE asynchronously; applications are not
> > > > > being held up while the unneeded state ID is returned.
> > > > > 
> > > > > Can XOR support be disabled for now? I don't want to add an
> > > > > administrative interface for that, but also, "no regressions" is
> > > > > ringing in my ears, and 92% is a mighty noticeable one.
> > > > 
> > > > To be clear, this increase is for the "App Overhead", which is all of
> > > > the operations in between the ones being measured in this test. I
> > > > did run this test for a bit and got similar results, but was never able
> > > > to nail down where the overhead came from. My speculation is that it's
> > > > the recall and reestablishment of an open stateid that slows things
> > > > down, but I never could fully confirm it.
> > > > 
> > > > My issue with disabling this is that the decision of whether to set
> > > > OPEN_XOR_DELEGATION is up to the client. The server in this case is
> > > > just doing what the client asks. ISTM that if we were going to disable
> > > > (or throttle) this anywhere, that should be done by the client.
> > > 
> > > If the client sets the XOR flag, NFSD could simply return only an
> > > OPEN stateid, for instance.
> > 
> > Sure, but then it's disabled for everyone, for every workload. There
> > may be workloads that benefit, too.
> 
> It doesn't impact NFSv4.[01] mounts, nor will it impact any
> workload on a client that does not now support XOR.
> 
> I don't think we need bad press around this feature, nor do we
> want complaints about "why didn't you add a switch to disable
> this?". I'd also like to avoid a bug filed for this regression
> if it gets merged.
> 
> Can you test to see if returning only the OPEN stateid makes the
> regression go away? The behavior would be in place only until
> there is a root cause and a way to avoid the regression.

I don't have a test rig set up for this at the moment, as this sort of
testing requires real hw. I also don't have anything automated for
doing this. That will probably take a while, so if you're that
concerned, then just drop the patch.

Maybe we'll get a datapoint from the Hammerspace folks that will help
clarify the problem in the meantime.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>