Re: [smb3] 5efdd9122e: filebench.sum_operations/s -50.0% regression

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/24/2023 4:00 AM, Steve French wrote:
> This is VERY useful data.  Thank you.   It show the SMB3 CLOSE
> operation on the wire takes twice as long to complete with
> closetimeo=5 rather than 1.  Can you do the same stats for
> closetimeo=0
Please find in attached file for /proc/fs/cifs/Stats dumping with closetimeo=0.
BTW, there is no regression with closetimeo=0.

Regards
Yin, Fengwei

> 
> On Thu, Mar 23, 2023 at 2:48 AM Yin Fengwei <fengwei.yin@xxxxxxxxx> wrote:
>>
>> On 3/15/23 22:26, Steve French wrote:
>>> Can you verify what this perf looks like with "closetime=0" and "closetime=1"
>>>
>>> Are there differences in /proc/fs/cifs/Stats when you run the same
>>> steps with "closetimeo=1" vs. the more recent default (5 seconds)?
>> The dump of /proc/fs/cifs/Stats are attached. You can tell which
>> closetimeo is used from the file name. Thanks.
>>
>>
>> Regards
>> Yin, Fengwei
>>
>>>
>>> On Wed, Mar 15, 2023 at 2:46 AM Yin, Fengwei <fengwei.yin@xxxxxxxxx> wrote:
>>>>
>>>>
>>>>
>>>> On 8/26/2022 10:41 AM, kernel test robot wrote:
>>>>> Greeting,
>>>>>
>>>>> FYI, we noticed a -50.0% regression of filebench.sum_operations/s due to commit:
>>>>>
>>>>>
>>>>> commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3 ("smb3: allow deferred close timeout to be configurable")
>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>>>
>>>>> in testcase: filebench
>>>>> on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
>>>>> with following parameters:
>>>>>
>>>>>      disk: 1HDD
>>>>>      fs: ext4
>>>>>      fs2: cifs
>>>>>      test: filemicro_delete.f
>>>>>      cpufreq_governor: performance
>>>>>      ucode: 0x5003302
>>>> Please note, we still could see this regresion on v6.3-rc2. And the regression is related with
>>>> the commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3. This commit changed the default timeout
>>>> value from 1s to 5s. If change the timeout back to 1s as following:
>>>>
>>>> diff --git a/fs/cifs/fs_context.c b/fs/cifs/fs_context.c
>>>> index 6d13f8207e96..6b930fb0c4bd 100644
>>>> --- a/fs/cifs/fs_context.c
>>>> +++ b/fs/cifs/fs_context.c
>>>> @@ -1537,7 +1537,7 @@ int smb3_init_fs_context(struct fs_context *fc)
>>>>
>>>>          ctx->acregmax = CIFS_DEF_ACTIMEO;
>>>>          ctx->acdirmax = CIFS_DEF_ACTIMEO;
>>>> -       ctx->closetimeo = SMB3_DEF_DCLOSETIMEO;
>>>> +       ctx->closetimeo = CIFS_DEF_ACTIMEO;
>>>>
>>>>          /* Most clients set timeout to 0, allows server to use its default */
>>>>          ctx->handle_timeout = 0; /* See MS-SMB2 spec section 2.2.14.2.12 */
>>>>
>>>> The regression is gone:
>>>> dcb45fd7f501f864                    v6.3-rc2 32715af441411a5a266606be08f
>>>> ---------------- --------------------------- ---------------------------
>>>>         fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
>>>>             |             |             |             |             |
>>>>             :25           0%            :3           33%           1:3     last_state.booting
>>>>             :25           0%            :3           33%           1:3     last_state.is_incomplete_run
>>>>           %stddev     %change         %stddev     %change         %stddev
>>>>               \          |                \          |                \
>>>>      515.95           -50.0%     257.98            -0.0%     515.92        filebench.sum_operations/s
>>>>        5.04 ±  7%    +833.7%      47.09 ±  2%      -2.9%       4.90 ±  2%  filebench.sum_time_ms/op
>>>>       10438          -100.0%       0.33 ±141%    -100.0%       0.50 ±100%  filebench.time.major_page_faults
>>>>      167575            -4.1%     160660            -4.4%     160140        filebench.time.maximum_resident_set_size
>>>>        7138 ± 11%    +141.4%      17235 ±  3%    +147.6%      17677        filebench.time.minor_page_faults
>>>>       28.68 ±  9%    +199.9%      86.00 ±  7%      -2.4%      28.00        filebench.time.percent_of_cpu_this_job_got
>>>>     2453485 ± 54%     -63.0%     907380           -66.2%     830273 ±  6%  cpuidle..usage
>>>>        0.61 ± 38%      +0.8        1.41 ±  3%      +0.2        0.80 ±  4%  mpstat.cpu.all.sys%
>>>>      142984 ± 13%     -45.6%      77725           -47.5%      75106        vmstat.system.in
>>>>       34.23 ±  7%     +27.9%      43.79           +27.8%      43.74        boot-time.boot
>>>>       17.09 ± 11%     +66.0%      28.38           +65.5%      28.28        boot-time.dhcp
>>>>        2661 ±  7%     +37.5%       3659           +37.2%       3651        boot-time.idle
>>>>      104737 ±185%     -87.8%      12762 ± 10%     -89.8%      10631 ±  4%  turbostat.C1
>>>>
>>>>
>>>> 32715af441411a5a266606be08f is v6.3-rc2 with the change to restore the timeout to 1s. Thanks.
>>>>
>>>> Regards
>>>> Yin, Fengwei
>>>>
>>>>
>>>>>
>>>>>
>>>>> =========================================================================================
>>>>> compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase/ucode:
>>>>>    gcc-11/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/filemicro_delete.f/filebench/0x5003302
>>>>>
>>>>> commit:
>>>>>    dcb45fd7f5 ("cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir")
>>>>>    5efdd9122e ("smb3: allow deferred close timeout to be configurable")
>>>>>
>>>>> dcb45fd7f501f864 5efdd9122eff772eae2feae9f0f
>>>>> ---------------- ---------------------------
>>>>>           %stddev     %change         %stddev
>>>>>               \          |                \
>>>>>      515.95           -50.0%     257.98        filebench.sum_operations/s
>>>>>        4.81 ±  2%   +1038.4%      54.78 ±  6%  filebench.sum_time_ms/op
>>>>>       29.00 ±  8%    +212.1%      90.50 ±  3%  filebench.time.percent_of_cpu_this_job_got
>>>>>       24629            +2.7%      25297        filebench.time.voluntary_context_switches
>>>>>   7.685e+08           +19.3%  9.169e+08 ±  4%  cpuidle..time
>>>>>        2.53 ±  6%     -20.6%       2.00 ±  3%  iostat.cpu.iowait
>>>>>     1506141 ±  8%     +22.4%    1843256 ±  3%  turbostat.IRQ
>>>>>        2.00           -50.0%       1.00        vmstat.procs.b
>>>>>       21969 ±  2%      -9.5%      19885 ±  2%  vmstat.system.cs
>>>>>        3.06 ±  7%      -0.7        2.35 ±  4%  mpstat.cpu.all.iowait%
>>>>>        0.79 ±  5%      +0.5        1.27 ±  2%  mpstat.cpu.all.sys%
>>>>>        0.89 ±  3%      -0.1        0.79 ±  3%  mpstat.cpu.all.usr%
>>>>>       34.55 ± 14%     -34.8%      22.51 ± 27%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
>>>>>      119.64 ±  3%     -20.0%      95.69 ± 17%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
>>>>>       34.55 ± 14%     -34.8%      22.51 ± 27%  sched_debug.cfs_rq:/.removed.util_avg.avg
>>>>>      119.64 ±  3%     -20.0%      95.69 ± 17%  sched_debug.cfs_rq:/.removed.util_avg.stddev
>>>>>        5249           +15.8%       6076        meminfo.Active
>>>>>        3866 ±  2%     +17.7%       4552        meminfo.Active(anon)
>>>>>        1382 ±  4%     +10.3%       1524 ±  4%  meminfo.Active(file)
>>>>>       69791 ± 14%     +39.8%      97553 ±  6%  meminfo.AnonHugePages
>>>>>       72709 ±  2%     +12.5%      81779 ±  3%  meminfo.Inactive(file)
>>>>>       23219           +13.5%      26352 ±  3%  meminfo.KernelStack
>>>>>      966.50 ±  2%     +17.7%       1137        proc-vmstat.nr_active_anon
>>>>>       74302            +6.3%      78977 ±  2%  proc-vmstat.nr_anon_pages
>>>>>       81133            +6.0%      85973        proc-vmstat.nr_inactive_anon
>>>>>       18172 ±  2%     +12.5%      20442 ±  3%  proc-vmstat.nr_inactive_file
>>>>>       23213           +13.5%      26348 ±  3%  proc-vmstat.nr_kernel_stack
>>>>>       17983            +2.3%      18400        proc-vmstat.nr_mapped
>>>>>        7446 ±  2%      +5.5%       7853 ±  3%  proc-vmstat.nr_shmem
>>>>>       26888            +1.6%      27306        proc-vmstat.nr_slab_reclaimable
>>>>>       47220            +3.4%      48803        proc-vmstat.nr_slab_unreclaimable
>>>>>      966.50 ±  2%     +17.7%       1137        proc-vmstat.nr_zone_active_anon
>>>>>       81133            +6.0%      85973        proc-vmstat.nr_zone_inactive_anon
>>>>>       18172 ±  2%     +12.5%      20442 ±  3%  proc-vmstat.nr_zone_inactive_file
>>>>>      361460            +2.5%     370454        proc-vmstat.numa_hit
>>>>>      946.67           +18.6%       1122        proc-vmstat.pgactivate
>>>>>      361562            +2.5%     370553        proc-vmstat.pgalloc_normal
>>>>>      187906            +4.7%     196761        proc-vmstat.pgfault
>>>>>        8189            +2.5%       8395        proc-vmstat.pgreuse
>>>>>   1.097e+09           +15.5%  1.267e+09 ±  7%  perf-stat.i.branch-instructions
>>>>>    39079265 ±  6%     -20.9%   30915354 ±  4%  perf-stat.i.branch-misses
>>>>>     5093263 ±  4%     -23.7%    3884752 ±  9%  perf-stat.i.cache-misses
>>>>>       29213           -18.7%      23764 ±  5%  perf-stat.i.context-switches
>>>>>   7.666e+09 ±  4%      +5.7%  8.106e+09 ±  2%  perf-stat.i.cpu-cycles
>>>>>        1877 ± 15%     +75.1%       3287 ± 12%  perf-stat.i.cycles-between-cache-misses
>>>>>     1735450 ±  3%     -12.9%    1512060 ±  3%  perf-stat.i.iTLB-load-misses
>>>>>        2898 ±  3%     +34.4%       3895 ±  7%  perf-stat.i.instructions-per-iTLB-miss
>>>>>        1493           -20.3%       1190 ±  7%  perf-stat.i.major-faults
>>>>>        0.09 ±  3%      +5.8%       0.09 ±  2%  perf-stat.i.metric.GHz
>>>>>       48.47 ± 11%      +8.4       56.83 ±  7%  perf-stat.i.node-store-miss-rate%
>>>>>      283426 ±  4%     -21.6%     222190 ± 10%  perf-stat.i.node-stores
>>>>>        3.57 ±  7%      -1.1        2.44 ±  6%  perf-stat.overall.branch-miss-rate%
>>>>>        1508 ±  3%     +39.8%       2108 ±  9%  perf-stat.overall.cycles-between-cache-misses
>>>>>        3022 ±  3%     +23.6%       3736 ±  5%  perf-stat.overall.instructions-per-iTLB-miss
>>>>>   9.585e+08           +18.8%  1.138e+09 ±  6%  perf-stat.ps.branch-instructions
>>>>>    34151514 ±  6%     -18.8%   27725316 ±  4%  perf-stat.ps.branch-misses
>>>>>     4450329 ±  5%     -21.7%    3486409 ±  9%  perf-stat.ps.cache-misses
>>>>>       25524           -16.4%      21333 ±  4%  perf-stat.ps.context-switches
>>>>>       77139            +2.5%      79105        perf-stat.ps.cpu-clock
>>>>>   6.704e+09 ±  4%      +8.7%  7.287e+09        perf-stat.ps.cpu-cycles
>>>>>    1.06e+09           +11.3%   1.18e+09 ±  5%  perf-stat.ps.dTLB-loads
>>>>>     1517349 ±  3%     -10.5%    1357716 ±  2%  perf-stat.ps.iTLB-load-misses
>>>>>   4.582e+09           +10.8%  5.075e+09 ±  6%  perf-stat.ps.instructions
>>>>>        1296           -18.1%       1061 ±  6%  perf-stat.ps.major-faults
>>>>>      247613 ±  4%     -19.5%     199283 ±  9%  perf-stat.ps.node-stores
>>>>>       77139            +2.5%      79105        perf-stat.ps.task-clock
>>>>>   3.697e+10           +35.3%  5.003e+10        perf-stat.total.instructions
>>>>>        8.51 ± 91%      -6.9        1.59 ±223%  perf-profile.calltrace.cycles-pp.getdents64
>>>>>        8.34 ± 83%      -6.7        1.67 ±141%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>>>        6.25 ±107%      -6.2        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
>>>>>        6.25 ±107%      -6.2        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
>>>>>        6.25 ±107%      -6.2        0.00        perf-profile.calltrace.cycles-pp.open64
>>>>>        7.63 ± 84%      -6.0        1.59 ±223%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getdents64
>>>>>        7.63 ± 84%      -6.0        1.59 ±223%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>>>>        7.63 ± 84%      -6.0        1.59 ±223%  perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>>>>        7.63 ± 84%      -6.0        1.59 ±223%  perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>>>>        6.26 ±115%      -5.4        0.88 ±223%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>>>        6.26 ±115%      -5.4        0.88 ±223%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>>>        6.26 ±115%      -4.6        1.67 ±141%  perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare
>>>>>        6.26 ±115%      -4.6        1.67 ±141%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop
>>>>>        6.26 ±115%      -4.6        1.67 ±141%  perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>>>>>        6.26 ±115%      -4.6        1.67 ±141%  perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
>>>>>        4.57 ± 73%      -3.8        0.76 ±223%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record
>>>>>        4.57 ± 73%      -3.8        0.76 ±223%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist
>>>>>        4.57 ± 73%      -3.8        0.76 ±223%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push
>>>>>        4.57 ± 73%      -3.8        0.76 ±223%  perf-profile.calltrace.cycles-pp.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
>>>>>        3.38 ±103%      -3.4        0.00        perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.exit_mm.do_exit
>>>>>        3.38 ±103%      -3.4        0.00        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.exit_mm
>>>>>        3.38 ±103%      -3.4        0.00        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.mmput
>>>>>        3.38 ±103%      -3.4        0.00        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
>>>>>        8.51 ± 91%      -6.9        1.59 ±223%  perf-profile.children.cycles-pp.getdents64
>>>>>        8.34 ± 83%      -6.7        1.67 ±141%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>>>>>        8.34 ± 83%      -6.7        1.67 ±141%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
>>>>>        8.34 ± 83%      -6.7        1.67 ±141%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>>>>>        6.25 ±107%      -6.2        0.00        perf-profile.children.cycles-pp.open64
>>>>>        7.63 ± 84%      -6.0        1.59 ±223%  perf-profile.children.cycles-pp.__x64_sys_getdents64
>>>>>        7.63 ± 84%      -6.0        1.59 ±223%  perf-profile.children.cycles-pp.iterate_dir
>>>>>        6.26 ±115%      -4.6        1.67 ±141%  perf-profile.children.cycles-pp.arch_do_signal_or_restart
>>>>>        6.26 ±115%      -4.6        1.67 ±141%  perf-profile.children.cycles-pp.get_signal
>>>>>        4.57 ± 73%      -3.8        0.76 ±223%  perf-profile.children.cycles-pp.perf_mmap__read_head
>>>>>        3.38 ±103%      -3.4        0.00        perf-profile.children.cycles-pp.unmap_vmas
>>>>>        3.38 ±103%      -3.4        0.00        perf-profile.children.cycles-pp.unmap_page_range
>>>>>        3.38 ±103%      -3.4        0.00        perf-profile.children.cycles-pp.zap_pmd_range
>>>>>        3.38 ±103%      -3.4        0.00        perf-profile.children.cycles-pp.zap_pte_range
>>>>>        8.54 ± 43%      +8.6       17.19 ± 38%  perf-profile.children.cycles-pp.asm_exc_page_fault
>>>>>
>>>>>
>>>>> If you fix the issue, kindly add following tag
>>>>> Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
>>>>>
>>>>>
>>>>> To reproduce:
>>>>>
>>>>>          git clone https://github.com/intel/lkp-tests.git
>>>>>          cd lkp-tests
>>>>>          sudo bin/lkp install job.yaml           # job file is attached in this email
>>>>>          bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>>>>>          sudo bin/lkp run generated-yaml-file
>>>>>
>>>>>          # if come across any failure that blocks the test,
>>>>>          # please remove ~/.lkp and /lkp dir to run from a clean state.
>>>>>
>>>>>
>>>>> Disclaimer:
>>>>> Results have been estimated based on internal Intel analysis and are provided
>>>>> for informational purposes only. Any difference in system hardware or software
>>>>> design or configuration may affect actual performance.
>>>>>
>>>>>
>>>>> #regzbot introduced: 5efdd9122e
>>>>>
>>>>>
>>>
>>>
>>>
> 
> 
> 
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Total Large 177 Small 59665 Allocations
Operations (MIDs): 0

0 session 0 share reconnects
Total vfs operations: 21761 maximum at one time: 46

Max requests in flight: 68
Total time spent processing by command. Time units are jiffies (1000 per second)
  SMB3 CMD	Number	Total Time	Fastest	Slowest
  --------	------	----------	-------	-------
  0		1	24		24	24
  1		2	2		1	1
  2		0	0		0	0
  3		2	1		0	1
  4		0	0		0	0
  5		11635	23045		0	64
  6		11635	22049		0	64
  7		0	0		0	0
  8		0	0		0	0
  9		5000	21114		0	40
  10		0	0		0	0
  11		2	0		0	0
  12		0	0		0	0
  13		0	0		0	0
  14		187	72		0	2
  15		0	0		0	0
  16		1394	682		0	6
  17		62	23		0	13
  18		0	0		0	0

1) \\localhost\fs
SMBs: 29915
Bytes read: 0  Bytes written: 81920000
Open files: 0 total (local), 0 open on server
TreeConnects: 1 total 0 failed
TreeDisconnects: 0 total 0 failed
Creates: 11635 total 0 failed
Closes: 11635 total 0 failed
Flushes: 0 total 0 failed
Reads: 0 total 0 failed
Writes: 5000 total 0 failed
Locks: 0 total 0 failed
IOCTLs: 1 total 1 failed
QueryDirectories: 187 total 0 failed
ChangeNotifies: 0 total 0 failed
QueryInfos: 1394 total 0 failed
SetInfos: 62 total 0 failed
OplockBreaks: 0 sent 0 failed

[Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux