Hi Trond, Catalin

On 09/11/2018 11:19, Catalin Marinas wrote:
> Hi Trond,
>
> On Mon, Sep 17, 2018 at 09:03:31AM -0400, Trond Myklebust wrote:
>> Most of this code should also be reusable with other socket types.
>>
>> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>> ---
>>  include/linux/sunrpc/xprtsock.h |  19 +-
>>  include/trace/events/sunrpc.h   |  15 +-
>>  net/sunrpc/xprtsock.c           | 694 +++++++++++++++-----------------
>>  3 files changed, 335 insertions(+), 393 deletions(-)
>
> With latest mainline (24ccea7e102d, it includes Al Viro's iov_iter
> fixup) I started hitting some severe slowdown and systemd timeouts with
> nfsroot on arm64 machines (physical or guests under KVM). Interestingly,
> it only happens when the client kernel is configured with 64K pages, the
> 4K pages configuration runs fine. It also runs fine if I add rsize=65536
> to the nfsroot= argument.
>
> Bisecting led me to commit 277e4ab7d530 ("SUNRPC: Simplify TCP receive
> code by switching to using iterators"). Prior to this commit, it works
> fine.
>
> Some more info:
>
> - defconfig with CONFIG_ARM64_64K_PAGES enabled
>
> - kernel cmdline arg: nfsroot=<some-server>:/srv/nfs/debian-arm64,tcp,v4
>
> - if it matters, the server is also an arm64 machine running 4.19 with
>   4K pages configuration
>
> I haven't figured out what's wrong or even how to debug this as I'm not
> familiar with the sunrpc code. Any suggestion?
>
> Thanks.

I've done a bit of experiments/observations on this, since it was
seriously impacting all forms of testing on arm64 with a 64K pages
configuration.

I can confirm that the rsize=65536 workaround mentioned above by Catalin
is effective for me too (the exact nfsroot= line is spelled out in the
P.S. at the end of this mail), as is resetting to before the commit
mentioned in the subject.

In the following I tested instead with:

- linus arm64 v4.20-rc1, 64K pages + "Debug Lockups and Hangs" enabled
- hw: Juno-r2
- fully NFS mounted rootfs (Debian 9)
- NO rsize workaround
- NFS client config (nfsstat -m):
  Flags: rw,relatime,vers=4.0,rsize=4096,wsize=4096,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.1,local_lock=none,addr=192.168.0.254

Observations:

1. Despite some general boot slowdown (not so evident in my setup), I hit
the issue when simply trying to launch LTP or LKP tests (everything is in
the NFS mounted rootfs): the immediately observable behavior is that the
application gets 'apparently' stuck straight away (NO output whatsoever).
Waiting some seconds yields no progress or result, NOR is any lockup
detected by the kernel. A good deal of effort is needed to kill the
process at this point...BUT it is feasible (many SIGSTOP + KILL)...and
the system is back alive.

2. Running LKP again via 'strace', we can observe the process apparently
starting fine but then suddenly hanging, randomly, multiple times: at
first on an execve() and then on some read() while trying to load its own
file components; each hang lasts approximately 30-45 seconds.

In LKP, as an example:

$ strace lkp run ./dbench-100%.yaml
....
newfstatat(AT_FDCWD, "/opt/lkp-tests/bin/run-local", {st_mode=S_IFREG|0755, st_size=4367, ...}, 0) = 0
faccessat(AT_FDCWD, "/opt/lkp-tests/bin/run-local", X_OK) = 0
execve("/opt/lkp-tests/bin/run-local", ["/opt/lkp-tests/bin/run-local", "./dbench-100%.yaml"], [/* 12 vars */] <<< HANGGG
...
... 30-40 secs..
....
openat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems.rb", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=33018, ...}) = 0
close(7) = 0
getuid() = 0
geteuid() = 0
getgid() = 0
getegid() = 0
openat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems.rb", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7
fcntl(7, F_SETFL, O_RDONLY) = 0
fstat(7, {st_mode=S_IFREG|0644, st_size=33018, ...}) = 0
fstat(7, {st_mode=S_IFREG|0644, st_size=33018, ...}) = 0
ioctl(7, TCGETS, 0xffffdd0895d8) = -1 ENOTTY (Inappropriate ioctl for device)
read(7,  <<<HANGGSSS .... ~30-40 secs ....
"# frozen_string_literal: true\n# "..., 8192) = 8192
read(7, ") as the standard configuration "..., 8192) = 8192
brk(0xaaaaeea70000) = 0xaaaaeea70000
read(7, "ady been required, then we have "..., 8192) = 8192
read(7, "lf.user_home\n @user_home ||= "..., 8192) = 8192
read(7, " require \"rubygems/defaults/#"..., 8192) = 250
read(7, "", 8192) = 0
close(7) = 0
....
....
openat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems/specification.rb", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=81998, ...}) = 0
close(7) = 0
getuid() = 0
geteuid() = 0
getgid() = 0
getegid() = 0
openat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems/specification.rb", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7
fcntl(7, F_SETFL, O_RDONLY) = 0
fstat(7, {st_mode=S_IFREG|0644, st_size=81998, ...}) = 0
fstat(7, {st_mode=S_IFREG|0644, st_size=81998, ...}) = 0
ioctl(7, TCGETS, 0xffffef390b38) = -1 ENOTTY (Inappropriate ioctl for device)
read(7, "# -*- coding: utf-8 -*-\n# frozen"..., 8192) = 8192
read(7, "e platform attribute appropriate"..., 8192) = 8192
read(7, "sealicense.com/.\n #\n # You sho"..., 8192) = 8192
read(7, " TODO: find all extraneous adds\n"..., 8192) = 8192
read(7, "rn a list of all outdated local "..., 8192) = 8192
read(7, " = authors.collect { |a"..., 8192) = 8192
read(7, "ends on.\n #\n # Use #add_depend"..., 8192) = 8192
read(7, "ns true you\n # probably want to"..., 8192) = 8192
read(7, <<<< HANGGGGG

Note that this last hang happens halfway through a file read!

3. Having a look at the underlying network traffic with Wireshark, I
could in fact see that the NFS packets stop flowing completely for 30-40s
when all of the above happens...but I cannot see any error or timeout or
NFS retries. The same happened when I tried to reduce the NFS timeo to
150 (15 secs) from the original 600 (60 secs).

This lack of retries was confirmed by the stats:

root@sqwt-ubuntu:/opt/lkp-tests# nfsstat -r
Client rpc stats:
calls      retrans    authrefrsh
16008      0          16019

The only notable thing is a routine TCP KeepAlive sent by the NFS
client's TCP stack during the 30/40 secs quiet window: the NFS data flow
restarts anyway only 10/12 secs after the KeepAlive is sent and ACKed by
the server, so the KeepAlive does not seem to be the trigger for the
restart itself.

4. Waiting forever (minutes), I was finally able to see LKP completing
initialization and the dbench test being run. Below you can see the
results with and without the rsize workaround:

DBENCH SANE RESULTS WITH WORKAROUND RSIZE=65536
------------------------------------------------
...
 6    122759     5.62 MB/sec  execute 599 sec  latency 84.304 ms
 6  cleanup 600 sec
 0  cleanup 600 sec

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX     108397    11.100   358.921
 Close          79762    16.243   406.213
 Rename          4582    19.399   259.180
 Unlink         21728     3.610   190.564
 Qpathinfo      98367     1.773   289.554
 Qfileinfo      17163     9.917   232.687
 Qfsinfo        17903     2.130   216.804
 Sfileinfo       8828    17.427   234.069
 Find           37915     3.478   287.326
 WriteX         53503     0.048     2.992
 ReadX         169707     0.592   199.341
 LockX            350    13.536   242.800
 UnlockX          350     2.801   124.317
 Flush           7548    20.248   229.864

Throughput 5.61504 MB/sec  6 clients  6 procs  max_latency=406.225 ms

DBENCH RESULTS WITHOUT WORKAROUND
---------------------------------
...
 6    111273     5.06 MB/sec  execute 599 sec  latency 4066.072 ms
 6  cleanup 600 sec
 3  cleanup 601 sec
 0  cleanup 601 sec

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX      97674    12.244  13773.786
 Close          71583    19.126  13774.836
 Rename          4135    19.177    277.171
 Unlink         19881     4.286  12842.441
 Qpathinfo      88303     1.942  12923.151
 Qfileinfo      15305     9.636    203.481
 Qfsinfo        16305     1.801    227.405
 Sfileinfo       7960    15.871    186.799
 Find           34164     3.409    255.098
 WriteX         48105     0.053      5.428
 ReadX         152460     0.926  13759.913
 LockX            314     7.562     53.131
 UnlockX          314     1.847     47.083
 Flush           6872    19.222    200.180

Throughput 5.06232 MB/sec  6 clients  6 procs  max_latency=13774.850 ms

Then I also tried running:

./nfstest_io -d /mnt/t/data -v all -n 10 -r 3600

With the WORKAROUND it took:

INFO: 2018-11-28 16:31:25.031280 TIME: 449 secs

WITHOUT:

INFO: 2018-11-28 17:55:39.438688 TIME: 1348 secs

5. All of the above slowness disappeared when I re-ran the same tests a
second time, probably because NFS had cached everything locally during
the first run.

6. Reboot hangs similarly.

The fact that the traffic stops without triggering any NFS timeo and
retry makes me think that it is the egressing NFS RPC requests themselves
that get stuck somehow (but I'm far from being an NFS expert).

Any ideas or thoughts? Additional sensible test cases to run? Or hints on
where to look inside the NFS code, Trond?

(I'll re-test with a newer 4.20 RC and try to ftrace something in the
next days; a rough sketch of what I have in mind is in the P.S. below.)

Thanks

Cristian
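
P.S.: for reference, the rsize workaround is just Catalin's nfsroot= line
with rsize=65536 appended, i.e. something like the following (the
<some-server> placeholder and export path are taken from his report, to
be adjusted to the local setup):

nfsroot=<some-server>:/srv/nfs/debian-arm64,tcp,v4,rsize=65536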
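
P.S.: to rule out the client-side caching effect noted in point 5 when
repeating the tests, my plan is to drop the client caches between runs;
I'm assuming the usual VM knob is enough to push the reads back onto the
wire:

sync
echo 3 > /proc/sys/vm/drop_caches   # drop page cache + dentries/inodes on the client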
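
P.S.: as for the ftrace attempt, this is roughly what I have in mind: a
packet capture running in parallel with the RPC/NFS tracepoints, so that
the 30-40s quiet window can be correlated on both sides. It's only a
sketch and assumes the sunrpc/nfs4 event groups and the tracefs mount
point are available on this config (paths under /tmp are just mine):

# capture NFS traffic in parallel, to correlate with the event trace
tcpdump -s 0 -w /tmp/nfs-stall.pcap port 2049 &

# enable the RPC/NFS tracepoints around the reproducer
mount -t tracefs nodev /sys/kernel/tracing 2>/dev/null
cd /sys/kernel/tracing
echo 0 > tracing_on
echo > trace
echo 1 > events/sunrpc/enable
echo 1 > events/nfs4/enable        # if present on this config
echo 1 > tracing_on

# ...reproduce the hang (e.g. strace lkp run ./dbench-100%.yaml)...

echo 0 > tracing_on
cp trace /tmp/nfs-stall.ftrace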