Re: Reoccurring 5 second delays during NFS calls

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Benjamin,

here are the trace and a listing of the corresponding network packages. If the listing is not detailed enough, I can send you a full package dump tomorrow.

The command I used was

touch test.txt && sleep 2 && touch test.txt

test.txt did not exist previously. So you have an example of a touch without and with delay.

Best regards,
Florian

Am 21.02.23 um 15:13 schrieb Benjamin Coddington:
On 21 Feb 2023, at 8:33, Florian Möller wrote:

Hi all,

unfortunately the patch did not help, neither using -o async nor using -o sync. We did some more tests (which is the reason for the delay of this reply):

We used a qemu image housing both the NFS server and the client and did some kernel debugging.
OS: Ubuntu 22.04.1
Kernel: 5.15.78
Mount line: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,
	namlen=255,hard,proto=tcp,timeo=600,retrans=2,
	sec=krb5p,clientaddr=10.0.0.1,
	local_lock=none,addr=10.0.0.1

We touched a file and then touched the file again. This triggers the delay reliably.

We set breakpoints on all functions starting with nfs4 and on "update_open_stateid". The attached file "1sttouch.log" contains a gdb log of the first touch.
"2ndtouch.log" shows the gdb output of the second touch. The delay occurs in line 116 in update_open_stateid.

We then deleted all breakpoints and set a sole breakpoint on update_open_stateid. We touched the file again and used only the "next" command of gdb. The gdb output is in "2ndtouch-next.log", the delay occurs in line 8.

Please let us know if you need more information or if you want us to perform further tests.

Best regards,
Florian Möller

Hi Florian,

The 5 second value and location of the delay is making me suspect something
is wrong with the open stateid sequence processing.

The client introduces 5-second delays in order to correctly order stateid
updates from the server.  Usually this happens because there are multiple
processes sending OPEN/CLOSE calls and the server processess them out of
order, or the client receives the responses out of order.

It would be helpful to have a network capture of the problem, along with the
matching output from these tracepoints on the client:

nfs4:nfs4_open_stateid_update
nfs4:nfs4_open_stateid_update_wait
nfs4:nfs4_close_stateid_update_wait
sunrpc:xs_stream_read_request
sunrpc:rpc_request
sunrpc:rpc_task_end

And these tracepoints on the server:

nfsd:nfsd_preprocess
sunrpc:svc_process

I'm interested in seeing how the client is processing the sequence numbers
of the open stateid, or if perhaps there's a delegation in play.

LMK if you need help with the tracepoints -- you can simply append those
lines into /sys/kernel/debug/tracing/set_event, then reproduce the problem.
The output of those tracepoints will be in /sys/kernel/debug/tracing/trace.

Ben

--
Dr. Florian Möller
Universität Würzburg
Institut für Mathematik
Emil-Fischer-Straße 30
97074 Würzburg, Germany
Tel. +49 931 3185596
   57 30.720428828 10.126.0.109 → 10.126.0.109 NFS 252 [TCP Previous segment not captured] V4 Call SEQUENCE
   58 30.720592390 10.126.0.109 → 10.126.0.109 NFS 220 [TCP ACKed unseen segment] V4 Reply (Call In 57) SEQUENCE
   59 30.720611655 10.126.0.109 → 10.126.0.109 TCP 68 978 → 2049 [ACK] Seq=186 Ack=153 Win=511 Len=0 TSval=173753135 TSecr=173753135
   61 52.473490272 10.126.0.109 → 10.126.0.109 NFS 320 V4 Call
   62 52.473606700 10.126.0.109 → 10.126.0.109 NFS 420 V4 Reply (Call In 61)
   63 52.473614923 10.126.0.109 → 10.126.0.109 TCP 68 978 → 2049 [ACK] Seq=438 Ack=505 Win=510 Len=0 TSval=173774888 TSecr=173774888
   64 52.477849250 10.126.0.109 → 10.126.0.109 NFS 432 V4 Call
   65 52.477981562 10.126.0.109 → 10.126.0.109 NFS 540 V4 Reply (Call In 64)
   66 52.481123231 10.126.0.109 → 10.126.0.109 NFS 364 V4 Call
   67 52.481255060 10.126.0.109 → 10.126.0.109 NFS 452 V4 Reply (Call In 66)
   68 52.481383173 10.126.0.109 → 10.126.0.109 NFS 316 V4 Call CLOSE StateID: 0xab41
   69 52.481425875 10.126.0.109 → 10.126.0.109 NFS 228 V4 Reply (Call In 68) SEQUENCE | PUTFH Status: NFS4ERR_STALE
   70 52.481774274 10.126.0.109 → 10.126.0.109 NFS 320 V4 Call
   71 52.482851688 10.126.0.109 → 10.126.0.109 NFS 420 V4 Reply (Call In 70)
   72 52.523997562 10.126.0.109 → 10.126.0.109 TCP 68 978 → 2049 [ACK] Seq=1598 Ack=1873 Win=512 Len=0 TSval=173774938 TSecr=173774897
   73 54.489060244 10.126.0.109 → 10.126.0.109 NFS 388 V4 Call
   74 54.489340744 10.126.0.109 → 10.126.0.109 NFS 504 V4 Reply (Call In 73)
   75 54.489364270 10.126.0.109 → 10.126.0.109 TCP 68 978 → 2049 [ACK] Seq=1918 Ack=2309 Win=509 Len=0 TSval=173776904 TSecr=173776903
   76 59.648437485 10.126.0.109 → 10.126.0.109 NFS 364 V4 Call
   77 59.648966709 10.126.0.109 → 10.126.0.109 NFS 452 V4 Reply (Call In 76)
   78 59.648991812 10.126.0.109 → 10.126.0.109 TCP 68 978 → 2049 [ACK] Seq=2214 Ack=2693 Win=509 Len=0 TSval=173782063 TSecr=173782063
   79 59.651032061 10.126.0.109 → 10.126.0.109 NFS 316 V4 Call CLOSE StateID: 0xa5b6
   80 59.651429583 10.126.0.109 → 10.126.0.109 NFS 228 V4 Reply (Call In 79) SEQUENCE | PUTFH Status: NFS4ERR_STALE
   81 59.691894428 10.126.0.109 → 10.126.0.109 TCP 68 978 → 2049 [ACK] Seq=2462 Ack=2853 Win=512 Len=0 TSval=173782106 TSecr=173782066
           touch-2476    [000] .....  1841.512484: rpc_request: task:129@4 nfsv4 GETATTR (sync)
            nfsd-893     [000] .....  1841.515022: svc_process: addr=10.126.0.109:978 xid=0xffd24ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1841.515108: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0xffd24ee9 copied=348 reclen=348 offset=348
           touch-2476    [000] .....  1841.519191: rpc_task_end: task:129@4 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task [sunrpc]
    kworker/u2:0-1906    [000] .....  1841.519289: rpc_request: task:130@4 nfsv4 OPEN (async)
            nfsd-893     [000] .....  1841.519381: svc_process: addr=10.126.0.109:978 xid=0x00d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1841.519482: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x00d34ee9 copied=468 reclen=468 offset=468
    kworker/u2:0-1906    [000] .....  1841.520684: rpc_task_end: task:130@4 flags=ASYNC|MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task [sunrpc]
           touch-2476    [000] .....  1841.522551: nfs4_open_stateid_update: error=0 (OK) fileid=00:2e:1048887 fhandle=0x5a835bd1 stateid=1:0x8fe5bdfd
           touch-2476    [000] .....  1841.522571: rpc_request: task:131@4 nfsv4 SETATTR (sync)
            nfsd-893     [000] .....  1841.522695: svc_process: addr=10.126.0.109:978 xid=0x01d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1841.522750: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x01d34ee9 copied=380 reclen=380 offset=380
           touch-2476    [000] .....  1841.522825: rpc_task_end: task:131@4 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task [sunrpc]
    kworker/u2:0-1906    [000] .....  1841.522842: rpc_request: task:132@2 nfsv4 CLOSE (async)
            nfsd-893     [000] .....  1841.522891: svc_process: addr=10.126.0.109:978 xid=0x02d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1841.522916: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x02d34ee9 copied=156 reclen=156 offset=156
    kworker/u2:0-1906    [000] .....  1841.522929: rpc_task_end: task:132@2 flags=ASYNC|MOVEABLE|DYNAMIC|SOFT|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=-116 action=rpc_exit_task [sunrpc]
            bash-2230    [000] .....  1841.523219: rpc_request: task:133@4 nfsv4 GETATTR (sync)
            nfsd-893     [000] .....  1841.524303: svc_process: addr=10.126.0.109:978 xid=0x03d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1841.524347: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x03d34ee9 copied=348 reclen=348 offset=348
            bash-2230    [000] .....  1841.524427: rpc_task_end: task:133@4 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task [sunrpc]
    kworker/u2:1-2480    [000] .....  1843.530173: rpc_request: task:134@4 nfsv4 OPEN_NOATTR (async)
            nfsd-893     [000] .....  1843.530670: svc_process: addr=10.126.0.109:978 xid=0x04d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1843.530874: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x04d34ee9 copied=432 reclen=432 offset=432
    kworker/u2:1-2480    [000] .....  1843.531142: rpc_task_end: task:134@4 flags=ASYNC|MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task [sunrpc]
           touch-2482    [000] .....  1843.531395: nfs4_open_stateid_update_wait: error=0 (OK) fileid=00:2e:1048887 fhandle=0x5a835bd1 stateid=2:0x8fe5bdfd
           touch-2482    [000] .....  1848.689634: nfs4_open_stateid_update: error=-11 (EAGAIN) fileid=00:2e:1048887 fhandle=0x5a835bd1 stateid=2:0x8fe5bdfd
           touch-2482    [000] .....  1848.689728: rpc_request: task:135@4 nfsv4 SETATTR (sync)
            nfsd-893     [000] .....  1848.690267: svc_process: addr=10.126.0.109:978 xid=0x05d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1848.690494: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x05d34ee9 copied=380 reclen=380 offset=380
           touch-2482    [000] .....  1848.690806: rpc_task_end: task:135@4 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task [sunrpc]
    kworker/u2:1-2480    [000] .....  1848.691627: rpc_request: task:136@2 nfsv4 CLOSE (async)
            nfsd-893     [000] .....  1848.692586: svc_process: addr=10.126.0.109:978 xid=0x06d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1848.692931: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x06d34ee9 copied=156 reclen=156 offset=156
    kworker/u2:1-2480    [000] .....  1848.693157: rpc_task_end: task:136@2 flags=ASYNC|MOVEABLE|DYNAMIC|SOFT|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=-116 action=rpc_exit_task [sunrpc]

[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux