RE: Trying to reduce NFSv4 timeouts to a few seconds on an established connection

Andrew Klaassen <andrew.klaassen@xxxxxxxxxxxxxx> · Thu, 26 Jan 2023 15:31:35 +0000

> From: Andrew Klaassen <andrew.klaassen@xxxxxxxxxxxxxx>
> Sent: Monday, January 23, 2023 11:31 AM
> 
> Hello,
> 
> There's a specific NFSv4 mount on a specific machine which we'd like to
> timeout and return an error after a few seconds if the server goes away.
> 
> I've confirmed the following on two different kernels,
> 4.18.0-348.12.2.el8_5.x86_64 and 6.1.7-200.fc37.x86_64.
> 
> I've been able to get both autofs and the mount command to cooperate, so
> that the mount attempt fails after a chosen number of seconds.  This
> mount command, for example, fails after 6 seconds, as expected from the
> timeo=20,retrans=2,retry=0 options (timeo is in tenths of a second):
> 
> $ time sudo mount -t nfs4 \
>   -o rw,relatime,sync,vers=4.2,rsize=131072,wsize=131072,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,soft,noac,proto=tcp,timeo=20,retrans=2,retry=0,sec=sys \
>   thor04:/mnt/thorfs04 /mnt/thor04
> mount.nfs4: Connection timed out
> 
> real    0m6.084s
> user    0m0.007s
> sys     0m0.015s
> 
> However, if the share is already mounted and the server goes away, the
> timeout is always 2 minutes plus the time I expect based on timeo and
> retrans.  In this case, 2 minutes and 6 seconds:
> 
> $ time ls /mnt/thor04
> ls: cannot access '/mnt/thor04': Connection timed out
> 
> real    2m6.025s
> user    0m0.003s
> sys     0m0.000s
> 
> Watching the outgoing packets in the second case, the pattern is always the
> same:
>  - The same NFS request (same xid, same TCP sequence numbers) is resent
> about 0.2 seconds apart for the first three packets, then the gap doubles
> each time until the two minute mark is exceeded (so the last NFS packet,
> which is always the 11th packet, is sent around 1:45 after the first).
>  - Then TCP SYN packets from a new source port (942 instead of 916), i.e.
> attempts to open a new connection, starting almost exactly on the two
> minute mark, 1 second between the first two, then doubling each time.
> (By this time the NFS command has given up.)
> 
> 11:10:21.898305 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834889483 ecr 1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
> 11:10:22.105189 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834889690 ecr 1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
> 11:10:22.313290 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834889898 ecr 1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
> 11:10:22.721269 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834890306 ecr 1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
> 11:10:23.569192 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834891154 ecr 1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
> 11:10:25.233212 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834892818 ecr 1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
> 11:10:28.497282 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834896082 ecr 1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
> 11:10:35.025219 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834902610 ecr 1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
> 11:10:48.337201 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834915922 ecr 1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
> 11:11:14.449303 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834942034 ecr 1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
> 11:12:08.721251 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834996306 ecr 1589769203], length 200: NFS request xid 3614904256 196 getattr fh 0,2/53
> 11:12:22.545394 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951, win 64240, options [mss 1460,sackOK,TS val 835010130 ecr 0,nop,wscale 7], length 0
> 11:12:23.570199 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951, win 64240, options [mss 1460,sackOK,TS val 835011155 ecr 0,nop,wscale 7], length 0
> 11:12:25.617284 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951, win 64240, options [mss 1460,sackOK,TS val 835013202 ecr 0,nop,wscale 7], length 0
> 11:12:29.649219 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951, win 64240, options [mss 1460,sackOK,TS val 835017234 ecr 0,nop,wscale 7], length 0
> 11:12:37.905274 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951, win 64240, options [mss 1460,sackOK,TS val 835025490 ecr 0,nop,wscale 7], length 0
> 11:12:54.289212 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951, win 64240, options [mss 1460,sackOK,TS val 835041874 ecr 0,nop,wscale 7], length 0
> 11:13:26.545304 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq 1375256951, win 64240, options [mss 1460,sackOK,TS val 835074130 ecr 0,nop,wscale 7], length 0
> 
> I tried changing tcp_retries2 as suggested in another thread from this list:
> 
> # echo 3 > /proc/sys/net/ipv4/tcp_retries2
> 
> ...but it made no difference on either kernel.  The 2 minute timeout also
> doesn't match what I'd calculate from the initial (default) value of
> tcp_retries2, 15, which should give a much longer timeout (the kernel's
> ip-sysctl documentation puts it at a hypothetical 924.6 seconds).
> 
> The only clue I've been able to find is in the retry=n entry in the NFS
> manpage:
> 
> "For TCP the default is 3 minutes, but system TCP connection timeouts will
> sometimes limit the timeout of each retransmission to around 2 minutes."
> 
> What I'm not able to make sense of:
>  - The retry option says that it applies to mount operations, not read/write
> operations.  However, in this case I'm seeing the 2 minute delay on
> read/write operations but *not* mount operations.
>  - A couple of hours of searching didn't lead me to any kernel settings that
> would result in a 2 minute timeout.
> 
> Does anyone have any clues about a) what's happening and b) how to get
> our desired behaviour of being able to control both mount and read/write
> timeouts down to a few seconds?
> 
> Thanks.
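
The timing pattern in the trace can be checked directly from the tcpdump timestamps. This is a small Python sketch: the timestamps are copied from the trace above, while the function names (parse, gaps, offset) are only illustrative. It prints the gap between consecutive NFS retransmissions, how long after the first NFS packet the last retransmission and the first SYN were sent, and the gaps between SYNs.

```python
from datetime import datetime

# Timestamps copied from the tcpdump trace quoted above.
NFS_PACKETS = [
    "11:10:21.898305", "11:10:22.105189", "11:10:22.313290",
    "11:10:22.721269", "11:10:23.569192", "11:10:25.233212",
    "11:10:28.497282", "11:10:35.025219", "11:10:48.337201",
    "11:11:14.449303", "11:12:08.721251",
]
SYN_PACKETS = [
    "11:12:22.545394", "11:12:23.570199", "11:12:25.617284",
    "11:12:29.649219", "11:12:37.905274", "11:12:54.289212",
    "11:13:26.545304",
]


def parse(stamp: str) -> datetime:
    """Parse a tcpdump HH:MM:SS.ffffff timestamp (the date is irrelevant here)."""
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def gaps(stamps: list[str]) -> list[float]:
    """Seconds between each packet and the one before it."""
    times = [parse(s) for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def offset(stamp: str, start: str = NFS_PACKETS[0]) -> float:
    """Seconds from the first NFS packet to the given packet."""
    return (parse(stamp) - parse(start)).total_seconds()


if __name__ == "__main__":
    print("NFS gaps (s):", [round(g, 3) for g in gaps(NFS_PACKETS)])
    print(f"last NFS packet: +{offset(NFS_PACKETS[-1]):.1f} s")
    print(f"first SYN:       +{offset(SYN_PACKETS[0]):.1f} s")
    print("SYN gaps (s):", [round(g, 3) for g in gaps(SYN_PACKETS)])
```

For this trace the NFS gaps are about 0.21, 0.21 and 0.41 seconds and then roughly double each time up to about 54.3 seconds; the last NFS retransmission is about 106.8 seconds after the first packet, the first SYN about 120.6 seconds after it, and the SYN gaps double from about 1.02 to about 32.3 seconds.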

I thought that changing TCP_RTO_MAX in include/net/tcp.h from 120 seconds (120*HZ) to something smaller and recompiling the kernel would change the 2 minute timeout, but it had no effect.  I'm going to keep poking through the kernel code for a knob that changes the 2 minute timeout, so that I can at least understand where it's coming from.
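
For reference, this is a rough Python model of the give-up time that tcp_retries2 and TCP_RTO_MAX imply, following my reading of tcp_model_timeout() in net/ipv4/tcp_timer.c: the RTO starts at a base value and doubles until the next doubling would pass TCP_RTO_MAX, after which each further retry adds TCP_RTO_MAX. The 200 ms base (TCP_RTO_MIN) and the function and parameter names are assumptions for illustration. It models only TCP's retransmission give-up time, not anything in the NFS/RPC layer.

```python
def tcp_retries_timeout_ms(retries: int,
                           rto_base_ms: int = 200,
                           rto_max_ms: int = 120_000) -> int:
    """Hypothetical time (ms) before TCP gives up after `retries` retransmissions.

    Assumed model (after tcp_model_timeout()): the RTO doubles from
    rto_base_ms while it stays within rto_max_ms, then stays at rto_max_ms.
    """
    # ilog2(rto_max / rto_base): number of doublings before the cap applies.
    thresh = (rto_max_ms // rto_base_ms).bit_length() - 1
    if retries <= thresh:
        # Sum of rto_base * 2**k for k = 0..retries.
        return ((2 << retries) - 1) * rto_base_ms
    # Doubling phase up to the cap, then linear steps of rto_max_ms.
    return ((2 << thresh) - 1) * rto_base_ms + (retries - thresh) * rto_max_ms


if __name__ == "__main__":
    for retries in (3, 15):
        print(f"tcp_retries2={retries}: "
              f"{tcp_retries_timeout_ms(retries) / 1000:.1f} s")
    # Default retries with a smaller, hypothetical TCP_RTO_MAX of 10 s.
    print(f"tcp_retries2=15, RTO max 10 s: "
          f"{tcp_retries_timeout_ms(15, rto_max_ms=10_000) / 1000:.1f} s")
```

Under these assumptions tcp_retries2=3 gives 3.0 s and the default 15 gives 924.6 s (the figure in the ip-sysctl documentation). Neither is close to 2 minutes, which suggests the 2 minute timeout is not coming from tcp_retries2.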

Any hints as to where I should be looking?

Andrew