RE: Trying to reduce NFSv4 timeouts to a few seconds on an established connection

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> From: Jeff Layton <jlayton@xxxxxxxxxx>
> Sent: Friday, January 27, 2023 8:33 AM
> 
> On Thu, 2023-01-26 at 22:08 +0000, Andrew Klaassen wrote:
> > > From: Andrew Klaassen <andrew.klaassen@xxxxxxxxxxxxxx>
> > > Sent: Thursday, January 26, 2023 10:32 AM
> > >
> > > > From: Andrew Klaassen <andrew.klaassen@xxxxxxxxxxxxxx>
> > > > Sent: Monday, January 23, 2023 11:31 AM
> > > >
> > > > Hello,
> > > >
> > > > There's a specific NFSv4 mount on a specific machine which we'd
> > > > like to timeout and return an error after a few seconds if the
> > > > server goes away.
> > > >
> > > > I've confirmed the following on two different kernels, 4.18.0-
> > > > 348.12.2.el8_5.x86_64 and 6.1.7-200.fc37.x86_64.
> > > >
> > > > I've been able to get both autofs and the mount command to
> > > > cooperate, so that the mount attempt fails after an arbitrary
> > > > number of seconds.
> > > > This mount command, for example, will fail after 6 seconds, as
> > > > expected based on the timeo=20,retrans=2,retry=0 options:
> > > >
> > > > $ time sudo mount -t nfs4 -o
> > > > rw,relatime,sync,vers=4.2,rsize=131072,wsize=131072,namlen=255,acr
> > > > egmi
> > > > n
> > > >
> > > =0,acregmax=0,acdirmin=0,acdirmax=0,soft,noac,proto=tcp,timeo=20,ret
> > > ra
> > > > n s=2,retry=0,sec=sys thor04:/mnt/thorfs04  /mnt/thor04
> > > > mount.nfs4: Connection timed out
> > > >
> > > > real    0m6.084s
> > > > user    0m0.007s
> > > > sys     0m0.015s
> > > >
> > > > However, if the share is already mounted and the server goes away,
> > > > the timeout is always 2 minutes plus the time I expect based on
> > > > timeo and retrans.  In this case, 2 minutes and 6 seconds:
> > > >
> > > > $ time ls /mnt/thor04
> > > > ls: cannot access '/mnt/thor04': Connection timed out
> > > >
> > > > real    2m6.025s
> > > > user    0m0.003s
> > > > sys     0m0.000s
> > > >
> > > > Watching the outgoing packets in the second case, the pattern is
> > > > always the
> > > > same:
> > > >  - 0.2 seconds between the first two, then doubling each time
> > > > until the two minute mark is exceeded (so the last NFS packet,
> > > > which is always the 11th packet, is sent around 1:45 after the
> > > > first).
> > > >  - Then some generic packets that start exactly-ish on the two
> > > > minute mark, 1 second between the first two, then doubling each
> > > > time.
> > > > (By
> > > > this time the NFS command has given up.)
> > > >
> > > > 11:10:21.898305 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.],
> > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > 834889483 ecr 1589769203], length 200: NFS request xid 3614904256
> > > > 196 getattr fh
> > > > 0,2/53
> > > > 11:10:22.105189 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.],
> > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > 834889690 ecr 1589769203], length 200: NFS request xid 3614904256
> > > > 196 getattr fh
> > > > 0,2/53
> > > > 11:10:22.313290 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.],
> > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > 834889898 ecr 1589769203], length 200: NFS request xid 3614904256
> > > > 196 getattr fh
> > > > 0,2/53
> > > > 11:10:22.721269 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.],
> > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > 834890306 ecr 1589769203], length 200: NFS request xid 3614904256
> > > > 196 getattr fh
> > > > 0,2/53
> > > > 11:10:23.569192 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.],
> > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > 834891154 ecr 1589769203], length 200: NFS request xid 3614904256
> > > > 196 getattr fh
> > > > 0,2/53
> > > > 11:10:25.233212 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.],
> > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > 834892818 ecr 1589769203], length 200: NFS request xid 3614904256
> > > > 196 getattr fh
> > > > 0,2/53
> > > > 11:10:28.497282 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.],
> > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > 834896082 ecr 1589769203], length 200: NFS request xid 3614904256
> > > > 196 getattr fh
> > > > 0,2/53
> > > > 11:10:35.025219 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.],
> > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > 834902610 ecr 1589769203], length 200: NFS request xid 3614904256
> > > > 196 getattr fh
> > > > 0,2/53
> > > > 11:10:48.337201 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.],
> > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > 834915922 ecr 1589769203], length 200: NFS request xid 3614904256
> > > > 196 getattr fh
> > > > 0,2/53
> > > > 11:11:14.449303 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.],
> > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > 834942034 ecr 1589769203], length 200: NFS request xid 3614904256
> > > > 196 getattr fh
> > > > 0,2/53
> > > > 11:12:08.721251 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.],
> > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > 834996306 ecr 1589769203], length 200: NFS request xid 3614904256
> > > > 196 getattr fh
> > > > 0,2/53
> > > > 11:12:22.545394 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S],
> > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > 835010130 ecr 0,nop,wscale 7], length 0
> > > > 11:12:23.570199 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S],
> > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > 835011155 ecr 0,nop,wscale 7], length 0
> > > > 11:12:25.617284 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S],
> > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > 835013202 ecr 0,nop,wscale 7], length 0
> > > > 11:12:29.649219 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S],
> > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > 835017234 ecr 0,nop,wscale 7], length 0
> > > > 11:12:37.905274 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S],
> > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > 835025490 ecr 0,nop,wscale 7], length 0
> > > > 11:12:54.289212 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S],
> > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > 835041874 ecr 0,nop,wscale 7], length 0
> > > > 11:13:26.545304 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S],
> > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > 835074130 ecr 0,nop,wscale 7], length 0
> > > >
> > > > I tried changing tcp_retries2 as suggested in another thread from
> > > > this list:
> > > >
> > > > # echo 3 > /proc/sys/net/ipv4/tcp_retries2
> > > >
> > > > ...but it made no difference on either kernel.  The 2 minute
> > > > timeout also doesn't seem to match with what I'd calculate from
> > > > the initial value of tcp_retries2, which should give a much higher
> > > > timeout.
> > > >
> > > > The only clue I've been able to find is in the retry=n entry in
> > > > the NFS
> > > > manpage:
> > > >
> > > > " For TCP the default is 3 minutes, but system TCP connection
> > > > timeouts will sometimes limit the timeout of each retransmission
> > > > to around
> > > > 2
> > > minutes."
> > > >
> > > > What I'm not able to make sense of:
> > > >  - The retry option says that it applies to mount operations, not
> > > > read/write operations.  However, in this case I'm seeing the 2
> > > > minute delay on read/write operations but *not* mount operations.
> > > >  - A couple of hours of searching didn't lead me to any kernel
> > > > settings that would result in a 2 minute timeout.
> > > >
> > > > Does anyone have any clues about a) what's happening and b) how to
> > > > get our desired behaviour of being able to control both mount and
> > > > read/write timeouts down to a few seconds?
> > > >
> > > > Thanks.
> > >
> > > I thought that changing TCP_RTO_MAX in include/net/tcp.h from 120 to
> > > something smaller and recompiling the kernel would change the 2
> > > minute timeout, but it had no effect.  I'm going to keep poking
> > > through the kernel code to see if there's a knob I can turn to
> > > change the 2 minute timeout, so that I can at least understand where
> > > it's coming from.
> > >
> > > Any hints as to where I should be looking?
> >
> > I believe I've made some progress with this today:
> >
> >  - Calls to rpc_create() from fs/nfs/client.c are sending an
> > rpc_timeout struct with their args.
> >  - rpc_create() does *not* pass the timeout on to
> > xprt_create_transport(), which then can't pass it on to
> > xs_setup_tcp().
> >  - xs_setup_tcp(), having no timeout passed to it, uses
> > xs_tcp_default_timeout instead.
> >  - changing xs_tcp_default_timeout changes the "ls" timeout behaviour
> > I described above.
> >
> > In theory all of this means that the timeout simply needs to be passed
> > through and used instead of xs_tcp_default_timeout.  I'm going to give
> > this a try tomorrow.
> >
> 
> That's a great root-cause analysis. The interlocking timeouts involved with
> NFS and its sockets can be really difficult to unwind.
> 
> Is there a way to automate this testcase? That might be nice to have in
> xfstests or the nfstest suite.
> 
> > Here's what I'm going to try first; I'm no C programmer, though, so
> > any advice or corrections you might have would be appreciated.
> >
> > Thanks.
> >
> > Andrew
> >
> > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index
> > 0b0b9f1eed46..1350c1f489f7 100644
> > --- a/net/sunrpc/clnt.c
> > +++ b/net/sunrpc/clnt.c
> > @@ -532,6 +532,7 @@ struct rpc_clnt *rpc_create(struct rpc_create_args
> > *args)
> >                 .addrlen = args->addrsize,
> >                 .servername = args->servername,
> >                 .bc_xprt = args->bc_xprt,
> > +               .timeout = args->timeout,
> >         };
> >         char servername[48];
> >         struct rpc_clnt *clnt;
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index
> > aaa5b2741b79..adc79d94b59e 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -3003,7 +3003,7 @@ static struct rpc_xprt *xs_setup_tcp(struct
> > xprt_create *args)
> >         xprt->idle_timeout = XS_IDLE_DISC_TO;
> >
> >         xprt->ops = &xs_tcp_ops;
> > -       xprt->timeout = &xs_tcp_default_timeout;
> > +       xprt->timeout = args->timeout;
> >
> >         xprt->max_reconnect_timeout = xprt->timeout->to_maxval;
> >         xprt->connect_timeout = xprt->timeout->to_initval *
> >
> 
> Looks like you're probably on the right track. You're missing a few
> things:
> 
> You'll need to add a "timeout" field to struct xprt_create in
> include/linux/sunrpc/xprt.h, and there may be some other places that either
> need to set the timeout in that structure, or do something with that field
> when it's set.
> 
> Once you have something that fixes your reproducer, go ahead and post it
> and we can help you work through whatever changes need to me made to
> make it work.
> 
> Nice work!

Thanks for the tip, that was helpful.

Currently I'm fighting with kernel recompilation.  I decided to make it quicker by slimming down the config, but apparently I've achieved something which Google claims no one else has achieved:

Errors on kernel make modules_install:

  DEPMOD  /lib/modules/6.2.0-rc5-sunrpctimeo+
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol nfs4_disable_idmapping
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol nfs4_label_alloc
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol send_implementation_id
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol nfs_atomic_open
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol nfs_clear_verifier_delegated
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol nfs4_client_id_uniquifier
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol nfs4_dentry_operations
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol nfs_fscache_open_file
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol nfs4_fs_type
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol recover_lost_locks
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol nfs_callback_nr_threads
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol max_session_cb_slots
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol max_session_slots
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol nfs_idmap_cache_timeout
depmod: WARNING: /lib/modules/6.2.0-rc5-sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol nfs_callback_set_tcpport

Errors on module load:

[   94.008271] nfsv4: Unknown symbol nfs4_disable_idmapping (err -2)
[   94.008321] nfsv4: Unknown symbol nfs4_label_alloc (err -2)
[   94.008434] nfsv4: Unknown symbol send_implementation_id (err -2)
[   94.008446] nfsv4: Unknown symbol nfs_atomic_open (err -2)
[   94.008468] nfsv4: Unknown symbol nfs_clear_verifier_delegated (err -2)
[   94.008475] nfsv4: Unknown symbol nfs4_client_id_uniquifier (err -2)
[   94.008501] nfsv4: Unknown symbol nfs4_dentry_operations (err -2)
[   94.008521] nfsv4: Unknown symbol nfs_fscache_open_file (err -2)
[   94.008566] nfsv4: Unknown symbol nfs4_fs_type (err -2)
[   94.008595] nfsv4: Unknown symbol recover_lost_locks (err -2)
[   94.008639] nfsv4: Unknown symbol nfs_callback_nr_threads (err -2)
[   94.008654] nfsv4: Unknown symbol max_session_cb_slots (err -2)
[   94.008678] nfsv4: Unknown symbol max_session_slots (err -2)
[   94.008694] nfsv4: Unknown symbol nfs_idmap_cache_timeout (err -2)
[   94.008709] nfsv4: Unknown symbol nfs_callback_set_tcpport (err -2)

I suspect I've turned something off in the config that I shouldn't have, but I'm not sure what.  I see that one of the symbols (nfs_clear_verifier_delegated) is in include/linux/nfs_fs.h, and the others are defined in fs/nfs/nfs4_fs.h, fs/nfs/super.c, fs/nfs/dir.c, fs/nfs/inode.c, fs/nfs/fscache.c, and fs/nfs/fs_context.c.  I'm changing config options and recompiling to try to figure out what I'm missing, but at a couple of hours per compile and only a couple of days a week to work on this it's slow going.  Any hints as to what I might be doing wrong would be appreciated.  😊

Andrew







[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux