Re: Trying to reduce NFSv4 timeouts to a few seconds on an established connection

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2023-01-30 at 19:33 +0000, Andrew Klaassen wrote:
> > From: Jeff Layton <jlayton@xxxxxxxxxx>
> > Sent: Friday, January 27, 2023 8:33 AM
> > 
> > On Thu, 2023-01-26 at 22:08 +0000, Andrew Klaassen wrote:
> > > > From: Andrew Klaassen <andrew.klaassen@xxxxxxxxxxxxxx>
> > > > Sent: Thursday, January 26, 2023 10:32 AM
> > > > 
> > > > > From: Andrew Klaassen <andrew.klaassen@xxxxxxxxxxxxxx>
> > > > > Sent: Monday, January 23, 2023 11:31 AM
> > > > > 
> > > > > Hello,
> > > > > 
> > > > > There's a specific NFSv4 mount on a specific machine which
> > > > > we'd
> > > > > like to timeout and return an error after a few seconds if the
> > > > > server goes away.
> > > > > 
> > > > > I've confirmed the following on two different kernels, 4.18.0-
> > > > > 348.12.2.el8_5.x86_64 and 6.1.7-200.fc37.x86_64.
> > > > > 
> > > > > I've been able to get both autofs and the mount command to
> > > > > cooperate, so that the mount attempt fails after an arbitrary
> > > > > number of seconds.
> > > > > This mount command, for example, will fail after 6 seconds, as
> > > > > expected based on the timeo=20,retrans=2,retry=0 options:
> > > > > 
> > > > > $ time sudo mount -t nfs4 -o
> > > > > rw,relatime,sync,vers=4.2,rsize=131072,wsize=131072,namlen=255
> > > > > ,acr
> > > > > egmi
> > > > > n
> > > > > 
> > > > =0,acregmax=0,acdirmin=0,acdirmax=0,soft,noac,proto=tcp,timeo=20
> > > > ,ret
> > > > ra
> > > > > n s=2,retry=0,sec=sys thor04:/mnt/thorfs04  /mnt/thor04
> > > > > mount.nfs4: Connection timed out
> > > > > 
> > > > > real    0m6.084s
> > > > > user    0m0.007s
> > > > > sys     0m0.015s
> > > > > 
> > > > > However, if the share is already mounted and the server goes
> > > > > away,
> > > > > the timeout is always 2 minutes plus the time I expect based
> > > > > on
> > > > > timeo and retrans.  In this case, 2 minutes and 6 seconds:
> > > > > 
> > > > > $ time ls /mnt/thor04
> > > > > ls: cannot access '/mnt/thor04': Connection timed out
> > > > > 
> > > > > real    2m6.025s
> > > > > user    0m0.003s
> > > > > sys     0m0.000s
> > > > > 
> > > > > Watching the outgoing packets in the second case, the pattern
> > > > > is
> > > > > always the
> > > > > same:
> > > > >  - 0.2 seconds between the first two, then doubling each time
> > > > > until the two minute mark is exceeded (so the last NFS packet,
> > > > > which is always the 11th packet, is sent around 1:45 after the
> > > > > first).
> > > > >  - Then some generic packets that start exactly-ish on the two
> > > > > minute mark, 1 second between the first two, then doubling
> > > > > each
> > > > > time.
> > > > > (By
> > > > > this time the NFS command has given up.)
> > > > > 
> > > > > 11:10:21.898305 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags
> > > > > [P.],
> > > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > > 834889483 ecr 1589769203], length 200: NFS request xid
> > > > > 3614904256
> > > > > 196 getattr fh
> > > > > 0,2/53
> > > > > 11:10:22.105189 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags
> > > > > [P.],
> > > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > > 834889690 ecr 1589769203], length 200: NFS request xid
> > > > > 3614904256
> > > > > 196 getattr fh
> > > > > 0,2/53
> > > > > 11:10:22.313290 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags
> > > > > [P.],
> > > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > > 834889898 ecr 1589769203], length 200: NFS request xid
> > > > > 3614904256
> > > > > 196 getattr fh
> > > > > 0,2/53
> > > > > 11:10:22.721269 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags
> > > > > [P.],
> > > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > > 834890306 ecr 1589769203], length 200: NFS request xid
> > > > > 3614904256
> > > > > 196 getattr fh
> > > > > 0,2/53
> > > > > 11:10:23.569192 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags
> > > > > [P.],
> > > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > > 834891154 ecr 1589769203], length 200: NFS request xid
> > > > > 3614904256
> > > > > 196 getattr fh
> > > > > 0,2/53
> > > > > 11:10:25.233212 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags
> > > > > [P.],
> > > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > > 834892818 ecr 1589769203], length 200: NFS request xid
> > > > > 3614904256
> > > > > 196 getattr fh
> > > > > 0,2/53
> > > > > 11:10:28.497282 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags
> > > > > [P.],
> > > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > > 834896082 ecr 1589769203], length 200: NFS request xid
> > > > > 3614904256
> > > > > 196 getattr fh
> > > > > 0,2/53
> > > > > 11:10:35.025219 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags
> > > > > [P.],
> > > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > > 834902610 ecr 1589769203], length 200: NFS request xid
> > > > > 3614904256
> > > > > 196 getattr fh
> > > > > 0,2/53
> > > > > 11:10:48.337201 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags
> > > > > [P.],
> > > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > > 834915922 ecr 1589769203], length 200: NFS request xid
> > > > > 3614904256
> > > > > 196 getattr fh
> > > > > 0,2/53
> > > > > 11:11:14.449303 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags
> > > > > [P.],
> > > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > > 834942034 ecr 1589769203], length 200: NFS request xid
> > > > > 3614904256
> > > > > 196 getattr fh
> > > > > 0,2/53
> > > > > 11:12:08.721251 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags
> > > > > [P.],
> > > > > seq 14452:14652, ack 18561, win 501, options [nop,nop,TS val
> > > > > 834996306 ecr 1589769203], length 200: NFS request xid
> > > > > 3614904256
> > > > > 196 getattr fh
> > > > > 0,2/53
> > > > > 11:12:22.545394 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags
> > > > > [S],
> > > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > > 835010130 ecr 0,nop,wscale 7], length 0
> > > > > 11:12:23.570199 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags
> > > > > [S],
> > > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > > 835011155 ecr 0,nop,wscale 7], length 0
> > > > > 11:12:25.617284 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags
> > > > > [S],
> > > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > > 835013202 ecr 0,nop,wscale 7], length 0
> > > > > 11:12:29.649219 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags
> > > > > [S],
> > > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > > 835017234 ecr 0,nop,wscale 7], length 0
> > > > > 11:12:37.905274 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags
> > > > > [S],
> > > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > > 835025490 ecr 0,nop,wscale 7], length 0
> > > > > 11:12:54.289212 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags
> > > > > [S],
> > > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > > 835041874 ecr 0,nop,wscale 7], length 0
> > > > > 11:13:26.545304 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags
> > > > > [S],
> > > > > seq 1375256951, win 64240, options [mss 1460,sackOK,TS val
> > > > > 835074130 ecr 0,nop,wscale 7], length 0
> > > > > 
> > > > > I tried changing tcp_retries2 as suggested in another thread
> > > > > from
> > > > > this list:
> > > > > 
> > > > > # echo 3 > /proc/sys/net/ipv4/tcp_retries2
> > > > > 
> > > > > ...but it made no difference on either kernel.  The 2 minute
> > > > > timeout also doesn't seem to match with what I'd calculate
> > > > > from
> > > > > the initial value of tcp_retries2, which should give a much
> > > > > higher
> > > > > timeout.
> > > > > 
> > > > > The only clue I've been able to find is in the retry=n entry
> > > > > in
> > > > > the NFS
> > > > > manpage:
> > > > > 
> > > > > " For TCP the default is 3 minutes, but system TCP connection
> > > > > timeouts will sometimes limit the timeout of each
> > > > > retransmission
> > > > > to around
> > > > > 2
> > > > minutes."
> > > > > 
> > > > > What I'm not able to make sense of:
> > > > >  - The retry option says that it applies to mount operations,
> > > > > not
> > > > > read/write operations.  However, in this case I'm seeing the 2
> > > > > minute delay on read/write operations but *not* mount
> > > > > operations.
> > > > >  - A couple of hours of searching didn't lead me to any kernel
> > > > > settings that would result in a 2 minute timeout.
> > > > > 
> > > > > Does anyone have any clues about a) what's happening and b)
> > > > > how to
> > > > > get our desired behaviour of being able to control both mount
> > > > > and
> > > > > read/write timeouts down to a few seconds?
> > > > > 
> > > > > Thanks.
> > > > 
> > > > I thought that changing TCP_RTO_MAX in include/net/tcp.h from
> > > > 120 to
> > > > something smaller and recompiling the kernel would change the 2
> > > > minute timeout, but it had no effect.  I'm going to keep poking
> > > > through the kernel code to see if there's a knob I can turn to
> > > > change the 2 minute timeout, so that I can at least understand
> > > > where
> > > > it's coming from.
> > > > 
> > > > Any hints as to where I should be looking?
> > > 
> > > I believe I've made some progress with this today:
> > > 
> > >  - Calls to rpc_create() from fs/nfs/client.c are sending an
> > > rpc_timeout struct with their args.
> > >  - rpc_create() does *not* pass the timeout on to
> > > xprt_create_transport(), which then can't pass it on to
> > > xs_setup_tcp().
> > >  - xs_setup_tcp(), having no timeout passed to it, uses
> > > xs_tcp_default_timeout instead.
> > >  - changing xs_tcp_default_timeout changes the "ls" timeout
> > > behaviour
> > > I described above.
> > > 
> > > In theory all of this means that the timeout simply needs to be
> > > passed
> > > through and used instead of xs_tcp_default_timeout.  I'm going to
> > > give
> > > this a try tomorrow.
> > > 
> > 
> > That's a great root-cause analysis. The interlocking timeouts
> > involved with
> > NFS and its sockets can be really difficult to unwind.
> > 
> > Is there a way to automate this testcase? That might be nice to have
> > in
> > xfstests or the nfstest suite.
> > 
> > > Here's what I'm going to try first; I'm no C programmer, though,
> > > so
> > > any advice or corrections you might have would be appreciated.
> > > 
> > > Thanks.
> > > 
> > > Andrew
> > > 
> > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index
> > > 0b0b9f1eed46..1350c1f489f7 100644
> > > --- a/net/sunrpc/clnt.c
> > > +++ b/net/sunrpc/clnt.c
> > > @@ -532,6 +532,7 @@ struct rpc_clnt *rpc_create(struct
> > > rpc_create_args
> > > *args)
> > >                 .addrlen = args->addrsize,
> > >                 .servername = args->servername,
> > >                 .bc_xprt = args->bc_xprt,
> > > +               .timeout = args->timeout,
> > >         };
> > >         char servername[48];
> > >         struct rpc_clnt *clnt;
> > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index
> > > aaa5b2741b79..adc79d94b59e 100644
> > > --- a/net/sunrpc/xprtsock.c
> > > +++ b/net/sunrpc/xprtsock.c
> > > @@ -3003,7 +3003,7 @@ static struct rpc_xprt *xs_setup_tcp(struct
> > > xprt_create *args)
> > >         xprt->idle_timeout = XS_IDLE_DISC_TO;
> > > 
> > >         xprt->ops = &xs_tcp_ops;
> > > -       xprt->timeout = &xs_tcp_default_timeout;
> > > +       xprt->timeout = args->timeout;
> > > 
> > >         xprt->max_reconnect_timeout = xprt->timeout->to_maxval;
> > >         xprt->connect_timeout = xprt->timeout->to_initval *
> > > 
> > 
> > Looks like you're probably on the right track. You're missing a few
> > things:
> > 
> > You'll need to add a "timeout" field to struct xprt_create in
> > include/linux/sunrpc/xprt.h, and there may be some other places that
> > either
> > need to set the timeout in that structure, or do something with that
> > field
> > when it's set.
> > 
> > Once you have something that fixes your reproducer, go ahead and
> > post it
> > and we can help you work through whatever changes need to me made to
> > make it work.
> > 
> > Nice work!
> 
> Thanks for the tip, that was helpful.
> 
> Currently I'm fighting with kernel recompilation.  I decided to make
> it quicker by slimming down the config, but apparently I've achieved
> something which Google claims no one else has achieved:
> 
> Errors on kernel make modules_install:
> 
>   DEPMOD  /lib/modules/6.2.0-rc5-sunrpctimeo+
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> nfs4_disable_idmapping
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> nfs4_label_alloc
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> send_implementation_id
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> nfs_atomic_open
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> nfs_clear_verifier_delegated
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> nfs4_client_id_uniquifier
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> nfs4_dentry_operations
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> nfs_fscache_open_file
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol nfs4_fs_type
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> recover_lost_locks
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> nfs_callback_nr_threads
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> max_session_cb_slots
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> max_session_slots
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> nfs_idmap_cache_timeout
> depmod: WARNING: /lib/modules/6.2.0-rc5-
> sunrpctimeo+/kernel/fs/nfs/nfsv4.ko needs unknown symbol
> nfs_callback_set_tcpport
> 
> Errors on module load:
> 
> [   94.008271] nfsv4: Unknown symbol nfs4_disable_idmapping (err -2)
> [   94.008321] nfsv4: Unknown symbol nfs4_label_alloc (err -2)
> [   94.008434] nfsv4: Unknown symbol send_implementation_id (err -2)
> [   94.008446] nfsv4: Unknown symbol nfs_atomic_open (err -2)
> [   94.008468] nfsv4: Unknown symbol nfs_clear_verifier_delegated (err
> -2)
> [   94.008475] nfsv4: Unknown symbol nfs4_client_id_uniquifier (err -
> 2)
> [   94.008501] nfsv4: Unknown symbol nfs4_dentry_operations (err -2)
> [   94.008521] nfsv4: Unknown symbol nfs_fscache_open_file (err -2)
> [   94.008566] nfsv4: Unknown symbol nfs4_fs_type (err -2)
> [   94.008595] nfsv4: Unknown symbol recover_lost_locks (err -2)
> [   94.008639] nfsv4: Unknown symbol nfs_callback_nr_threads (err -2)
> [   94.008654] nfsv4: Unknown symbol max_session_cb_slots (err -2)
> [   94.008678] nfsv4: Unknown symbol max_session_slots (err -2)
> [   94.008694] nfsv4: Unknown symbol nfs_idmap_cache_timeout (err -2)
> [   94.008709] nfsv4: Unknown symbol nfs_callback_set_tcpport (err -2)
> 
> I suspect I've turned something off in the config that I shouldn't
> have, but I'm not sure what.  I see that one of the symbols
> (nfs_clear_verifier_delegated) is in include/linux/nfs_fs.h, and the
> others are defined in fs/nfs/nfs4_fs.h, fs/nfs/super.c, fs/nfs/dir.c,
> fs/nfs/inode.c, fs/nfs/fscache.c, and fs/nfs/fs_context.c.  I'm
> changing config options and recompiling to try to figure out what I'm
> missing, but at a couple of hours per compile and only a couple of
> days a week to work on this it's slow going.  Any hints as to what I
> might be doing wrong would be appreciated.  😊
> 

Looks like the ABI got broken when you turned off some options.

Generally, if you just want to build a single module, then you want the
.config to be _exactly_ the one that you used to build the kernel you're
going to plug it into. Then to build the modules under fs/nfs you can
do:

    make modules_prepare
    make M=fs/nfs

...and then drop the resulting .ko objects into the right place in
/lib/modules.

That said, it may be simpler to just build and work with a whole kernel
for testing purposes. Working with an individual kmod can be a bit
tricky unless you know what you're doing.

Once you do the first, subsequent builds should be reasonably fast.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux