Re: General protection fault in nfs4_setup_sequence caused by delegation return task

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Mon, 6 Nov 2023 18:44:26 +0000

On Mon, 2023-11-06 at 10:18 -0800, dai.ngo@xxxxxxxxxx wrote:
> 
> On 11/6/23 8:30 AM, Trond Myklebust wrote:
> > Hi Dai,
> > 
> > On Sun, 2023-11-05 at 11:27 -0800, dai.ngo@xxxxxxxxxx wrote:
> > > Hi Trond,
> > > 
> > > When unmonting a NFS export, nfs_free_server is called. In
> > > nfs_free_server,
> > > we call rpc_shutdown_client(server->client) to kill all pending
> > > RPC
> > > tasks
> > > and wait for them to terminate before continue on to call
> > > nfs_put_client.
> > > In nfs_put_client, if the refcounf is drecemented to 0 then we
> > > call
> > > nfs_free_client which calls rpc_shutdown_client(clp-
> > > >cl_rpcclient) to
> > > kill all pending RPC tasks that use nfs_client->cl_rpcclient to
> > > send
> > > the
> > > request.
> > > 
> > > Normally this works fine. However, due to some race conditions,
> > > if
> > > there are
> > > delegation return RPC tasks have not been executed yet when
> > > nfs_free_server
> > > is called then this can cause system to crash with general
> > > protection
> > > fault.
> > > 
> > > The conditions that can cause the crash are: (1) there are
> > > pending
> > > delegation
> > > return tasks called from nfs4_state_manager to return idle
> > > delegations and
> > > (2) the nfs_client's au_flavor is either RPC_AUTH_GSS_KRB5I or
> > > RPC_AUTH_GSS_KRB5P
> > > and (3) the call to nfs_igrab_and_active, from
> > > _nfs4_proc_delegreturn, fails
> > > for any reasons and (4) there is a pending RPC task renewing the
> > > lease.
> > > 
> > > Since the delegation return tasks were called with 'issync = 0'
> > > the
> > > refcount on
> > > nfs_server were dropped (in nfs_client_return_marked_delegations
> > > after RPC task
> > > was submited to the RPC layer) and the nfs_igrab_and_active call
> > > fails, these
> > > RPC tasks do not hold any refcount on the nfs_server.
> > > 
> > > When nfs_free_server is called, rpc_shutdown_client(server-
> > > >client)
> > > fails to
> > > kill these delegation return tasks since these tasks using
> > > nfs_client->cl_rpcclient
> > > to send the requests. When nfs_put_client is called,
> > > nfs_free_client
> > > is not
> > > called because there is a pending lease renew RPC task which uses
> > > nfs_client->cl_rpcclient
> > > to send the request and also adds a refcount on the nfs_client.
> > > This
> > > allows
> > > the delegation return tasks to stay alive and continue on after
> > > the
> > > nfs_server
> > > was freed.
> > > 
> > > I've seen the NFS client with 5.4 kernel crashes with this stack
> > > trace:
> > > 
> > > !# 0 [ffffb93b8fbdbd78] nfs4_setup_sequence [nfsv4] at
> > > ffffffffc0f27e40 fs/nfs/nfs4proc.c:1041:0
> > >    # 1 [ffffb93b8fbdbdb8] nfs4_delegreturn_prepare [nfsv4] at
> > > ffffffffc0f28ad1 fs/nfs/nfs4proc.c:6355:0
> > >    # 2 [ffffb93b8fbdbdd8] rpc_prepare_task [sunrpc] at
> > > ffffffffc05e33af net/sunrpc/sched.c:821:0
> > >    # 3 [ffffb93b8fbdbde8] __rpc_execute [sunrpc] at
> > > ffffffffc05eb527
> > > net/sunrpc/sched.c:925:0
> > >    # 4 [ffffb93b8fbdbe48] rpc_async_schedule [sunrpc] at
> > > ffffffffc05eb8e0 net/sunrpc/sched.c:1013:0
> > >    # 5 [ffffb93b8fbdbe68] process_one_work at ffffffff92ad4289
> > > kernel/workqueue.c:2281:0
> > >    # 6 [ffffb93b8fbdbeb0] worker_thread at ffffffff92ad50cf
> > > kernel/workqueue.c:2427:0
> > >    # 7 [ffffb93b8fbdbf10] kthread at ffffffff92adac05
> > > kernel/kthread.c:296:0
> > >    # 8 [ffffb93b8fbdbf58] ret_from_fork at ffffffff93600364
> > > arch/x86/entry/entry_64.S:355:0
> > >           
> > > Where the params of nfs4_setup_sequence:
> > >        client = (struct nfs_client *)0x4d54158ebc6cfc01
> > >        args = (struct nfs4_sequence_args *)0xffff998487f85800
> > >        res = (struct nfs4_sequence_res *)0xffff998487f85830
> > >        task = (struct rpc_task *)0xffff997d41da7d00
> > > 
> > > The 'client' pointer is invalid since it was extracted from
> > > d_data-
> > > > res.server->nfs_client
> > > and the nfs_server was freed.
> > > 
> > > I've reviewed the latest kernel 6.6-rc7, even though there are
> > > many
> > > changes
> > > since 5.4 I could not see any any changes to prevent this
> > > scenario to
> > > happen
> > > so I believe this problem still exists in 6.6-rc7.
> > > 
> > > I'd like to get your opinion on this potential issue with the
> > > latest
> > > kernel
> > > and if the problem still exists then what's the best way to fix
> > > it.
> > > 
> > nfs_inode_evict_delegation() should be calling
> > nfs_do_return_delegation() with the issync flag set, whereas
> > nfs_server_return_marked_delegations() should be holding a
> > reference to
> > the inode before it calls nfs_end_delegation_return().
> > 
> > So where is this combination no inode reference + issync=0
> > originating
> > from?
> 
> The inode reference was dropped in
> nfs_server_return_marked_delegations
> after returning from nfs_end_delegation_return.
> 
> I don't quite understand why do we continue on with the delegation
> return
> in _nfs4_proc_delegreturn after nfs_igrab_and_active fails and issync
> = 0?

I'm not seeing how that case can occur at all.

If nfs_server_return_marked_delegations() is holding a reference to the
superblock and to the inode across the call to
nfs_end_delegation_return(), then it should not be possible for the
call to nfs_igrab_and_active() to fail because none of the reference
counts it tries to bump can already be zero.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx