Re: General protection fault in nfs4_setup_sequence caused by delegation return task

Hi Dai,

On Sun, 2023-11-05 at 11:27 -0800, dai.ngo@xxxxxxxxxx wrote:
> Hi Trond,
> 
> When unmounting an NFS export, nfs_free_server is called. In
> nfs_free_server, we call rpc_shutdown_client(server->client) to kill all
> pending RPC tasks and wait for them to terminate before continuing on to
> call nfs_put_client. In nfs_put_client, if the refcount is decremented to
> 0 then we call nfs_free_client, which calls
> rpc_shutdown_client(clp->cl_rpcclient) to kill all pending RPC tasks that
> use nfs_client->cl_rpcclient to send the request.
> 
> Normally this works fine. However, due to some race conditions, if there
> are delegation return RPC tasks that have not been executed yet when
> nfs_free_server is called, then this can cause the system to crash with a
> general protection fault.
> 
> The conditions that can cause the crash are: (1) there are pending
> delegation return tasks called from nfs4_state_manager to return idle
> delegations, (2) the nfs_client's au_flavor is either RPC_AUTH_GSS_KRB5I
> or RPC_AUTH_GSS_KRB5P, (3) the call to nfs_igrab_and_active, from
> _nfs4_proc_delegreturn, fails for any reason, and (4) there is a pending
> RPC task renewing the lease.
> 
> Since the delegation return tasks were called with 'issync = 0', the
> refcount on the nfs_server was dropped (in
> nfs_client_return_marked_delegations, after the RPC task was submitted to
> the RPC layer). Because the nfs_igrab_and_active call also fails, these
> RPC tasks do not hold any refcount on the nfs_server.
> 
> When nfs_free_server is called, rpc_shutdown_client(server->client) fails
> to kill these delegation return tasks since these tasks use
> nfs_client->cl_rpcclient to send the requests. When nfs_put_client is
> called, nfs_free_client is not called because there is a pending lease
> renew RPC task which uses nfs_client->cl_rpcclient to send the request and
> also adds a refcount on the nfs_client. This allows the delegation return
> tasks to stay alive and continue on after the nfs_server was freed.
> 
> I've seen an NFS client with a 5.4 kernel crash with this stack trace:
> 
> !# 0 [ffffb93b8fbdbd78] nfs4_setup_sequence [nfsv4] at ffffffffc0f27e40 fs/nfs/nfs4proc.c:1041:0
>   # 1 [ffffb93b8fbdbdb8] nfs4_delegreturn_prepare [nfsv4] at ffffffffc0f28ad1 fs/nfs/nfs4proc.c:6355:0
>   # 2 [ffffb93b8fbdbdd8] rpc_prepare_task [sunrpc] at ffffffffc05e33af net/sunrpc/sched.c:821:0
>   # 3 [ffffb93b8fbdbde8] __rpc_execute [sunrpc] at ffffffffc05eb527 net/sunrpc/sched.c:925:0
>   # 4 [ffffb93b8fbdbe48] rpc_async_schedule [sunrpc] at ffffffffc05eb8e0 net/sunrpc/sched.c:1013:0
>   # 5 [ffffb93b8fbdbe68] process_one_work at ffffffff92ad4289 kernel/workqueue.c:2281:0
>   # 6 [ffffb93b8fbdbeb0] worker_thread at ffffffff92ad50cf kernel/workqueue.c:2427:0
>   # 7 [ffffb93b8fbdbf10] kthread at ffffffff92adac05 kernel/kthread.c:296:0
>   # 8 [ffffb93b8fbdbf58] ret_from_fork at ffffffff93600364 arch/x86/entry/entry_64.S:355:0
> 
> Where the params of nfs4_setup_sequence are:
>       client = (struct nfs_client *)0x4d54158ebc6cfc01
>       args = (struct nfs4_sequence_args *)0xffff998487f85800
>       res = (struct nfs4_sequence_res *)0xffff998487f85830
>       task = (struct rpc_task *)0xffff997d41da7d00
> 
> The 'client' pointer is invalid since it was extracted from
> d_data->res.server->nfs_client and the nfs_server was freed.
> 
> I've reviewed the latest kernel, 6.6-rc7. Even though there are many
> changes since 5.4, I could not see any changes that prevent this scenario
> from happening, so I believe this problem still exists in 6.6-rc7.
> 
> I'd like to get your opinion on this potential issue with the latest
> kernel and, if the problem still exists, on the best way to fix it.
> 

nfs_inode_evict_delegation() should be calling
nfs_do_return_delegation() with the issync flag set, whereas
nfs_server_return_marked_delegations() should be holding a reference to
the inode before it calls nfs_end_delegation_return().

So where is this combination of no inode reference + issync=0
originating from?
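
To make the question concrete, here is a minimal user-space sketch of the
lifetime rules as the quoted report describes them. It is not kernel code:
struct model_state and delegreturn_is_safe() are hypothetical names, and the
rules it encodes (an inode reference keeps the server alive, a synchronous
caller waits while holding its own reference, a task sent on
server->client is killed at unmount) are simplifying assumptions taken from
the description above, not a statement of what fs/nfs actually does.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel. */
struct model_state {
    int  server_pins;   /* references keeping the modeled nfs_server alive */
    int  client_refs;   /* references on the modeled nfs_client */
    bool server_freed;  /* set once the modeled nfs_free_server has run */
};

/* Returns true if the modeled delegreturn task never runs against a
 * freed server, false if it would. */
bool delegreturn_is_safe(bool issync, bool igrab_ok,
                         bool task_on_cl_rpcclient, bool renew_pending)
{
    struct model_state s = {
        .server_pins  = 1,  /* reference held by the caller */
        .client_refs  = 1,  /* reference held by the server */
        .server_freed = false,
    };

    if (renew_pending)
        s.client_refs++;    /* assumed: lease renewal task pins the client */
    if (igrab_ok)
        s.server_pins++;    /* assumed: inode reference pins the server */

    if (issync)
        return true;        /* caller waits while still holding its ref */

    s.server_pins--;        /* async: caller drops its ref after submit */
    if (s.server_pins > 0)
        return true;        /* the task itself keeps the server alive */

    /* Unmount: modeled nfs_free_server. */
    if (!task_on_cl_rpcclient)
        return true;        /* shutdown of server->client kills the task */
    s.server_freed = true;

    s.client_refs--;        /* modeled nfs_put_client */
    if (s.client_refs == 0)
        return true;        /* client torn down too, task is killed */

    return !s.server_freed; /* task survives and runs after the free */
}
```

In this model only delegreturn_is_safe(false, false, true, true) returns
false, which is the combination asked about above: no inode reference,
issync=0, the task on cl_rpcclient, and a renewal task holding the client.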

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx





