On 11/6/23 10:44 AM, Trond Myklebust wrote:
On Mon, 2023-11-06 at 10:18 -0800, dai.ngo@xxxxxxxxxx wrote:
On 11/6/23 8:30 AM, Trond Myklebust wrote:
Hi Dai,
On Sun, 2023-11-05 at 11:27 -0800, dai.ngo@xxxxxxxxxx wrote:
Hi Trond,
When unmonting a NFS export, nfs_free_server is called. In
nfs_free_server,
we call rpc_shutdown_client(server->client) to kill all pending
RPC
tasks
and wait for them to terminate before continue on to call
nfs_put_client.
In nfs_put_client, if the refcounf is drecemented to 0 then we
call
nfs_free_client which calls rpc_shutdown_client(clp-
cl_rpcclient) to
kill all pending RPC tasks that use nfs_client->cl_rpcclient to
send
the
request.
Normally this works fine. However, due to some race conditions,
if
there are
delegation return RPC tasks have not been executed yet when
nfs_free_server
is called then this can cause system to crash with general
protection
fault.
The conditions that can cause the crash are: (1) there are
pending
delegation
return tasks called from nfs4_state_manager to return idle
delegations and
(2) the nfs_client's au_flavor is either RPC_AUTH_GSS_KRB5I or
RPC_AUTH_GSS_KRB5P
and (3) the call to nfs_igrab_and_active, from
_nfs4_proc_delegreturn, fails
for any reasons and (4) there is a pending RPC task renewing the
lease.
Since the delegation return tasks were called with 'issync = 0'
the
refcount on
nfs_server were dropped (in nfs_client_return_marked_delegations
after RPC task
was submited to the RPC layer) and the nfs_igrab_and_active call
fails, these
RPC tasks do not hold any refcount on the nfs_server.
When nfs_free_server is called, rpc_shutdown_client(server-
client)
fails to
kill these delegation return tasks since these tasks using
nfs_client->cl_rpcclient
to send the requests. When nfs_put_client is called,
nfs_free_client
is not
called because there is a pending lease renew RPC task which uses
nfs_client->cl_rpcclient
to send the request and also adds a refcount on the nfs_client.
This
allows
the delegation return tasks to stay alive and continue on after
the
nfs_server
was freed.
I've seen the NFS client with 5.4 kernel crashes with this stack
trace:
!# 0 [ffffb93b8fbdbd78] nfs4_setup_sequence [nfsv4] at
ffffffffc0f27e40 fs/nfs/nfs4proc.c:1041:0
# 1 [ffffb93b8fbdbdb8] nfs4_delegreturn_prepare [nfsv4] at
ffffffffc0f28ad1 fs/nfs/nfs4proc.c:6355:0
# 2 [ffffb93b8fbdbdd8] rpc_prepare_task [sunrpc] at
ffffffffc05e33af net/sunrpc/sched.c:821:0
# 3 [ffffb93b8fbdbde8] __rpc_execute [sunrpc] at
ffffffffc05eb527
net/sunrpc/sched.c:925:0
# 4 [ffffb93b8fbdbe48] rpc_async_schedule [sunrpc] at
ffffffffc05eb8e0 net/sunrpc/sched.c:1013:0
# 5 [ffffb93b8fbdbe68] process_one_work at ffffffff92ad4289
kernel/workqueue.c:2281:0
# 6 [ffffb93b8fbdbeb0] worker_thread at ffffffff92ad50cf
kernel/workqueue.c:2427:0
# 7 [ffffb93b8fbdbf10] kthread at ffffffff92adac05
kernel/kthread.c:296:0
# 8 [ffffb93b8fbdbf58] ret_from_fork at ffffffff93600364
arch/x86/entry/entry_64.S:355:0
Where the params of nfs4_setup_sequence:
client = (struct nfs_client *)0x4d54158ebc6cfc01
args = (struct nfs4_sequence_args *)0xffff998487f85800
res = (struct nfs4_sequence_res *)0xffff998487f85830
task = (struct rpc_task *)0xffff997d41da7d00
The 'client' pointer is invalid since it was extracted from
d_data-
res.server->nfs_client
and the nfs_server was freed.
I've reviewed the latest kernel 6.6-rc7, even though there are
many
changes
since 5.4 I could not see any any changes to prevent this
scenario to
happen
so I believe this problem still exists in 6.6-rc7.
I'd like to get your opinion on this potential issue with the
latest
kernel
and if the problem still exists then what's the best way to fix
it.
nfs_inode_evict_delegation() should be calling
nfs_do_return_delegation() with the issync flag set, whereas
nfs_server_return_marked_delegations() should be holding a
reference to
the inode before it calls nfs_end_delegation_return().
So where is this combination no inode reference + issync=0
originating
from?
The inode reference was dropped in
nfs_server_return_marked_delegations
after returning from nfs_end_delegation_return.
I don't quite understand why do we continue on with the delegation
return
in _nfs4_proc_delegreturn after nfs_igrab_and_active fails and issync
= 0?
I'm not seeing how that case can occur at all.
Since this crash is almost reproducible with 5.4 under a test setup,
I can add a patch to crash the system when nfs_igrab_and_active fails
and issync = 0.
If nfs_server_return_marked_delegations() is holding a reference to the
superblock and to the inode across the call to
nfs_end_delegation_return(), then it should not be possible for the
call to nfs_igrab_and_active() to fail
Yes, this is where I'm struggle to find out how can nfs_igrab_and_active
fails while we already had a refcount on the nfs_server. There must be
a scenario that causes nfs_igrab_and_active to fail since the
nfs4_delegreturndata->inode is NULL in the core dump.
Is it possible that the igrab in nfs_igrab_and_active fails, the inode
was marked with I_FREEING | I_WILL_FREE?
-Dai
because none of the reference
counts it tries to bump can already be zero.