Re: [PATCH 2/2] NFSv4: Fix a state manager thread deadlock regression

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2023-09-22 at 13:22 -0400, Olga Kornievskaia wrote:
> On Wed, Sep 20, 2023 at 8:27 PM Trond Myklebust
> <trondmy@xxxxxxxxxxxxxxx> wrote:
> > 
> > On Wed, 2023-09-20 at 15:38 -0400, Anna Schumaker wrote:
> > > Hi Trond,
> > > 
> > > On Sun, Sep 17, 2023 at 7:12 PM <trondmy@xxxxxxxxxx> wrote:
> > > > 
> > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > 
> > > > Commit 4dc73c679114 reintroduces the deadlock that was fixed by
> > > > commit
> > > > aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
> > > > because
> > > > it
> > > > prevents the setup of new threads to handle reboot recovery,
> > > > while
> > > > the
> > > > older recovery thread is stuck returning delegations.
> > > 
> > > I'm seeing a possible deadlock with xfstests generic/472 on NFS
> > > v4.x
> > > after applying this patch. The test itself checks for various
> > > swapfile
> > > edge cases, so it seems likely something is going on there.
> > > 
> > > Let me know if you need more info
> > > Anna
> > > 
> > 
> > Did you turn off delegations on your server? If you don't, then
> > swap
> > will deadlock itself under various scenarios.
> 
> Is there documentation somewhere that says that delegations must be
> turned off on the server if NFS over swap is enabled?

I think the question is more generally "Is there documentation for NFS
swap?"

> 
> If the client can't handle delegations + swap, then shouldn't this be
> solved by (1) checking if we are in NFS over swap and then
> proactively
> setting 'dont want delegation' on open and/or (2) proactively return
> the delegation if received so that we don't get into the deadlock?

We could do that for NFSv4.1 and NFSv4.2, but the protocol will not
allow us to do that for NFSv4.0.

> 
> I think this is similar to Anna's. With this patch, I'm running into
> a
> problem running against an ONTAP server using xfstests (no problems
> without the patch). During the run two stuck threads are:
> [root@unknown000c291be8aa aglo]# cat /proc/3724/stack
> [<0>] nfs4_run_state_manager+0x1c0/0x1f8 [nfsv4]
> [<0>] kthread+0x100/0x110
> [<0>] ret_from_fork+0x10/0x20
> [root@unknown000c291be8aa aglo]# cat /proc/3725/stack
> [<0>] nfs_wait_bit_killable+0x1c/0x88 [nfs]
> [<0>] nfs4_wait_clnt_recover+0xb4/0xf0 [nfsv4]
> [<0>] nfs4_client_recover_expired_lease+0x34/0x88 [nfsv4]
> [<0>] _nfs4_do_open.isra.0+0x94/0x408 [nfsv4]
> [<0>] nfs4_do_open+0x9c/0x238 [nfsv4]
> [<0>] nfs4_atomic_open+0x100/0x118 [nfsv4]
> [<0>] nfs4_file_open+0x11c/0x240 [nfsv4]
> [<0>] do_dentry_open+0x140/0x528
> [<0>] vfs_open+0x30/0x38
> [<0>] do_open+0x14c/0x360
> [<0>] path_openat+0x104/0x250
> [<0>] do_filp_open+0x84/0x138
> [<0>] file_open_name+0x134/0x190
> [<0>] __do_sys_swapoff+0x58/0x6e8
> [<0>] __arm64_sys_swapoff+0x18/0x28
> [<0>] invoke_syscall.constprop.0+0x7c/0xd0
> [<0>] do_el0_svc+0xb4/0xd0
> [<0>] el0_svc+0x50/0x228
> [<0>] el0t_64_sync_handler+0x134/0x150
> [<0>] el0t_64_sync+0x17c/0x180

Oh crap... Yes, that is a bug. Can you please apply the attached patch
on top of the original, and see if that fixes the problem?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx


From ec89a837b71772feeb55cd9ece4e91900d4afa79 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
Date: Fri, 22 Sep 2023 15:00:08 -0400
Subject: [PATCH] fixup! NFSv4: Fix a state manager thread deadlock regression

Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
---
 fs/nfs/nfs4state.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 5751a6886da4..9a5d911a7edc 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2762,17 +2762,17 @@ static int nfs4_run_state_manager(void *ptr)
 	nfs4_state_manager(clp);
 
 	if (test_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state) &&
-	    !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state)) {
+	    !test_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state)) {
 		wait_var_event_interruptible(&clp->cl_state,
 					     test_bit(NFS4CLNT_RUN_MANAGER,
 						      &clp->cl_state));
 		if (!atomic_read(&cl->cl_swapper))
 			clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state);
-		if (refcount_read(&clp->cl_count) > 1 && !signalled())
+		if (refcount_read(&clp->cl_count) > 1 && !signalled() &&
+		    !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state))
 			goto again;
 		/* Either no longer a swapper, or were signalled */
 		clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state);
-		nfs4_clear_state_manager_bit(clp);
 	}
 
 	if (refcount_read(&clp->cl_count) > 1 && !signalled() &&
-- 
2.41.0


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux