On 15 Nov 2016, at 11:12, Benjamin Coddington wrote:
On 14 Nov 2016, at 15:18, Benjamin Coddington wrote:
On 14 Nov 2016, at 11:50, Benjamin Coddington wrote:
While walking the list of lock_states, keep the current nfs4_lock_state
referenced so that it isn't freed while checking.
Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
---
fs/nfs/nfs4proc.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 7897826d7c51..9a1cb9e8c4fc 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2564,12 +2564,15 @@ static void nfs41_check_delegation_stateid(struct nfs4_state *state)
 static int nfs41_check_expired_locks(struct nfs4_state *state)
 {
 	int status, ret = NFS_OK;
-	struct nfs4_lock_state *lsp;
+	struct nfs4_lock_state *lsp, *tmp;
 	struct nfs_server *server = NFS_SERVER(state->inode);
 
 	if (!test_bit(LK_STATE_IN_USE, &state->flags))
 		goto out;
-	list_for_each_entry(lsp, &state->lock_states, ls_locks) {
+	spin_lock(&state->state_lock);
+	list_for_each_entry_safe(lsp, tmp, &state->lock_states, ls_locks) {
+		atomic_inc(&lsp->ls_count);
+		spin_unlock(&state->state_lock);
 		if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
 			struct rpc_cred *cred = lsp->ls_state->owner->so_cred;
 
@@ -2588,7 +2591,10 @@ static int nfs41_check_expired_locks(struct nfs4_state *state)
 				break;
 			}
 		}
-	};
+		nfs4_put_lock_state(lsp);
+		spin_lock(&state->state_lock);
+	}
+	spin_unlock(&state->state_lock);
 out:
 	return ret;
 }
Actually, something else may be wrong here. After several more hours of
testing with this and Trond's two fixes for the CLOSE races, this popped up:
I've been an idiot. This patch is wrong in two ways:

1. list_for_each_entry_safe() won't keep the *next* entry from being freed.

2. There shouldn't be any reason (I think) it can't hold the state lock over
nfs41_test_and_free_expired_stateid(), so the much simpler fix is to hold it
the entire time. (I expect the problem I saw with that the first time was
the CLOSE/OPEN race recently fixed.)
*ben goes off and tests what should be a simple fix again*
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 7897826d7c51..34b36f148b65 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2569,6 +2569,7 @@ static int nfs41_check_expired_locks(struct nfs4_state *state)
 	if (!test_bit(LK_STATE_IN_USE, &state->flags))
 		goto out;
+	spin_lock(&state->state_lock);
 	list_for_each_entry(lsp, &state->lock_states, ls_locks) {
 		if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
 			struct rpc_cred *cred = lsp->ls_state->owner->so_cred;
@@ -2588,7 +2589,8 @@ static int nfs41_check_expired_locks(struct nfs4_state *state)
 				break;
 			}
 		}
-	};
+	}
+	spin_unlock(&state->state_lock);
 out:
 	return ret;
 }
No, it /does/ soft lockup this time:
crash> bt 2514
PID: 2514 TASK: ffff880131560000 CPU: 2 COMMAND: "::1-manager"
#0 [ffffc90002a1b9d0] __schedule at ffffffff8182212e
#1 [ffffc90002a1ba50] schedule at ffffffff818225d6
#2 [ffffc90002a1ba68] rpc_wait_bit_killable at ffffffffa00d38f4 [sunrpc]
#3 [ffffc90002a1ba88] __wait_on_bit at ffffffff818229e1
#4 [ffffc90002a1bac8] out_of_line_wait_on_bit at ffffffff81822b62
#5 [ffffc90002a1bb40] __rpc_execute at ffffffffa00d477a [sunrpc]
#6 [ffffc90002a1bba8] rpc_execute at ffffffffa00d6d41 [sunrpc]
#7 [ffffc90002a1bbd8] rpc_run_task at ffffffffa00c7d99 [sunrpc]
#8 [ffffc90002a1bc00] nfs4_call_sync_sequence at ffffffffa02d11e3 [nfsv4]
#9 [ffffc90002a1bc78] _nfs41_test_stateid at ffffffffa02d9941 [nfsv4]
#10 [ffffc90002a1bd30] nfs41_test_and_free_expired_stateid at ffffffffa02db9b9 [nfsv4]
#11 [ffffc90002a1bd90] nfs41_open_expired at ffffffffa02dbdd7 [nfsv4]
#12 [ffffc90002a1bdf0] nfs4_do_reclaim at ffffffffa02f057f [nfsv4]
#13 [ffffc90002a1be88] nfs4_run_state_manager at ffffffffa02f0f10 [nfsv4]
#14 [ffffc90002a1bec8] kthread at ffffffff810c96a9
#15 [ffffc90002a1bf50] ret_from_fork at ffffffff81827615
crash> l *(0xffffffffa00d477a)
0xffffffffa00d477a is in __rpc_execute (net/sunrpc/sched.c:807).
802			/* sync task: sleep here */
803			dprintk("RPC: %5u sync task going to sleep\n", task->tk_pid);
804			status = out_of_line_wait_on_bit(&task->tk_runstate,
805					RPC_TASK_QUEUED, rpc_wait_bit_killable,
806					TASK_KILLABLE);
807			if (status == -ERESTARTSYS) {
808				/*
809				 * When a sync task receives a signal, it exits with
810				 * -ERESTARTSYS. In order to catch any callbacks that
811				 * clean up after sleeping on some queue, we don't
Why is it queued here? Did the server return DELAY, maybe? Investigations are
ongoing, advice welcomed. I am going to resume email silence until I have
something worthwhile to contribute, rather than continue to reply to myself.
I feel I've wasted a lot of time learning something that should be obvious:
don't hold a spinlock and then send an RPC, let alone several of them.
The softlockup happens when rpciod is spinning on the state_lock in
locku_done, and then xprtiod is scheduled to receive data on the same CPU.
That data would be the replies to TEST_STATEID, which would complete the
TEST_STATEID calls that would release the lock.
So, please drop this patch. I'll send another that takes a reference and
sets a flag on each lock state for testing; then the spin lock can be
dropped while the TEST_STATEID and possible FREE_STATEID are sent.
Ben