Re: Question on nfs40_discover_server_trunking.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2013-01-18 at 19:44 -0500, Trond Myklebust wrote:
> On Fri, 2013-01-18 at 18:14 -0500, Chuck Lever wrote:
> > On Jan 18, 2013, at 5:59 PM, "Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx> wrote:
> > 
> > > On Fri, 2013-01-18 at 17:03 -0500, an unknown sender wrote:
> > >> On Fri, 2013-01-18 at 16:33 -0500, Chuck Lever wrote:
> > >>> On Jan 18, 2013, at 4:28 PM, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
> > >>> 
> > >>>> Any chance the STALE_CLIENTID case needs a 'break'?
> > >>> 
> > >>> I don't think so.  LEASE_CONFIRM is set, and we want to wake the state renewal thread.
> > >>> 
> > >>>> 
> > >>>> Twice I've seen kernel crashes after the nfs40_walk_client_list
> > >>>> failed (though code comments say it should never fail).
> > >>> 
> > >>> nfs40_walk_client_list() is looking for an nfs_client that is supposed to already be in the nfs_client list.  If the search fails, that's a bug.
> > >>> 
> > >>> Eyeball the contents of your nfs_client list.  You should find an appropriate nfs_client in there, and then figure out why the search doesn't find it.
> > >> 
> > >> You have considered the fact that the call to
> > >> nfs4_proc_setclientid_confirm can potentially return
> > >> NFS4ERR_STALE_CLIENTID if the server rebooted while the client was
> > >> walking the list?
> > > 
> > > In fact, as far as I can see, the correct behaviour in
> > > nfs40_discover_server_trunking() should be to re-issue the setclientid
> > > call, and then walk the list again if nfs40_walk_client_list() returns
> > > NFS4ERR_STALE_CLIENTID.
> > 
> > When I wrote the server trunking detection logic, I think we hadn't clearly decided what needed to be done in the STALE_CLIENTID case.
> > 
> > > Something like the attached patch:
> > 
> > A couple of comments:
> > 
> >  o  nfs_get_client() already sticks the new client on the tail of the nfs_client list
> 
> OK. That allows us to get rid of 5-6 lines of code.
> 
> >  o  We don't want to get stuck in a loop here.  Should the "do {}" loop in nfs40_discover_server_trunking() be bounded by a retry count?
> 
> Not a number of retries, but possibly a timeout value.
> 
> > However, I haven't heard Ben say "oh, yes, my server had rebooted."  I'd like some confirmation that the match failed for an explainable and expected reason.
> 
> The spec allows the server to discard unconfirmed clientids, so our code
> should take that into account.
> 
> 
> BTW: I'm going insane trying to figure out how the refcounting in that
> code is supposed to work. The assumption that we can somehow safely
> return a non-referenced pointer from nfs4_discover_server_trunking and
> then bump the reference count in nfs4_init_client() drives me nuts...

Attached is v2 of the patch that addresses all of the above.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
From 5fdbe6129cc767b4d8926d4a53ce7ce075411156 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date: Fri, 18 Jan 2013 17:54:34 -0500
Subject: [PATCH v2] NFSv4: Fix NFSv4.0 trunking discovery

If walking the list in nfs40_walk_client_list fails, then the most
likely explanation is that the server rebooted. Ensure that we catch
that case by testing our new nfs_client last.

Also fix reference counting issues that could lead to a memory leak
for nfs_client structures.

Reported-by: Ben Greear <greearb@xxxxxxxxxxxxxxx>
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Cc: Chuck Lever <chuck.lever@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx [>=3.7]
---
 fs/nfs/nfs4client.c | 37 +++++++++++++++----------------------
 fs/nfs/nfs4state.c  | 22 ++++++++++------------
 2 files changed, 25 insertions(+), 34 deletions(-)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index acc3472..bd6efe8 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -234,13 +234,12 @@ struct nfs_client *nfs4_init_client(struct nfs_client *clp,
 		nfs_mark_client_ready(clp, NFS_CS_READY);
 
 	error = nfs4_discover_server_trunking(clp, &old);
+	nfs_put_client(clp);
 	if (error < 0)
 		goto error;
 	if (clp != old) {
 		clp->cl_preserve_clid = true;
-		nfs_put_client(clp);
 		clp = old;
-		atomic_inc(&clp->cl_count);
 	}
 
 	return clp;
@@ -332,40 +331,33 @@ int nfs40_walk_client_list(struct nfs_client *new,
 
 		if (prev)
 			nfs_put_client(prev);
+		prev = pos;
 
 		status = nfs4_proc_setclientid_confirm(pos, &clid, cred);
-		if (status == 0) {
+		switch (status) {
+		case -NFS4ERR_STALE_CLIENTID:
+			break;
+		case 0:
 			nfs4_swap_callback_idents(pos, new);
 
-			nfs_put_client(pos);
+			prev = NULL;
 			*result = pos;
 			dprintk("NFS: <-- %s using nfs_client = %p ({%d})\n",
 				__func__, pos, atomic_read(&pos->cl_count));
-			return 0;
-		}
-		if (status != -NFS4ERR_STALE_CLIENTID) {
-			nfs_put_client(pos);
-			dprintk("NFS: <-- %s status = %d, no result\n",
-				__func__, status);
-			return status;
+		default:
+			goto out;
 		}
 
 		spin_lock(&nn->nfs_client_lock);
-		prev = pos;
 	}
+	spin_unlock(&nn->nfs_client_lock);
 
-	/*
-	 * No matching nfs_client found.  This should be impossible,
-	 * because the new nfs_client has already been added to
-	 * nfs_client_list by nfs_get_client().
-	 *
-	 * Don't BUG(), since the caller is holding a mutex.
-	 */
+	/* No match found. The server lost our clientid */
+out:
 	if (prev)
 		nfs_put_client(prev);
-	spin_unlock(&nn->nfs_client_lock);
-	pr_err("NFS: %s Error: no matching nfs_client found\n", __func__);
-	return -NFS4ERR_STALE_CLIENTID;
+	dprintk("NFS: <-- %s status = %d\n", __func__, status);
+	return status;
 }
 
 #ifdef CONFIG_NFS_V4_1
@@ -473,6 +465,7 @@ int nfs41_walk_client_list(struct nfs_client *new,
 		if (!nfs4_match_serverowners(pos, new))
 			continue;
 
+		atomic_inc(&pos->cl_count);
 		spin_unlock(&nn->nfs_client_lock);
 		dprintk("NFS: <-- %s using nfs_client = %p ({%d})\n",
 			__func__, pos, atomic_read(&pos->cl_count));
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 9448c57..102885d 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -129,24 +129,22 @@ int nfs40_discover_server_trunking(struct nfs_client *clp,
 	if (clp->cl_addr.ss_family == AF_INET6)
 		port = nn->nfs_callback_tcpport6;
 
-	status = nfs4_proc_setclientid(clp, NFS4_CALLBACK, port, cred, &clid);
-	if (status != 0)
-		goto out;
-	clp->cl_clientid = clid.clientid;
-	clp->cl_confirm = clid.confirm;
+	do {
+		status = nfs4_proc_setclientid(clp, NFS4_CALLBACK, port, cred, &clid);
+		if (status != 0)
+			break;
+		clp->cl_clientid = clid.clientid;
+		clp->cl_confirm = clid.confirm;
 
-	status = nfs40_walk_client_list(clp, result, cred);
-	switch (status) {
-	case -NFS4ERR_STALE_CLIENTID:
-		set_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
-	case 0:
+		status = nfs40_walk_client_list(clp, result, cred);
+	} while (status == -NFS4ERR_STALE_CLIENTID);
+
+	if (status == 0) {
 		/* Sustain the lease, even if it's empty.  If the clientid4
 		 * goes stale it's of no use for trunking discovery. */
 		nfs4_schedule_state_renewal(*result);
-		break;
 	}
 
-out:
 	return status;
 }
 
-- 
1.8.1


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux