Re: Issue with Race Condition on NFS4 with KRB

On Wed, 2011-06-22 at 12:18 -0700, Joshua Scoggins wrote: 
> According to the IT guys, they are running Solaris 10 as the server platform.

Ok. That should not be subject to the race I was thinking of...

> On Wed, Jun 22, 2011 at 11:57 AM, Trond Myklebust
> <Trond.Myklebust@xxxxxxxxxx> wrote:
> > On Wed, 2011-06-22 at 11:37 -0700, Joshua Scoggins wrote:
> >> Here are our mount options from auto.master
> >>
> >> /user -fstype=nfs4,sec=krb5p,noresvport,noatime
> >> /group -fstype=nfs4,sec=krb5p,noresvport,noatime
> >>
> >> As for the server, we don't control it. It's actually run by the
> >> campus-wide IT department; we are just lab support for CS. I can
> >> potentially get the server information, but I need to know what you
> >> want specifically, as they're pretty paranoid about giving out
> >> information about their servers.
> >
> > I would just want to know _what_ server platform you are running
> > against. I know of at least one server bug that might explain what you
> > are seeing, and I'd like to eliminate that as a possibility.
> >
> > Trond
> >
> >> Joshua Scoggins
> >>
> >> On Wed, Jun 22, 2011 at 11:30 AM, Trond Myklebust
> >> <Trond.Myklebust@xxxxxxxxxx> wrote:
> >> > On Wed, 2011-06-22 at 11:21 -0700, Joshua Scoggins wrote:
> >> >> Hello,
> >> >>
> >> >> We are trying to update our Linux images in our CS lab and have hit
> >> >> a bit of an issue. We are using NFS to load user home folders. While
> >> >> testing the new image we found that the nfs4 module will crash when
> >> >> Firefox 3.6.17 is used for an extended period of time. Some research
> >> >> via Google suggests that it's a potential race condition specific to
> >> >> NFS with Kerberos auth on newer kernels. Our old image doesn't have
> >> >> this issue, and that seems to be because it runs a far older kernel
> >> >> version.
> >> >>
> >> >> We have two images and both are having this problem. One is running
> >> >> 2.6.39 and the other is 2.6.38.
> >> >> Here is what dmesg spit out from the machine running 2.6.39 on one occasion:
> >> >>
> >> >> [  678.632061] ------------[ cut here ]------------
> >> >> [  678.632068] WARNING: at net/sunrpc/clnt.c:1567 call_decode+0xb2/0x69c()
> >> >> [  678.632070] Hardware name: OptiPlex 755
> >> >> [  678.632072] Modules linked in: nvidia(P) scsi_wait_scan
> >> >> [  678.632078] Pid: 3882, comm: kworker/0:2 Tainted: P
> >> >> 2.6.39-gentoo-r1 #1
> >> >> [  678.632080] Call Trace:
> >> >> [  678.632086]  [<ffffffff81035b20>] warn_slowpath_common+0x80/0x98
> >> >> [  678.632091]  [<ffffffff8117231e>] ? nfs4_xdr_dec_readdir+0xba/0xba
> >> >> [  678.632094]  [<ffffffff81035b4d>] warn_slowpath_null+0x15/0x17
> >> >> [  678.632097]  [<ffffffff81426f48>] call_decode+0xb2/0x69c
> >> >> [  678.632101]  [<ffffffff8142d2b5>] __rpc_execute+0x78/0x24b
> >> >> [  678.632104]  [<ffffffff8142d4c9>] ? rpc_execute+0x41/0x41
> >> >> [  678.632107]  [<ffffffff8142d4d9>] rpc_async_schedule+0x10/0x12
> >> >> [  678.632111]  [<ffffffff8104a49d>] process_one_work+0x1d9/0x2e7
> >> >> [  678.632114]  [<ffffffff8104c402>] worker_thread+0x133/0x24f
> >> >> [  678.632118]  [<ffffffff8104c2cf>] ? manage_workers+0x18d/0x18d
> >> >> [  678.632121]  [<ffffffff8104f6a0>] kthread+0x7d/0x85
> >> >> [  678.632125]  [<ffffffff8145e314>] kernel_thread_helper+0x4/0x10
> >> >> [  678.632128]  [<ffffffff8104f623>] ? kthread_worker_fn+0x13a/0x13a
> >> >> [  678.632131]  [<ffffffff8145e310>] ? gs_change+0xb/0xb
> >> >> [  678.632133] ---[ end trace 6bfae002a63e020e ]---

Looking at the code, there is only one way I can see for that warning to
occur, and that is if we put the request back on the 'xprt->recv' list
after it has already received a reply from the server.
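
To make that interleaving concrete, here is a minimal user-space sketch of
the scenario. Everything in it (fake_rqst, fake_complete_rqst,
fake_transmit_old/new) is a simplified stand-in of my own, not the actual
SUNRPC structures or code paths: the old transmit path checks the reply
count and the list locklessly and can therefore re-queue a request whose
reply has already been processed, while the patched path repeats the reply
check where the real code holds the lock.

/* Hypothetical, single-threaded model of the interleaving; the "reply"
 * arrives at the marked point. Not kernel code. */
#include <stdio.h>
#include <stdbool.h>

struct fake_rqst {
	bool on_recv_list;		/* ~ !list_empty(&req->rq_list) */
	unsigned int reply_bytes_recvd;	/* ~ req->rq_reply_bytes_recvd */
};

/* What the completion path conceptually does: take the request off the
 * receive list, then record that a reply has been received. */
static void fake_complete_rqst(struct fake_rqst *req)
{
	req->on_recv_list = false;
	req->reply_bytes_recvd = 42;
}

/* Old transmit path: both checks are lockless and nothing is rechecked
 * before the request is put back on the receive list. */
static void fake_transmit_old(struct fake_rqst *req)
{
	if (req->reply_bytes_recvd == 0 && !req->on_recv_list) {
		fake_complete_rqst(req);	/* <-- reply arrives here */
		req->on_recv_list = true;	/* re-queued after completion */
	}
}

/* Patched transmit path: the reply check is repeated where the real code
 * holds xprt->transport_lock, so a request completed in the window above
 * is not re-queued. */
static void fake_transmit_new(struct fake_rqst *req)
{
	if (!req->on_recv_list) {
		fake_complete_rqst(req);	/* same racing reply */
		/* spin_lock_bh(&xprt->transport_lock) in the real code */
		if (req->reply_bytes_recvd == 0)
			req->on_recv_list = true;
		/* spin_unlock_bh(&xprt->transport_lock) */
	}
}

int main(void)
{
	struct fake_rqst a = { false, 0 }, b = { false, 0 };

	fake_transmit_old(&a);
	fake_transmit_new(&b);
	printf("old path: completed request back on recv list? %s\n",
	       a.on_recv_list ? "yes (the bad state)" : "no");
	printf("new path: completed request back on recv list? %s\n",
	       b.on_recv_list ? "yes" : "no");
	return 0;
}

Built with any C11 compiler, the old path reports the completed request
back on the receive list while the patched path does not.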

Can you reproduce the problem with the attached patch?

Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com

From 7fffb6b479454560503ba3166151b501381f5a6d Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date: Wed, 22 Jun 2011 17:27:16 -0400
Subject: [PATCH] SUNRPC: Fix a potential race in between xprt_complete_rqst
 and xprt_transmit

In xprt_transmit, if the test for list_empty(&req->rq_list) is to remain
lockless, we need to test whether req->rq_reply_bytes_recvd is set (i.e.
whether we already have a reply) after that test. The reason is that
xprt_complete_rqst orders the list deletion before the setting of
req->rq_reply_bytes_recvd.

By doing the test of req->rq_reply_bytes_recvd under the spinlock, we
avoid the need for an extra smp_rmb().

Also ensure that we turn off autodisconnect whether or not the RPC request
expects a reply.

Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
---
 net/sunrpc/xprt.c |   34 +++++++++++++++++++++-------------
 1 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index ce5eb68..10e1f21 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -878,23 +878,31 @@ void xprt_transmit(struct rpc_task *task)
 
 	dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);
 
-	if (!req->rq_reply_bytes_recvd) {
-		if (list_empty(&req->rq_list) && rpc_reply_expected(task)) {
-			/*
-			 * Add to the list only if we're expecting a reply
-			 */
+	if (list_empty(&req->rq_list)) {
+		/*
+		 * Add to the list only if we're expecting a reply
+		 */
+		if (rpc_reply_expected(task)) {
 			spin_lock_bh(&xprt->transport_lock);
-			/* Update the softirq receive buffer */
-			memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
-					sizeof(req->rq_private_buf));
-			/* Add request to the receive list */
-			list_add_tail(&req->rq_list, &xprt->recv);
+			/* Don't put back on the list if we have a reply
+			 * We do this test under the spin lock to avoid
+			 * an extra smp_rmb() between the tests of
+			 * req->rq_list and req->rq_reply_bytes_recvd
+			 */
+			if (req->rq_reply_bytes_recvd == 0) {
+				/* Update the softirq receive buffer */
+				memcpy(&req->rq_private_buf, &req->rq_rcv_buf,
+						sizeof(req->rq_private_buf));
+				/* Add request to the receive list */
+				list_add_tail(&req->rq_list, &xprt->recv);
+			}
 			spin_unlock_bh(&xprt->transport_lock);
 			xprt_reset_majortimeo(req);
-			/* Turn off autodisconnect */
-			del_singleshot_timer_sync(&xprt->timer);
 		}
-	} else if (!req->rq_bytes_sent)
+		/* Turn off autodisconnect */
+		del_singleshot_timer_sync(&xprt->timer);
+	}
+	if (req->rq_reply_bytes_recvd != 0 && req->rq_bytes_sent == 0)
 		return;
 
 	req->rq_connect_cookie = xprt->connect_cookie;
-- 
1.7.5.4
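
A rough C11-atomics analogy of the ordering argument in the changelog
above (the fake_rqst fields and helpers here are illustrative only, not
the SUNRPC ones): the completion side publishes the list removal before
the reply count, which is why a transmit path that kept both checks
lockless would need at least a read barrier between them, and why doing
the recheck under xprt->transport_lock lets the patch drop that barrier.

/* Illustrative model of the publish/observe ordering only. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_rqst {
	_Atomic bool on_recv_list;		/* ~ !list_empty(&req->rq_list) */
	_Atomic unsigned int reply_recvd;	/* ~ req->rq_reply_bytes_recvd */
};

/* Completion side: the list removal is made visible before the reply
 * count; the release store stands in for the write-side ordering the
 * changelog describes in xprt_complete_rqst. */
static void fake_complete_side(struct fake_rqst *req, unsigned int copied)
{
	atomic_store_explicit(&req->on_recv_list, false, memory_order_relaxed);
	atomic_store_explicit(&req->reply_recvd, copied, memory_order_release);
}

/* Fully lockless transmit side: after the list check, the reply check
 * needs at least acquire ordering (the "extra smp_rmb()"), which the
 * patch avoids by doing that check under the spinlock instead. */
static bool fake_may_requeue_lockless(struct fake_rqst *req)
{
	if (atomic_load_explicit(&req->on_recv_list, memory_order_relaxed))
		return false;
	return atomic_load_explicit(&req->reply_recvd, memory_order_acquire) == 0;
}

int main(void)
{
	struct fake_rqst req = { false, 0 };

	fake_complete_side(&req, 42);
	printf("safe to requeue after completion? %s\n",
	       fake_may_requeue_lockless(&req) ? "yes" : "no");
	return 0;
}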

