Re: Possible Race Condition on SIGKILL

On Tue, 2013-01-08 at 13:40 -0500, Chris Perl wrote:
> On Mon, Jan 07, 2013 at 05:00:47PM -0500, Chris Perl wrote:
> Anyway, it appears that on mount the rpc_task's tk_client member is NULL
> and therefore the double dereference of task->tk_xprt is what blew
> things up.  I amended the patch for this [1] and am testing it
> now.
> 
> Thus far, I've still hit hangs; it just seems to take longer.  I'll have
> to dig in a bit further to see what's going on now.
> 
> Is the CentOS 6.3 kernel on this system too old for you guys to care?
> I.e. should I spend time digging into and reporting problems
> for this system as well, or do you only care about the Fedora system?

My main interest is always the upstream (Linus) kernel; however, the RPC
client in the CentOS 6.3 kernel does actually contain a lot of code that
was recently backported from upstream. As such, it is definitely of
interest to figure out corner-case bugs so that we can compare to
upstream...

> I'll report back again when I have further info and after testing the
> Fedora system.
> 
> [1] linux-kernel-test.patch

I've attached the latest copy of the patch (v4). In addition to the
check for tk_client != NULL, it needed a couple of changes to deal with
the RCU code.
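To illustrate why the v4 guard is needed, here is a minimal user-space sketch. It is not kernel code: the model_* structs are hypothetical, stripped-down stand-ins for rpc_task, rpc_clnt, rpc_rqst and rpc_xprt, and a plain pointer load replaces rcu_read_lock()/rcu_dereference(). Because the patched rpc_release_resources_task() now calls xprt_release() unconditionally, the release path can see tasks that never got a slot, including (as reported earlier in this thread) mount-time tasks whose tk_client is NULL. The sketch shows the resulting control flow: check tk_client before following tk_client->cl_xprt, and drop the write lock only if this task owns it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct model_task;

struct model_xprt {
	struct model_task *snd_task;	/* current owner of the write lock */
};

struct model_clnt {
	struct model_xprt *cl_xprt;
};

struct model_rqst {
	struct model_xprt *rq_xprt;
};

struct model_task {
	struct model_clnt *tk_client;	/* may be NULL (e.g. during mount) */
	struct model_rqst *tk_rqstp;	/* NULL until a slot is allocated */
};

/* Stand-in for xprt_release_write(): drop the lock if @task owns it. */
static void model_release_write(struct model_xprt *xprt,
				struct model_task *task)
{
	assert(xprt != NULL);
	if (xprt->snd_task == task)
		xprt->snd_task = NULL;
}

/*
 * Mirrors the v4 xprt_release() control flow. The kernel reads
 * cl_xprt under rcu_read_lock(); this sketch uses a plain load.
 */
static void model_xprt_release(struct model_task *task)
{
	struct model_rqst *req = task->tk_rqstp;

	if (req == NULL) {
		/* No slot yet: only touch the transport if a client exists. */
		if (task->tk_client != NULL) {
			struct model_xprt *xprt = task->tk_client->cl_xprt;

			if (xprt->snd_task == task)
				model_release_write(xprt, task);
		}
		return;
	}
	/* Simplified slot-owning path; the real one also frees the slot. */
	model_release_write(req->rq_xprt, task);
}
```

A task with tk_client == NULL now returns without dereferencing anything, while a slot-less task that was handed the write lock gives it back.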

Cheers
   Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
From 87ed50036b866db2ec2ba16b2a7aec4a2b0b7c39 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date: Mon, 7 Jan 2013 14:30:46 -0500
Subject: [PATCH v4] SUNRPC: Ensure we release the socket write lock if the
 rpc_task exits early

If the rpc_task exits while holding the socket write lock before it has
allocated an rpc slot, then the usual mechanism for releasing the write
lock in xprt_release() is defeated.

The problem occurs if the call to xprt_lock_write() initially fails, so
that the rpc_task is put on the xprt->sending wait queue. If the task
exits after being assigned the lock by __xprt_lock_write_func, but
before it has retried the call to xprt_lock_and_alloc_slot(), then
it calls xprt_release() while holding the write lock, but will
immediately exit due to the test for task->tk_rqstp != NULL.

Reported-by: Chris Perl <chris.perl@xxxxxxxxx>
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx [>= 3.1]
---
 net/sunrpc/sched.c |  3 +--
 net/sunrpc/xprt.c  | 12 ++++++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index b4133bd..bfa3171 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -972,8 +972,7 @@ static void rpc_async_release(struct work_struct *work)
 
 static void rpc_release_resources_task(struct rpc_task *task)
 {
-	if (task->tk_rqstp)
-		xprt_release(task);
+	xprt_release(task);
 	if (task->tk_msg.rpc_cred) {
 		put_rpccred(task->tk_msg.rpc_cred);
 		task->tk_msg.rpc_cred = NULL;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index bd462a5..33811db 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1136,10 +1136,18 @@ static void xprt_request_init(struct rpc_task *task, struct rpc_xprt *xprt)
 void xprt_release(struct rpc_task *task)
 {
 	struct rpc_xprt	*xprt;
-	struct rpc_rqst	*req;
+	struct rpc_rqst	*req = task->tk_rqstp;
 
-	if (!(req = task->tk_rqstp))
+	if (req == NULL) {
+		if (task->tk_client) {
+			rcu_read_lock();
+			xprt = rcu_dereference(task->tk_client->cl_xprt);
+			if (xprt->snd_task == task)
+				xprt_release_write(xprt, task);
+			rcu_read_unlock();
+		}
 		return;
+	}
 
 	xprt = req->rq_xprt;
 	if (task->tk_ops->rpc_count_stats != NULL)
-- 
1.7.11.7
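The sequence described in the commit message can be replayed in a small single-threaded model. This is a hypothetical sketch, not kernel code: the race_* names, the single waiter field standing in for the xprt->sending wait queue, and the collapsing of rpc_release_resources_task() and xprt_release() into a pair of functions are simplifications chosen for illustration. Before the patch, the killed task holds no rpc_rqst and therefore never reaches the release path, so snd_task keeps pointing at a task that has exited and later senders can wait indefinitely (consistent with the hangs reported in this thread). With the unconditional call, the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal model of the transport write lock; not kernel code. */
struct race_task;

struct race_xprt {
	struct race_task *snd_task;	/* write-lock owner, NULL if free */
	struct race_task *waiter;	/* one-slot stand-in for xprt->sending */
};

struct race_task {
	struct race_xprt *xprt;		/* stands in for tk_client->cl_xprt */
	void *tk_rqstp;			/* non-NULL once a slot is allocated */
};

/* Like xprt_lock_write(): take the lock, or queue behind the owner. */
static bool race_lock_write(struct race_xprt *x, struct race_task *t)
{
	if (x->snd_task == NULL || x->snd_task == t) {
		x->snd_task = t;
		return true;
	}
	x->waiter = t;
	return false;
}

/* Like __xprt_lock_write_func(): hand the lock to the sleeping waiter. */
static void race_release_and_wake(struct race_xprt *x)
{
	x->snd_task = x->waiter;
	x->waiter = NULL;
}

/* Release path: drop the write lock if @t owns it. */
static void race_release(struct race_task *t)
{
	struct race_xprt *x = t->xprt;

	assert(x != NULL);
	if (x->snd_task == t)
		x->snd_task = NULL;
}

/* Pre-patch behaviour: skip the release entirely when no slot exists. */
static void race_release_resources_old(struct race_task *t)
{
	if (t->tk_rqstp != NULL)
		race_release(t);
}

/* Patched behaviour: always go through the release path. */
static void race_release_resources_new(struct race_task *t)
{
	race_release(t);
}

/* Replays the race; returns true if a dead task is left owning the lock. */
static bool race_lock_leaked(void (*release_resources)(struct race_task *))
{
	struct race_xprt x = { NULL, NULL };
	struct race_task owner = { &x, NULL };
	struct race_task victim = { &x, NULL };

	race_lock_write(&x, &owner);	/* owner holds the write lock */
	race_lock_write(&x, &victim);	/* fails: victim is queued */
	race_release_and_wake(&x);	/* lock handed to the sleeping victim */
	/* SIGKILL: victim exits before retrying, so tk_rqstp is still NULL. */
	release_resources(&victim);
	return x.snd_task == &victim;
}
```

Calling race_lock_leaked(race_release_resources_old) reports the leak, while race_lock_leaked(race_release_resources_new) does not.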

