Re: Possible Race Condition on SIGKILL

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2013-01-07 at 15:29 -0500, Trond Myklebust wrote:
+AD4- On Mon, 2013-01-07 at 15:20 -0500, Chris Perl wrote:
+AD4- +AD4- On Mon, Jan 07, 2013 at 07:50:10PM +-0000, Myklebust, Trond wrote:
+AD4- +AD4- +AD4- Hi Chris,
+AD4- +AD4- +AD4- 
+AD4- +AD4- +AD4- Excellent sleuthing+ACE- Given the thoroughness of your explanation, I'm
+AD4- +AD4- +AD4- pretty sure that the attached patch should fix the problem.
+AD4- +AD4- +AD4- 
+AD4- +AD4- +AD4- Cheers
+AD4- +AD4- +AD4-   Trond
+AD4- +AD4- +AD4- -- 
+AD4- +AD4- +AD4- Trond Myklebust
+AD4- +AD4- +AD4- Linux NFS client maintainer
+AD4- +AD4- +AD4- 
+AD4- +AD4- +AD4- NetApp
+AD4- +AD4- +AD4- Trond.Myklebust+AEA-netapp.com
+AD4- +AD4- +AD4- www.netapp.com
+AD4- +AD4- 
+AD4- +AD4- +AD4- From ec8cbb4aff21cd0eac2c6f3fc4273ac72cdd91ef Mon Sep 17 00:00:00 2001
+AD4- +AD4- +AD4- From: Trond Myklebust +ADw-Trond.Myklebust+AEA-netapp.com+AD4-
+AD4- +AD4- +AD4- Date: Mon, 7 Jan 2013 14:30:46 -0500
+AD4- +AD4- +AD4- Subject: +AFs-PATCH+AF0- SUNRPC: Ensure we release the socket write lock if the
+AD4- +AD4- +AD4-  rpc+AF8-task exits early
+AD4- +AD4- +AD4- 
+AD4- +AD4- +AD4- If the rpc+AF8-task exits while holding the socket write lock before it has
+AD4- +AD4- +AD4- allocated an rpc slot, then the usual mechanism for releasing the write
+AD4- +AD4- +AD4- lock in xprt+AF8-release() is defeated.
+AD4- +AD4- +AD4- 
+AD4- +AD4- +AD4- The problem occurs if the call to xprt+AF8-lock+AF8-write() initially fails, so
+AD4- +AD4- +AD4- that the rpc+AF8-task is put on the xprt-+AD4-sending wait queue. If the task
+AD4- +AD4- +AD4- exits after being assigned the lock by +AF8AXw-xprt+AF8-lock+AF8-write+AF8-func, but
+AD4- +AD4- +AD4- before it has retried the call to xprt+AF8-lock+AF8-and+AF8-alloc+AF8-slot(), then
+AD4- +AD4- +AD4- it calls xprt+AF8-release() while holding the write lock, but will
+AD4- +AD4- +AD4- immediately exit due to the test for task-+AD4-tk+AF8-rqstp +ACEAPQ- NULL.
+AD4- +AD4- +AD4- 
+AD4- +AD4- +AD4- Reported-by: Chris Perl +ADw-chris.perl+AEA-gmail.com+AD4-
+AD4- +AD4- +AD4- Signed-off-by: Trond Myklebust +ADw-Trond.Myklebust+AEA-netapp.com+AD4-
+AD4- +AD4- +AD4- Cc: stable+AEA-vger.kernel.org +AFsAPgA9- 3.1+AF0-
+AD4- +AD4- +AD4- ---
+AD4- +AD4- +AD4-  net/sunrpc/xprt.c +AHw- 6 +-+-+-+---
+AD4- +AD4- +AD4-  1 file changed, 4 insertions(+-), 2 deletions(-)
+AD4- +AD4- +AD4- 
+AD4- +AD4- +AD4- diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
+AD4- +AD4- +AD4- index bd462a5..6676457 100644
+AD4- +AD4- +AD4- --- a/net/sunrpc/xprt.c
+AD4- +AD4- +AD4- +-+-+- b/net/sunrpc/xprt.c
+AD4- +AD4- +AD4- +AEAAQA- -1136,10 +-1136,12 +AEAAQA- static void xprt+AF8-request+AF8-init(struct rpc+AF8-task +ACo-task, struct rpc+AF8-xprt +ACo-xprt)
+AD4- +AD4- +AD4-  void xprt+AF8-release(struct rpc+AF8-task +ACo-task)
+AD4- +AD4- +AD4-  +AHs-
+AD4- +AD4- +AD4-  	struct rpc+AF8-xprt	+ACo-xprt+ADs-
+AD4- +AD4- +AD4- -	struct rpc+AF8-rqst	+ACo-req+ADs-
+AD4- +AD4- +AD4- +-	struct rpc+AF8-rqst	+ACo-req +AD0- task-+AD4-tk+AF8-rqstp+ADs-
+AD4- +AD4- +AD4-  
+AD4- +AD4- +AD4- -	if (+ACE-(req +AD0- task-+AD4-tk+AF8-rqstp))
+AD4- +AD4- +AD4- +-	if (req +AD0APQ- NULL) +AHs-
+AD4- +AD4- +AD4- +-		xprt+AF8-release+AF8-write(task-+AD4-tk+AF8-xprt, task)+ADs-
+AD4- +AD4- +AD4-  		return+ADs-
+AD4- +AD4- +AD4- +-	+AH0-
+AD4- +AD4- +AD4-  
+AD4- +AD4- +AD4-  	xprt +AD0- req-+AD4-rq+AF8-xprt+ADs-
+AD4- +AD4- +AD4-  	if (task-+AD4-tk+AF8-ops-+AD4-rpc+AF8-count+AF8-stats +ACEAPQ- NULL)
+AD4- +AD4- +AD4- -- 
+AD4- +AD4- +AD4- 1.7.11.7
+AD4- +AD4- +AD4- 
+AD4- +AD4- 
+AD4- +AD4- Ah, I totally missed the call to +AGA-rpc+AF8-release+AF8-task' at the bottom of the
+AD4- +AD4- +AGAAXwBf-rpc+AF8-execute' loop (at least thats how I think we'd get to this function
+AD4- +AD4- you're patching).
+AD4- +AD4- 
+AD4- +AD4- But wouldn't we need to update the call site in
+AD4- +AD4- +AGA-rpc+AF8-release+AF8-resources+AF8-task' as well?  It contains an explicit check for
+AD4- +AD4- +AGA-task-+AD4-tk+AF8-rqstp' being non null.
+AD4- 
+AD4- Ewww.... You're right: That's a wart that seems to have been copied and
+AD4- pasted quite a few times.
+AD4- 
+AD4- Here is v2...

...and a v3 that adds a small optimisation to avoid taking the transport
lock in cases where we really don't need it.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust+AEA-netapp.com
www.netapp.com
From 51b63a538c54cb9c3b83c4d62572cf18da165cba Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date: Mon, 7 Jan 2013 14:30:46 -0500
Subject: [PATCH v3] SUNRPC: Ensure we release the socket write lock if the
 rpc_task exits early

If the rpc_task exits while holding the socket write lock before it has
allocated an rpc slot, then the usual mechanism for releasing the write
lock in xprt_release() is defeated.

The problem occurs if the call to xprt_lock_write() initially fails, so
that the rpc_task is put on the xprt->sending wait queue. If the task
exits after being assigned the lock by __xprt_lock_write_func, but
before it has retried the call to xprt_lock_and_alloc_slot(), then
it calls xprt_release() while holding the write lock, but will
immediately exit due to the test for task->tk_rqstp != NULL.

Reported-by: Chris Perl <chris.perl@xxxxxxxxx>
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx [>= 3.1]
---
 net/sunrpc/sched.c | 3 +--
 net/sunrpc/xprt.c  | 8 ++++++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index b4133bd..bfa3171 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -972,8 +972,7 @@ static void rpc_async_release(struct work_struct *work)
 
 static void rpc_release_resources_task(struct rpc_task *task)
 {
-	if (task->tk_rqstp)
-		xprt_release(task);
+	xprt_release(task);
 	if (task->tk_msg.rpc_cred) {
 		put_rpccred(task->tk_msg.rpc_cred);
 		task->tk_msg.rpc_cred = NULL;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index bd462a5..6acc0c5 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1136,10 +1136,14 @@ static void xprt_request_init(struct rpc_task *task, struct rpc_xprt *xprt)
 void xprt_release(struct rpc_task *task)
 {
 	struct rpc_xprt	*xprt;
-	struct rpc_rqst	*req;
+	struct rpc_rqst	*req = task->tk_rqstp;
 
-	if (!(req = task->tk_rqstp))
+	if (req == NULL) {
+		xprt = task->tk_xprt;
+		if (xprt->snd_task == task)
+			xprt_release_write(xprt, task);
 		return;
+	}
 
 	xprt = req->rq_xprt;
 	if (task->tk_ops->rpc_count_stats != NULL)
-- 
1.7.11.7


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux