On Tue, 2013-01-08 at 13:40 -0500, Chris Perl wrote:
> On Mon, Jan 07, 2013 at 05:00:47PM -0500, Chris Perl wrote:
> Anyway, it appears that on mount the rpc_task's tk_client member is NULL
> and therefore the double dereference of task->tk_xprt is what blew
> things up.  I amended the patch for this [1] and am testing it
> now.
>
> Thus far, I've still hit hangs, it just seems to take longer.  I'll have
> to dig in a bit further to see what's going on now.
>
> Is the CentOS 6.3 kernel on this system too old for you guys to care?
> I.e. should I spend time digging into and reporting problems
> for this system as well, or do you only care about the fedora system?

My main interest is always the upstream (Linus) kernel; however, the RPC
client in the CentOS 6.3 kernel does actually contain a lot of code that
was recently backported from upstream. As such, it is definitely of
interest to figure out corner case bugs so that we can compare to
upstream...

> I'll report back again when I have further info and after testing the
> fedora system.
>
>
> [1] linux-kernel-test.patch

I've attached the latest copy of the patch (v4). In addition to the
check for tk_client != NULL, it needed a couple of changes to deal with
the RCU code.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

From 87ed50036b866db2ec2ba16b2a7aec4a2b0b7c39 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date: Mon, 7 Jan 2013 14:30:46 -0500
Subject: [PATCH v4] SUNRPC: Ensure we release the socket write lock if the rpc_task exits early

If the rpc_task exits while holding the socket write lock before it has
allocated an rpc slot, then the usual mechanism for releasing the write
lock in xprt_release() is defeated.

The problem occurs if the call to xprt_lock_write() initially fails, so
that the rpc_task is put on the xprt->sending wait queue. If the task
exits after being assigned the lock by __xprt_lock_write_func, but
before it has retried the call to xprt_lock_and_alloc_slot(), then
it calls xprt_release() while holding the write lock, but will
immediately exit due to the test for task->tk_rqstp != NULL.

Reported-by: Chris Perl <chris.perl@xxxxxxxxx>
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx [>= 3.1]
---
 net/sunrpc/sched.c |  3 +--
 net/sunrpc/xprt.c  | 12 ++++++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index b4133bd..bfa3171 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -972,8 +972,7 @@ static void rpc_async_release(struct work_struct *work)
 
 static void rpc_release_resources_task(struct rpc_task *task)
 {
-	if (task->tk_rqstp)
-		xprt_release(task);
+	xprt_release(task);
 	if (task->tk_msg.rpc_cred) {
 		put_rpccred(task->tk_msg.rpc_cred);
 		task->tk_msg.rpc_cred = NULL;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index bd462a5..33811db 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1136,10 +1136,18 @@ static void xprt_request_init(struct rpc_task *task, struct rpc_xprt *xprt)
 void xprt_release(struct rpc_task *task)
 {
 	struct rpc_xprt	*xprt;
-	struct rpc_rqst	*req;
+	struct rpc_rqst	*req = task->tk_rqstp;
 
-	if (!(req = task->tk_rqstp))
+	if (req == NULL) {
+		if (task->tk_client) {
+			rcu_read_lock();
+			xprt = rcu_dereference(task->tk_client->cl_xprt);
+			if (xprt->snd_task == task)
+				xprt_release_write(xprt, task);
+			rcu_read_unlock();
+		}
 		return;
+	}
 
 	xprt = req->rq_xprt;
 	if (task->tk_ops->rpc_count_stats != NULL)
-- 
1.7.11.7
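
For anyone who wants to reason about the failure mode outside the kernel,
here is a rough user-space model of the release path before and after the
patch. It is only a sketch: every model_* name is hypothetical, the structs
keep just the fields the patch touches (tk_client, tk_rqstp, cl_xprt,
snd_task), and RCU, the wait queue and the rest of xprt_release() are left
out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures (hypothetical, heavily reduced). */
struct model_task;

struct model_xprt {
	struct model_task *snd_task;	/* holder of the write lock, or NULL */
};

struct model_client {
	struct model_xprt *cl_xprt;
};

struct model_task {
	struct model_client *tk_client;	/* may be NULL, e.g. during mount */
	void *tk_rqstp;			/* NULL until a slot is allocated */
};

/* Drop the write lock, but only if this task is the current holder. */
static void model_release_write(struct model_xprt *xprt,
				struct model_task *task)
{
	if (xprt->snd_task == task)
		xprt->snd_task = NULL;
}

/* Pre-patch shape: with no slot allocated, nothing is released. */
static void model_release_old(struct model_task *task)
{
	if (task->tk_rqstp == NULL)
		return;
	model_release_write(task->tk_client->cl_xprt, task);
	task->tk_rqstp = NULL;
}

/* Post-patch shape: the slot-less path still drops the write lock. */
static void model_release_new(struct model_task *task)
{
	if (task->tk_rqstp == NULL) {
		if (task->tk_client != NULL)
			model_release_write(task->tk_client->cl_xprt, task);
		return;
	}
	model_release_write(task->tk_client->cl_xprt, task);
	task->tk_rqstp = NULL;
}

/*
 * Replay the scenario from the changelog: the task is handed the write
 * lock, then exits before it has allocated a slot. Returns true if the
 * lock is still held by the dead task afterwards.
 */
static bool model_lock_leaks(void (*release)(struct model_task *))
{
	struct model_xprt xprt = { .snd_task = NULL };
	struct model_client clnt = { .cl_xprt = &xprt };
	struct model_task task = { .tk_client = &clnt, .tk_rqstp = NULL };

	xprt.snd_task = &task;		/* woken up as the lock holder */
	assert(task.tk_rqstp == NULL);	/* no slot was ever allocated */
	release(&task);
	return xprt.snd_task == &task;
}
```

Calling model_lock_leaks(model_release_old) reports a leaked lock, while
model_lock_leaks(model_release_new) does not. The tk_client != NULL test in
the new path is what keeps a task with no client attached from
dereferencing a NULL pointer.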