On Wed, 2013-01-09 at 12:55 -0500, Chris Perl wrote:
> > Hrm. I guess I'm in over my head here. Apologies if I'm just asking
> > silly bumbling questions. You can start ignoring me at any time. :)
>
> I stared at the code for a while more and now see why what I
> outlined is not possible. Thanks for helping to clarify!
>
> I decided to pull your git repo and compile with HEAD at
> 87ed50036b866db2ec2ba16b2a7aec4a2b0b7c39 (linux-next as of this
> morning). Using this kernel, I can no longer induce any hangs.
>
> Interestingly, I tried recompiling the CentOS 6.3 kernel with
> both the original patch (v4) and the last patch you sent about fixing
> priority queues. With both of those in place, I still run into a
> problem.
>
> echo 0 > /proc/sys/sunrpc/rpc_debug after the hang shows (I left in the
> previous additional prints and added printing of the task pointer
> itself):
>
> <6>client: ffff88082896c200, xprt: ffff880829011000, snd_task: ffff880829a1aac0
> <6>client: ffff8808282b5600, xprt: ffff880829011000, snd_task: ffff880829a1aac0
> <6>--task-- -pid- flgs status -client- --rqstp- -timeout ---ops--
> <6>ffff88082a463180 22007 0080 -11 ffff8808282b5600 (null) 0 ffffffffa027b7a0 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
> <6>client: ffff88082838cc00, xprt: ffff88082b7c5800, snd_task: (null)
> <6>client: ffff8808283db400, xprt: ffff88082b7c5800, snd_task: (null)
> <6>client: ffff8808283db200, xprt: ffff880829011000, snd_task: ffff880829a1aac0
>
> Any thoughts about other patches that might affect this?

Hmm... The only one that springs to mind is this one (see attachment) and
then the 'connect' fixes that you helped us with previously.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
From c6567ed1402c55e19b012e66a8398baec2a726f3 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date: Fri, 4 Jan 2013 12:23:21 -0500
Subject: [PATCH] SUNRPC: Ensure that we free the rpc_task after cleanups are done

This patch ensures that we free the rpc_task after the cleanup callbacks
are done in order to avoid a deadlock problem that can be triggered if
the callback needs to wait for another workqueue item to complete.

Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Cc: Weston Andros Adamson <dros@xxxxxxxxxx>
Cc: Tejun Heo <tj@xxxxxxxxxx>
Cc: Bruce Fields <bfields@xxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
---
 net/sunrpc/sched.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index d17a704..b4133bd 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -934,16 +934,35 @@ struct rpc_task *rpc_new_task(const struct rpc_task_setup *setup_data)
 	return task;
 }
 
+/*
+ * rpc_free_task - release rpc task and perform cleanups
+ *
+ * Note that we free up the rpc_task _after_ rpc_release_calldata()
+ * in order to work around a workqueue dependency issue.
+ *
+ * Tejun Heo states:
+ * "Workqueue currently considers two work items to be the same if they're
+ * on the same address and won't execute them concurrently - ie. it
+ * makes a work item which is queued again while being executed wait
+ * for the previous execution to complete.
+ *
+ * If a work function frees the work item, and then waits for an event
+ * which should be performed by another work item and *that* work item
+ * recycles the freed work item, it can create a false dependency loop.
+ * There really is no reliable way to detect this short of verifying
+ * every memory free."
+ *
+ */
 static void rpc_free_task(struct rpc_task *task)
 {
-	const struct rpc_call_ops *tk_ops = task->tk_ops;
-	void *calldata = task->tk_calldata;
+	unsigned short tk_flags = task->tk_flags;
+
+	rpc_release_calldata(task->tk_ops, task->tk_calldata);
 
-	if (task->tk_flags & RPC_TASK_DYNAMIC) {
+	if (tk_flags & RPC_TASK_DYNAMIC) {
 		dprintk("RPC: %5u freeing task\n", task->tk_pid);
 		mempool_free(task, rpc_task_mempool);
 	}
-	rpc_release_calldata(tk_ops, calldata);
 }
 
 static void rpc_async_release(struct work_struct *work)
-- 
1.7.11.7
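
In case it helps to see the ordering in isolation, here is a minimal
userspace sketch of the rule the patch enforces in rpc_free_task(): run
the release callback (which may block waiting on other deferred work)
before handing the task's memory back, and copy anything still needed out
of the task first. The struct, flag and function names below are made up
for illustration only; they are not the real SUNRPC types.

#include <stdio.h>
#include <stdlib.h>

/* Illustrative stand-ins for RPC_TASK_DYNAMIC and tk_ops->rpc_release. */
#define TASK_DYNAMIC 0x0001

struct task {
	void (*release)(void *calldata);	/* may block on other work */
	void *calldata;
	unsigned short flags;
};

static void task_free(struct task *task)
{
	/* Copy what we still need before the callback runs, mirroring how
	 * the patch saves tk_flags before calling rpc_release_calldata(). */
	unsigned short flags = task->flags;

	/* Run the (possibly blocking) cleanup while the task is still
	 * allocated, so its address cannot be recycled by whatever this
	 * callback ends up waiting for. */
	if (task->release)
		task->release(task->calldata);

	/* Only now is it safe to return the memory to the allocator. */
	if (flags & TASK_DYNAMIC)
		free(task);
}

static void release_calldata(void *calldata)
{
	free(calldata);
}

int main(void)
{
	struct task *t = malloc(sizeof(*t));

	if (!t)
		return 1;
	t->release = release_calldata;
	t->calldata = malloc(16);
	t->flags = TASK_DYNAMIC;
	task_free(t);
	printf("release callback ran before the task was freed\n");
	return 0;
}

In the kernel the allocator is the rpc_task mempool and the blocking
cleanup is whatever rpc_release_calldata() ends up waiting on, but the
ordering constraint is the same.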