On Wed, 2011-07-06 at 17:07 -0700, Ben Greear wrote: > On 07/06/2011 04:45 PM, Trond Myklebust wrote: > > On Wed, 2011-07-06 at 15:49 -0700, greearb@xxxxxxxxxxxxxxx wrote: > >> From: Ben Greear<greearb@xxxxxxxxxxxxxxx> > >> > >> The rpc_killall_tasks logic is not locked against > >> the work-queue thread, but it still directly modifies > >> function pointers and data in the task objects. > >> > >> This patch changes the killall-tasks logic to set a flag > >> that tells the work-queue thread to terminate the task > >> instead of directly calling the terminate logic. > >> > >> Signed-off-by: Ben Greear<greearb@xxxxxxxxxxxxxxx> > >> --- > >> > >> NOTE: This needs review, as I am still struggling to understand > >> the rpc code, and it's quite possible this patch either doesn't > >> fully fix the problem or actually causes other issues. That said, > >> my nfs stress test seems to run a bit more stable with this patch applied. > > > > Yes, but I don't see why you are adding a new flag, nor do I see why we > > want to keep checking for that flag in the rpc_execute() loop. > > rpc_killall_tasks() is not a frequent operation that we want to optimise > > for. > > I was hoping that if the killall logic never set anything that was also > set by the work-queue thread it would be lock-safe without needing > explicit locking. > > I was a bit concerned that my flags |= KILLME logic would potentially > over-write flags that were being simultaneously written elsewhere > (so maybe I'd have to add a completely new variable for that KILLME flag > to really be safe.) > > > > > How about the following instead? > > I think it still races..more comments below. > > > > > 8<---------------------------------------------------------------------------------- > > From ecb7244b661c3f9d2008ef6048733e5cea2f98ab Mon Sep 17 00:00:00 2001 > > From: Trond Myklebust<Trond.Myklebust@xxxxxxxxxx> > > Date: Wed, 6 Jul 2011 19:44:52 -0400 > > Subject: [PATCH] SUNRPC: Fix a race between work-queue and rpc_killall_tasks > > > > Since rpc_killall_tasks may modify the rpc_task's tk_action field > > without any locking, we need to be careful when dereferencing it. > > > + do_action = task->tk_callback; > > + task->tk_callback = NULL; > > + if (do_action == NULL) { > > I think the race still exists, though it would be harder to hit. > What if the killall logic sets task->tk_callback right after you assign do_action, but before > you set tk_callback to NULL? Or after you set tk_callback to NULL for > that matter. What if it does? The rpc call will continue to execute until it completes. rpc_killall_tasks is really only useful for signalling to tasks that are hanging on a completely unresponsive server that we want them to stop. The only case where we really care is in rpc_shutdown_client(), where we sleep and loop anyway. IOW: I really don't care about 'fixing' rpc_killall_tasks to perfection. All I care about is that it doesn't Oops. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@xxxxxxxxxx www.netapp.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html