RE: [RFC] sunrpc: Fix race between work-queue and rpc_killall_tasks.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Ben Greear [mailto:greearb@xxxxxxxxxxxxxxx]
> Sent: Friday, July 08, 2011 1:19 PM
> To: Myklebust, Trond
> Cc: linux-nfs@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [RFC] sunrpc: Fix race between work-queue and
> rpc_killall_tasks.
> 
> On 07/06/2011 04:45 PM, Trond Myklebust wrote:
> > On Wed, 2011-07-06 at 15:49 -0700, greearb@xxxxxxxxxxxxxxx wrote:
> >> From: Ben Greear<greearb@xxxxxxxxxxxxxxx>
> >>
> >> The rpc_killall_tasks logic is not locked against
> >> the work-queue thread, but it still directly modifies
> >> function pointers and data in the task objects.
> >>
> >> This patch changes the killall-tasks logic to set a flag
> >> that tells the work-queue thread to terminate the task
> >> instead of directly calling the terminate logic.
> >>
> >> Signed-off-by: Ben Greear<greearb@xxxxxxxxxxxxxxx>
> >> ---
> >>
> >> NOTE:  This needs review, as I am still struggling to understand
> >> the rpc code, and it's quite possible this patch either doesn't
> >> fully fix the problem or actually causes other issues.  That said,
> >> my nfs stress test seems to run a bit more stable with this patch
> applied.
> >
> > Yes, but I don't see why you are adding a new flag, nor do I see why
> we
> > want to keep checking for that flag in the rpc_execute() loop.
> > rpc_killall_tasks() is not a frequent operation that we want to
> optimise
> > for.
> >
> > How about the following instead?
> 
> Ok, I looked at your patch closer.  I think it can still cause
> bad race conditions.
> 
> For instance:
> 
> Assume that tk_callback is NULL at beginning of while loop in
> __rpc_execute,
> and tk_action is rpc_exit_task.
> 
> While do_action(task) is being called, tk_action is set to NULL in
> rpc_exit_task.
> 
> But, right after tk_action is set to NULL in rpc_exit_task, the
> rpc_killall_tasks
> method calls rpc_exit, which sets tk_action back to rpc_exit_task.
> 
> I believe this could cause the xprt_release(task) logic to be called in
> the
> work-queue's execution of rpc_exit_task due to tk_action != NULL when
> it should not be.

Why would this be a problem? xprt_release() can certainly be called multiple times on an rpc_task. Ditto rpbc_getport_done.

The only thing that is not re-entrant there is rpcb_map_release, which should only ever be called once whether or not something calls rpc_killall_tasks.

 
> I have no hard evidence this exact scenario is happening in my case,
> but I
> believe the code is still racy with your patch.
> 
> For that matter, is it safe to modify the flags in rpc_killall_tasks:
> 
> rovr->tk_flags |= RPC_TASK_KILLED;
> 
> Is that guaranteed to be atomic with any other modification of flags?

Task->tk_flags should never change after the rpc_task is set up. The only allowed change is the RPC_TASK_KILLED. We could convert that into an atomic bit in task->tk_runstate, but again, this isn't something that is likely to be responsible for the problem you are seeing.

��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux