Re: [PATCH] SUNRPC: Fix a race to wake a sync task

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Tue, 16 Jul 2024 14:37:13 +0000

On Tue, 2024-07-16 at 09:44 -0400, Benjamin Coddington wrote:
> On 16 Jul 2024, at 9:23, Trond Myklebust wrote:
> 
> > On Fri, 2024-07-12 at 09:55 -0400, Benjamin Coddington wrote:
> > > We've observed NFS clients with sync tasks sleeping in
> > > __rpc_execute
> > > waiting on RPC_TASK_QUEUED that have not responded to a wake-up
> > > from
> > > rpc_make_runnable().  I suspect this problem usually goes
> > > unnoticed,
> > > because on a busy client the task will eventually be re-awoken by
> > > another
> > > task completion or xprt event.  However, if the state manager is
> > > draining
> > > the slot table, a sync task missing a wake-up can result in a
> > > hung
> > > client.
> > > 
> > > We've been able to prove that the waker in rpc_make_runnable()
> > > successfully
> > > calls wake_up_bit() (ie- there's no race to tk_runstate), but the
> > > wake_up_bit() call fails to wake the waiter.  I suspect the waker
> > > is
> > > missing the load of the bit's wait_queue_head, so
> > > waitqueue_active()
> > > is
> > > false.  There are some very helpful comments about this problem
> > > above
> > > wake_up_bit(), prepare_to_wait(), and waitqueue_active().
> > > 
> > > Fix this by inserting smp_mb() before the wake_up_bit(), which
> > > pairs
> > > with
> > > prepare_to_wait() calling set_current_state().
> > > 
> > > Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
> > > ---
> > >  net/sunrpc/sched.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > index 6debf4fd42d4..34b31be75497 100644
> > > --- a/net/sunrpc/sched.c
> > > +++ b/net/sunrpc/sched.c
> > > @@ -369,8 +369,11 @@ static void rpc_make_runnable(struct
> > > workqueue_struct *wq,
> > >  	if (RPC_IS_ASYNC(task)) {
> > >  		INIT_WORK(&task->u.tk_work, rpc_async_schedule);
> > >  		queue_work(wq, &task->u.tk_work);
> > > -	} else
> > > +	} else {
> > > +		/* paired with set_current_state() in
> > > prepare_to_wait */
> > > +		smp_mb();
> > 
> > Hmm... Why isn't it sufficient to use smp_mb__after_atomic() here?
> > That's what clear_and_wake_up_bit() uses in this case.
> 
> Great question, one that I can't answer with confidence (which is why
> I
> solicited advice under the RFC posting).
> 
> I ended up following the guidance in the comment above wake_up_bit(),
> since
> we clear RPC_TASK_RUNNING non-atomically within the queue's
> spinlock.  It
> states that we'd probably need smp_mb().
> 
> However - this race isn't to tk_runstate, its actually to
> waitqueue_active()
> being true/false.  So - I believe unless we're checking the bit in
> the
> waiter's wait_bit_action function, we really only want to look at the
> race
> in the terms of making sure the waiter's setting of the
> wait_queue_head is
> visible after the waker tests_and_sets RPC_TASK_RUNNING.
> 

My point is that if we were to call

	clear_and_wake_up_bit(RPC_TASK_QUEUED, &task->tk_runstate);

then that should be sufficient to resolve the race with

	status = out_of_line_wait_on_bit(&task->tk_runstate,
                                RPC_TASK_QUEUED, rpc_wait_bit_killable,
                                TASK_KILLABLE|TASK_FREEZABLE);

So really, all we should need to do is to add in the
smp_mb__after_atomic(). This is consistent with the guidance in
Documentation/atomic_t.txt and Documentation/atomic_bitops.txt.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx