On Tue, 2024-07-16 at 09:44 -0400, Benjamin Coddington wrote: > On 16 Jul 2024, at 9:23, Trond Myklebust wrote: > > > On Fri, 2024-07-12 at 09:55 -0400, Benjamin Coddington wrote: > > > We've observed NFS clients with sync tasks sleeping in > > > __rpc_execute > > > waiting on RPC_TASK_QUEUED that have not responded to a wake-up > > > from > > > rpc_make_runnable(). I suspect this problem usually goes > > > unnoticed, > > > because on a busy client the task will eventually be re-awoken by > > > another > > > task completion or xprt event. However, if the state manager is > > > draining > > > the slot table, a sync task missing a wake-up can result in a > > > hung > > > client. > > > > > > We've been able to prove that the waker in rpc_make_runnable() > > > successfully > > > calls wake_up_bit() (ie- there's no race to tk_runstate), but the > > > wake_up_bit() call fails to wake the waiter. I suspect the waker > > > is > > > missing the load of the bit's wait_queue_head, so > > > waitqueue_active() > > > is > > > false. There are some very helpful comments about this problem > > > above > > > wake_up_bit(), prepare_to_wait(), and waitqueue_active(). > > > > > > Fix this by inserting smp_mb() before the wake_up_bit(), which > > > pairs > > > with > > > prepare_to_wait() calling set_current_state(). > > > > > > Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx> > > > --- > > > net/sunrpc/sched.c | 5 ++++- > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c > > > index 6debf4fd42d4..34b31be75497 100644 > > > --- a/net/sunrpc/sched.c > > > +++ b/net/sunrpc/sched.c > > > @@ -369,8 +369,11 @@ static void rpc_make_runnable(struct > > > workqueue_struct *wq, > > > if (RPC_IS_ASYNC(task)) { > > > INIT_WORK(&task->u.tk_work, rpc_async_schedule); > > > queue_work(wq, &task->u.tk_work); > > > - } else > > > + } else { > > > + /* paired with set_current_state() in > > > prepare_to_wait */ > > > + smp_mb(); > > > > Hmm... Why isn't it sufficient to use smp_mb__after_atomic() here? > > That's what clear_and_wake_up_bit() uses in this case. > > Great question, one that I can't answer with confidence (which is why > I > solicited advice under the RFC posting). > > I ended up following the guidance in the comment above wake_up_bit(), > since > we clear RPC_TASK_RUNNING non-atomically within the queue's > spinlock. It > states that we'd probably need smp_mb(). > > However - this race isn't to tk_runstate, its actually to > waitqueue_active() > being true/false. So - I believe unless we're checking the bit in > the > waiter's wait_bit_action function, we really only want to look at the > race > in the terms of making sure the waiter's setting of the > wait_queue_head is > visible after the waker tests_and_sets RPC_TASK_RUNNING. > My point is that if we were to call clear_and_wake_up_bit(RPC_TASK_QUEUED, &task->tk_runstate); then that should be sufficient to resolve the race with status = out_of_line_wait_on_bit(&task->tk_runstate, RPC_TASK_QUEUED, rpc_wait_bit_killable, TASK_KILLABLE|TASK_FREEZABLE); So really, all we should need to do is to add in the smp_mb__after_atomic(). This is consistent with the guidance in Documentation/atomic_t.txt and Documentation/atomic_bitops.txt. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx