On 16 Jul 2024, at 10:37, Trond Myklebust wrote: > On Tue, 2024-07-16 at 09:44 -0400, Benjamin Coddington wrote: >> On 16 Jul 2024, at 9:23, Trond Myklebust wrote: >>> Hmm... Why isn't it sufficient to use smp_mb__after_atomic() here? >>> That's what clear_and_wake_up_bit() uses in this case. >> >> Great question, one that I can't answer with confidence (which is why >> I >> solicited advice under the RFC posting). >> >> I ended up following the guidance in the comment above wake_up_bit(), >> since >> we clear RPC_TASK_RUNNING non-atomically within the queue's >> spinlock. It >> states that we'd probably need smp_mb(). >> >> However - this race isn't to tk_runstate, its actually to >> waitqueue_active() >> being true/false. So - I believe unless we're checking the bit in >> the >> waiter's wait_bit_action function, we really only want to look at the >> race >> in the terms of making sure the waiter's setting of the >> wait_queue_head is >> visible after the waker tests_and_sets RPC_TASK_RUNNING. >> > > My point is that if we were to call > > clear_and_wake_up_bit(RPC_TASK_QUEUED, &task->tk_runstate); > > then that should be sufficient to resolve the race with > > status = out_of_line_wait_on_bit(&task->tk_runstate, > RPC_TASK_QUEUED, rpc_wait_bit_killable, > TASK_KILLABLE|TASK_FREEZABLE); > > So really, all we should need to do is to add in the > smp_mb__after_atomic(). This is consistent with the guidance in > Documentation/atomic_t.txt and Documentation/atomic_bitops.txt. Point taken, but isn't clear_bit_unlock() (via clear_and_wake_up_bit()) providing a release barrier that clear_bit() (via rpc_clear_running(), spin_unlock()) is not providing? I don't think what we're doing is the same, because we conditionally gate the waker on test_and_set_bit(RPC_TASK_RUNNING) (which provides the atomic barrier), but order doesn't matter yet since we can still be racing on the other CPU to set up the wait_bit_queue -- because /that/ is gated on RPC_TASK_QUEUED. So I think we need to have a barrier that pairs with set_current_state(). If we /were/ to use clear_and_wake_up_bit(RPC_TASK_QUEUED, &task->tk_runstate) in rpc_make_runnable, I think that would also fix the problem. I can run the same 50-hour test with smb_mb__after_atomic() here on the same ppc64le setup where the problem manifests and provide results here. It will take a couple of days. Ben