Re: [PATCH] SUNRPC: Fix a race to wake a sync task

Benjamin Coddington <bcodding@xxxxxxxxxx> · Tue, 16 Jul 2024 12:06:51 -0400

On 16 Jul 2024, at 10:37, Trond Myklebust wrote:

> On Tue, 2024-07-16 at 09:44 -0400, Benjamin Coddington wrote:
>> On 16 Jul 2024, at 9:23, Trond Myklebust wrote:
>>> Hmm... Why isn't it sufficient to use smp_mb__after_atomic() here?
>>> That's what clear_and_wake_up_bit() uses in this case.
>>
>> Great question, one that I can't answer with confidence (which is why
>> I
>> solicited advice under the RFC posting).
>>
>> I ended up following the guidance in the comment above wake_up_bit(),
>> since
>> we clear RPC_TASK_RUNNING non-atomically within the queue's
>> spinlock.  It
>> states that we'd probably need smp_mb().
>>
>> However - this race isn't to tk_runstate, its actually to
>> waitqueue_active()
>> being true/false.  So - I believe unless we're checking the bit in
>> the
>> waiter's wait_bit_action function, we really only want to look at the
>> race
>> in the terms of making sure the waiter's setting of the
>> wait_queue_head is
>> visible after the waker tests_and_sets RPC_TASK_RUNNING.
>>
>
> My point is that if we were to call
>
> 	clear_and_wake_up_bit(RPC_TASK_QUEUED, &task->tk_runstate);
>
> then that should be sufficient to resolve the race with
>
> 	status = out_of_line_wait_on_bit(&task->tk_runstate,
>                                 RPC_TASK_QUEUED, rpc_wait_bit_killable,
>                                 TASK_KILLABLE|TASK_FREEZABLE);
>
> So really, all we should need to do is to add in the
> smp_mb__after_atomic(). This is consistent with the guidance in
> Documentation/atomic_t.txt and Documentation/atomic_bitops.txt.

Point taken, but isn't clear_bit_unlock() (via clear_and_wake_up_bit())
providing a release barrier that clear_bit() (via rpc_clear_running(),
spin_unlock()) is not providing?

I don't think what we're doing is the same, because we conditionally gate
the waker on test_and_set_bit(RPC_TASK_RUNNING) (which provides the atomic
barrier), but order doesn't matter yet since we can still be racing on the
other CPU to set up the wait_bit_queue -- because /that/ is gated on
RPC_TASK_QUEUED.  So I think we need to have a barrier that pairs with
set_current_state().

If we /were/ to use clear_and_wake_up_bit(RPC_TASK_QUEUED,
&task->tk_runstate) in rpc_make_runnable, I think that would also fix the
problem.

I can run the same 50-hour test with smb_mb__after_atomic() here on the
same ppc64le setup where the problem manifests and provide results here.  It
will take a couple of days.

Ben