On 14/10/21 00:07, Boqun Feng wrote:
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index e38028d48648..b236271b9022 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -2717,6 +2717,23 @@ static __latent_entropy void rcu_core(void)
>>  	unsigned long flags;
>>  	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
>>  	struct rcu_node *rnp = rdp->mynode;
>> +	/*
>> +	 * On RT rcu_core() can be preempted when IRQs aren't disabled.
>> +	 * Therefore this function can race with concurrent NOCB (de-)offloading
>> +	 * on this CPU and the below condition must be considered volatile.
>> +	 * However if we race with:
>> +	 *
>> +	 * _ Offloading:   In the worst case we accelerate or process callbacks
>> +	 *                 concurrently with NOCB kthreads. We are guaranteed to
>> +	 *                 call rcu_nocb_lock() if that happens.
>
> If offloading races with rcu_core(), can the following happen?
>
> 	<offload work>
> 	rcu_nocb_rdp_offload():
> 					rcu_core():
> 					  ...
> 					  rcu_nocb_lock_irqsave(); // not a lock
> 	  raw_spin_lock_irqsave(->nocb_lock);
> 	  rdp_offload_toggle():
> 	    <LOCKING | OFFLOADED set>
> 					  if (!rcu_segcblist_restempty(...))
> 					    rcu_accelerate_cbs_unlocked(...);
> 					  rcu_nocb_unlock_irqrestore();
> 					  // ^ a real unlock,
> 					  // and will preempt_enable()
> 	  // offload continues with ->nocb_lock not held
>
> If this can happen, it means an unpaired preempt_enable() and an
> incorrect unlock. Thoughts? Maybe I'm missing something here?
>

As Frederic pointed out in an earlier thread [1], this can't happen
because we still have IRQs disabled, and the offloading process has to
run on the CPU being offloaded. IOW, in the above scenario, rcu_core()
can't be preempted by the rcu_nocb_rdp_offload() work until it
re-enables IRQs at rcu_nocb_unlock_irqrestore().

(hopefully Paul or Frederic will correct me if I've just spewed garbage)

[1]: https://lore.kernel.org/lkml/20210930105340.GA232241@lothringen/

> Regards,
> Boqun
>