Hi Frederic,

On Mon, Oct 11, 2021 at 04:51:32PM +0200, Frederic Weisbecker wrote:
> On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
> some work, such as callbacks acceleration and invocation, may be left
> unattended due to the volatile checks on the offloaded state.
> 
> In the worst case this work is postponed until the next rcu_pending()
> check that can take a jiffy to reach, which can be a problem in case
> of callbacks flooding.
> 
> Solve that with invoking rcu_core() early in the de-offloading process.
> This way any work dismissed by an ongoing rcu_core() call fooled by
> a preempting deoffloading process will be caught up by a nearby future
> recall to rcu_core(), this time fully aware of the de-offloading state.
> 
> Tested-by: Valentin Schneider <valentin.schneider@xxxxxxx>
> Tested-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> Cc: Valentin Schneider <valentin.schneider@xxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
> Cc: Neeraj Upadhyay <neeraju@xxxxxxxxxxxxxx>
> Cc: Uladzislau Rezki <urezki@xxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> ---
>  include/linux/rcu_segcblist.h | 14 ++++++++++++++
>  kernel/rcu/rcu_segcblist.c    |  6 ++----
>  kernel/rcu/tree.c             | 17 +++++++++++++++++
>  kernel/rcu/tree_nocb.h        |  9 +++++++++
>  4 files changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
> index 812961b1d064..659d13a7ddaa 100644
> --- a/include/linux/rcu_segcblist.h
> +++ b/include/linux/rcu_segcblist.h
> @@ -136,6 +136,20 @@ struct rcu_cblist {
>   *  |--------------------------------------------------------------------------|
>   *  |                           SEGCBLIST_RCU_CORE   |                          |
>   *  |                           SEGCBLIST_LOCKING    |                          |
> + *  |                           SEGCBLIST_OFFLOADED  |                          |
> + *  |                           SEGCBLIST_KTHREAD_CB |                          |
> + *  |                           SEGCBLIST_KTHREAD_GP                            |
> + *  |                                                                           |
> + *  |   CB/GP kthreads handle callbacks holding nocb_lock, local rcu_core()     |
> + *  |   handles callbacks concurrently. Bypass enqueue is enabled.              |
> + *  |   Invoke RCU core so we make sure not to preempt it in the middle with    |
> + *  |   leaving some urgent work unattended within a jiffy.                     |
> + *  ----------------------------------------------------------------------------
> + *                                         |
> + *                                         v
> + *  |--------------------------------------------------------------------------|
> + *  |                           SEGCBLIST_RCU_CORE   |                          |
> + *  |                           SEGCBLIST_LOCKING    |                          |
>   *  |                           SEGCBLIST_KTHREAD_CB |                          |
>   *  |                           SEGCBLIST_KTHREAD_GP                            |
>   *  |                                                                           |
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index c07aab6e39ef..81145c3ece25 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -265,12 +265,10 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
>   */
>  void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
>  {
> -	if (offload) {
> +	if (offload)
>  		rcu_segcblist_set_flags(rsclp, SEGCBLIST_LOCKING | SEGCBLIST_OFFLOADED);
> -	} else {
> -		rcu_segcblist_set_flags(rsclp, SEGCBLIST_RCU_CORE);
> +	else
>  		rcu_segcblist_clear_flags(rsclp, SEGCBLIST_OFFLOADED);
> -	}
>  }
>  
>  /*
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index e38028d48648..b236271b9022 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2717,6 +2717,23 @@ static __latent_entropy void rcu_core(void)
>  	unsigned long flags;
>  	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
>  	struct rcu_node *rnp = rdp->mynode;
> +	/*
> +	 * On RT rcu_core() can be preempted when IRQs aren't disabled.
> +	 * Therefore this function can race with concurrent NOCB (de-)offloading
> +	 * on this CPU and the below condition must be considered volatile.
> +	 * However if we race with:
> +	 *
> +	 * _ Offloading:   In the worst case we accelerate or process callbacks
> +	 *                 concurrently with NOCB kthreads. We are guaranteed to
> +	 *                 call rcu_nocb_lock() if that happens.

If offloading races with rcu_core(), can the following happen?

	<offload work>
	rcu_nocb_rdp_offload():
					rcu_core():
					  ...
					  rcu_nocb_lock_irqsave(); // not a lock
	  raw_spin_lock_irqsave(->nocb_lock);
	  rdp_offload_toggle():
	    <LOCKING | OFFLOADED set>
					  if (!rcu_segcblist_restempty(...))
					    rcu_accelerate_cbs_unlocked(...);
					  rcu_nocb_unlock_irqrestore();
					  // ^ a real unlock,
					  // and will preempt_enable()

	  // offload continues with ->nocb_lock not held

If this can happen, it means an unpaired preempt_enable() and an incorrect
unlock. Thoughts? Maybe I'm missing something here?
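To spell out the asymmetry I have in mind, here is a stripped-down model.
This is NOT the actual tree.h/tree_nocb.h code: nocb_uses_lock() and
model_rcu_core_window() are my stand-ins, where nocb_uses_lock() represents
whatever offloaded/LOCKING check the real rcu_nocb_lock_irqsave() /
rcu_nocb_unlock_irqrestore() pair performs. The only property I rely on is
that the check is re-evaluated at each call:

	/* Stand-in helper, not a real kernel symbol. */
	static bool nocb_uses_lock(struct rcu_data *rdp);

	static void model_rcu_core_window(struct rcu_data *rdp)
	{
		unsigned long flags;

		/* "Lock" side: state sampled before rdp_offload_toggle() has run. */
		if (nocb_uses_lock(rdp)) {
			raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
		} else {
			/* The branch taken in the diagram: ->nocb_lock is never acquired. */
			local_irq_save(flags);
		}

		/*
		 * Window: per the diagram, the offloading side takes ->nocb_lock here
		 * and rdp_offload_toggle() sets SEGCBLIST_LOCKING | SEGCBLIST_OFFLOADED.
		 */

		/* "Unlock" side: state re-sampled, and it now says the lock is in use. */
		if (nocb_uses_lock(rdp)) {
			/*
			 * Releases a lock this task never acquired, and issues the
			 * preempt_enable() that was never paired on this path.
			 */
			raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
		} else {
			local_irq_restore(flags);
		}
	}

If that model is right, the lock/unlock pair stops matching as soon as the
toggle lands between the two checks, which is where the unpaired
preempt_enable() and the bogus unlock would come from.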
Regards,
Boqun

> +	 *
> +	 * _ Deoffloading: In the worst case we miss callbacks acceleration or
> +	 *                 processing. This is fine because the early stage
> +	 *                 of deoffloading invokes rcu_core() after setting
> +	 *                 SEGCBLIST_RCU_CORE. So we guarantee that we'll process
> +	 *                 what could have been dismissed without the need to wait
> +	 *                 for the next rcu_pending() check in the next jiffy.
> +	 */
>  	const bool do_batch = !rcu_segcblist_completely_offloaded(&rdp->cblist);
>  
>  	if (cpu_is_offline(smp_processor_id()))
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 71a28f50b40f..3b470113ae38 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -990,6 +990,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
>  	 * will refuse to put anything into the bypass.
>  	 */
>  	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> +	/*
> +	 * Start with invoking rcu_core() early. This way if the current thread
> +	 * happens to preempt an ongoing call to rcu_core() in the middle,
> +	 * leaving some work dismissed because rcu_core() still thinks the rdp is
> +	 * completely offloaded, we are guaranteed a nearby future instance of
> +	 * rcu_core() to catch up.
> +	 */
> +	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
> +	invoke_rcu_core();
>  	ret = rdp_offload_toggle(rdp, false, flags);
>  	swait_event_exclusive(rdp->nocb_state_wq,
>  			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
> -- 
> 2.25.1
> 
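P.S. For my own notes, the deoffload-side guarantee the new tree.c comment
relies on seems to boil down to the ordering of the two added lines. This is
just the quoted tree_nocb.h hunk restated with my reading of why the order
matters, not new code:

	/*
	 * 1) From now on this rdp no longer looks "completely offloaded", so
	 *    any rcu_core() instance that starts after this point computes
	 *    do_batch == true and handles the callbacks itself.
	 */
	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);

	/*
	 * 2) Only then kick rcu_core(), so the instance raised here is
	 *    guaranteed to see the flag and catch up whatever a preempted,
	 *    pre-flag rcu_core() may have skipped, instead of waiting up to a
	 *    jiffy for the next rcu_pending() check.
	 */
	invoke_rcu_core();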