On Wed, Sep 22, 2021 at 01:36:27AM +0200, Frederic Weisbecker wrote: > On Tue, Sep 21, 2021 at 11:12:50PM +0200, Thomas Gleixner wrote: > > Valentin reported warnings about suspicious RCU usage on RT kernels. Those > > happen when offloading of RCU callbacks is enabled: > > > > WARNING: suspicious RCU usage > > 5.13.0-rt1 #20 Not tainted > > ----------------------------- > > kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state! > > > > rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58) > > rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777) > > rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876) > > > > The reason is that rcu_rdp_is_offloaded() is invoked without one of the > > required protections on RT enabled kernels because local_bh_disable() does > > not disable preemption on RT. > > > > Valentin proposed to add a local lock to the code in question, but that's > > suboptimal in several aspects: > > > > 1) local locks add extra code to !RT kernels for no value. > > > > 2) All possible callsites have to audited and amended when affected > > possible at an outer function level due to lock nesting issues. > > > > 3) As the local lock has to be taken at the outer functions it's required > > to release and reacquire them in the inner code sections which might > > voluntary schedule, e.g. rcu_do_batch(). > > > > Both callsites of rcu_rdp_is_offloaded() which trigger this check invoke > > rcu_rdp_is_offloaded() in the variable declaration section right at the top > > of the functions. But the actual usage of the result is either within a > > section which provides the required protections or after such a section. > > > > So the obvious solution is to move the invocation into the code sections > > which provide the proper protections, which solves the problem for RT and > > does not have any impact on !RT kernels. > > You also need to consider rcu_segcblist_completely_offloaded(). There > are two users: > > 1) The first chunk using it in rcu_core() checks if there is a need to > accelerate the callback and that can happen concurrently with nocb > manipulations on the cblist. Concurrent (de-)offloading could mess > up with the locking state but here is what we can do: > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index bce848e50512..3e56a1a4af03 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -2728,9 +2728,10 @@ static __latent_entropy void rcu_core(void) > > /* No grace period and unregistered callbacks? */ > if (!rcu_gp_in_progress() && > - rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) { > + rcu_segcblist_is_enabled(&rdp->cblist)) { > rcu_nocb_lock_irqsave(rdp, flags); > - if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) > + if (!rcu_segcblist_completely_offloaded(&rdp->cblist) && > + !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) > rcu_accelerate_cbs_unlocked(rnp, rdp); > rcu_nocb_unlock_irqrestore(rdp, flags); > } > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h > index 305cf6aeb408..64d615be3346 100644 > --- a/kernel/rcu/tree.h > +++ b/kernel/rcu/tree.h > @@ -449,10 +449,9 @@ static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp); > static void __init rcu_organize_nocb_kthreads(void); > #define rcu_nocb_lock_irqsave(rdp, flags) \ > do { \ > + local_irq_save(flags); \ > if (!rcu_segcblist_is_offloaded(&(rdp)->cblist)) \ > - local_irq_save(flags); \ > - else \ > - raw_spin_lock_irqsave(&(rdp)->nocb_lock, (flags)); \ > + raw_spin_lock(&(rdp)->nocb_lock); \ > } while (0) > #else /* #ifdef CONFIG_RCU_NOCB_CPU */ > #define rcu_nocb_lock_irqsave(rdp, flags) local_irq_save(flags) > > > Doing the local_irq_save() before checking that the segcblist is offloaded > protect that state from being changed (provided we lock the local rdp). Then we > can safely manipulate cblist, whether locked or unlocked. > > 2) The actual call to rcu_do_batch(). If we are preempted between > rcu_segcblist_completely_offloaded() and rcu_do_batch() with a deoffload in > the middle, we miss the callback invocation. Invoking rcu_core by the end of > deoffloading process should solve that. Maybe invoke rcu_core() at that point? My concern is that there might be an extended time between the missed rcu_do_batch() and the end of the deoffloading process. > > Reported-by: Valentin Schneider <valentin.schneider@xxxxxxx> > > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > > --- > > kernel/rcu/tree.c | 7 ++++--- > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -2278,13 +2278,13 @@ rcu_report_qs_rdp(struct rcu_data *rdp) > > { > > unsigned long flags; > > unsigned long mask; > > - bool needwake = false; > > - const bool offloaded = rcu_rdp_is_offloaded(rdp); > > + bool offloaded, needwake = false; > > struct rcu_node *rnp; > > > > WARN_ON_ONCE(rdp->cpu != smp_processor_id()); > > rnp = rdp->mynode; > > raw_spin_lock_irqsave_rcu_node(rnp, flags); > > + offloaded = rcu_rdp_is_offloaded(rdp); > > if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq || > > rdp->gpwrap) { > > BTW Paul, if we happen to switch to non-NOCB (deoffload) some time after > rcu_report_qs_rdp(), it's possible that the rcu_accelerate_cbs() > that was supposed to be handled by nocb kthreads on behalf of > rcu_core() -> rcu_report_qs_rdp() would not happen. At least not until > we invoke rcu_core() again. Not sure how much harm that could cause. Again, can we just invoke rcu_core() as soon as this is noticed? > > @@ -2446,7 +2446,7 @@ static void rcu_do_batch(struct rcu_data > > int div; > > bool __maybe_unused empty; > > unsigned long flags; > > - const bool offloaded = rcu_rdp_is_offloaded(rdp); > > + bool offloaded; > > struct rcu_head *rhp; > > struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl); > > long bl, count = 0; > > @@ -2472,6 +2472,7 @@ static void rcu_do_batch(struct rcu_data > > rcu_nocb_lock(rdp); > > WARN_ON_ONCE(cpu_is_offline(smp_processor_id())); > > pending = rcu_segcblist_n_cbs(&rdp->cblist); > > + offloaded = rcu_rdp_is_offloaded(rdp); > > offloaded is also checked later in rcu_do_batch(), after irqrestore. In > fact that should even become a rcu_segcblist_completely_offloaded() check > there because if there are still pending callbacks while we are de-offloading, > rcu_core() should be invoked. Hmm but that might be a remote rcu_core... > > Anyway I guess we could live with some of those races with invoking rcu core on the > target after deoffloading. > > I guess I should cook a series to handle all these issues one by one, then > probably we can live without a local_lock(). And thank you very much for looking this over! Not simple stuff. ;-) Thanx, Paul > Thanks. > > > > > div = READ_ONCE(rcu_divisor); > > div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div; > > bl = max(rdp->blimit, pending >> div);