On Thu, Sep 30, 2021 at 10:00:39AM +0100, Valentin Schneider wrote:
> Hi,
>
> On 21/09/21 23:12, Thomas Gleixner wrote:
> > Valentin reported warnings about suspicious RCU usage on RT kernels. Those
> > happen when offloading of RCU callbacks is enabled:
> >
> > WARNING: suspicious RCU usage
> > 5.13.0-rt1 #20 Not tainted
> > -----------------------------
> > kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state!
> >
> > rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58)
> > rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777)
> > rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876)
> >
> > The reason is that rcu_rdp_is_offloaded() is invoked without one of the
> > required protections on RT enabled kernels because local_bh_disable() does
> > not disable preemption on RT.
> >
> > Valentin proposed to add a local lock to the code in question, but that's
> > suboptimal in several aspects:
> >
> > 1) local locks add extra code to !RT kernels for no value.
> >
> > 2) All possible callsites have to be audited and amended when affected,
> > possibly at an outer function level due to lock nesting issues.
> >
> > 3) As the local lock has to be taken at the outer functions it's required
> > to release and reacquire them in the inner code sections which might
> > voluntarily schedule, e.g. rcu_do_batch().
> >
> > Both callsites of rcu_rdp_is_offloaded() which trigger this check invoke
> > rcu_rdp_is_offloaded() in the variable declaration section right at the top
> > of the functions. But the actual usage of the result is either within a
> > section which provides the required protections or after such a section.
> >
> > So the obvious solution is to move the invocation into the code sections
> > which provide the proper protections, which solves the problem for RT and
> > does not have any impact on !RT kernels.
> >
>
> Thanks for taking a look at this!
>
> My reasoning for adding protection in the outer functions was to prevent
> unpaired unlocks of rcu_nocb_{un}lock_irqsave(), as that too depends on the
> offload state. Cf. Frederic's writeup:
>
> http://lore.kernel.org/r/20210727230814.GC283787@lothringen

I was wrong about that BTW! rcu_nocb_lock() always requires IRQs to be
disabled, which of course disables preemption, so the offloaded state can't
change between rcu_nocb_lock[_irqsave]() and rcu_nocb_unlock[_irqrestore]().

But anyway, there were many other issues to fix :-)

>
> Anywho, I see Frederic has sent a fancy new series, let me go stare at it.