On Thu, Mar 07, 2024 at 09:45:36AM -0800, Paul E. McKenney wrote:
> On Thu, Mar 07, 2024 at 05:09:51PM +0100, Stefan Wiehler wrote:
> > With CONFIG_PROVE_RCU_LIST=y and by executing
> >
> >   $ echo 0 > /sys/devices/system/cpu/cpu1/online
> >
> > one can trigger the following Lockdep-RCU splat on ARM:
> >
> >   =============================
> >   WARNING: suspicious RCU usage
> >   6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted
> >   -----------------------------
> >   kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!!
> >
> >   other info that might help us debug this:
> >
> >   RCU used illegally from offline CPU!
> >   rcu_scheduler_active = 2, debug_locks = 1
> >   no locks held by swapper/1/0.
> >
> >   stack backtrace:
> >   CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10
> >   Hardware name: Allwinner sun8i Family
> >    unwind_backtrace from show_stack+0x10/0x14
> >    show_stack from dump_stack_lvl+0x60/0x90
> >    dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0
> >    lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8
> >    __lock_acquire from lock_acquire+0x10c/0x348
> >    lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
> >    _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8
> >    check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c
> >    arch_cpu_idle_dead from do_idle+0xbc/0x138
> >    do_idle from cpu_startup_entry+0x28/0x2c
> >    cpu_startup_entry from secondary_start_kernel+0x11c/0x124
> >    secondary_start_kernel from 0x401018a0
> >
> > The CPU is already reported as offline from RCU perspective in
> > cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. Above
> > RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring the
> > ASID spinlock.
> >
> > Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as
> > online again while the spinlock is held.
> >
> > Signed-off-by: Stefan Wiehler <stefan.wiehler@xxxxxxxxx>
>
> From an RCU perspective, this looks plausible.  One question
> below.

But one additional caution...  If execution is delayed during that call
to idle_task_exit(), RCU will stall and won't have a reasonable way of
motivating this CPU.  Such delays could be due to vCPU preemption or
due to firmware grabbing the CPU.

But this is only a caution, not opposition.  After all, you could have
the same problem with an online CPU that gets similarly delayed while
its interrupts are disabled.

							Thanx, Paul

> > ---
> >  arch/arm/kernel/smp.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> > index 3431c0553f45..6875e2c5dd50 100644
> > --- a/arch/arm/kernel/smp.c
> > +++ b/arch/arm/kernel/smp.c
> > @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void)
> >  {
> >  	unsigned int cpu = smp_processor_id();
> >
> > +	/*
> > +	 * Briefly report CPU as online again to avoid false positive
> > +	 * Lockdep-RCU splat when check_and_switch_context() acquires ASID
> > +	 * spinlock.
> > +	 */
> > +	rcutree_report_cpu_starting(cpu);
> >  	idle_task_exit();
> > +	rcutree_report_cpu_dead();
> >
> >  	local_irq_disable();
>
> Both rcutree_report_cpu_starting() and rcutree_report_cpu_dead() complain
> bitterly via lockdep if interrupts are enabled.  And the call sites have
> interrupts disabled.  So I don't understand what this local_irq_disable()
> is needed for.
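
For readers following the thread, here is a rough userspace model of the
ordering under discussion.  It is only a sketch and not kernel code: every
model_* name is hypothetical, the "splat" is just a counter, and the only
things taken from the thread are that the two report calls want interrupts
disabled and that taking a lock on a CPU RCU considers offline triggers
the complaint.

```c
/* Toy userspace model, NOT kernel code.  All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct model_cpu {
	bool irqs_disabled;	/* stands in for the CPU's interrupt state */
	bool rcu_watching;	/* true while RCU considers the CPU online */
	int splats;		/* number of simulated lockdep complaints */
};

/* Mimics the check in the report calls: they want interrupts disabled. */
static void model_check_irqs_disabled(struct model_cpu *c, const char *who)
{
	if (!c->irqs_disabled) {
		c->splats++;
		printf("splat: %s called with interrupts enabled\n", who);
	}
}

/* Stand-in for reporting the CPU as online to RCU. */
static void model_report_cpu_starting(struct model_cpu *c)
{
	model_check_irqs_disabled(c, "model_report_cpu_starting");
	c->rcu_watching = true;
}

/* Stand-in for reporting the CPU as offline to RCU. */
static void model_report_cpu_dead(struct model_cpu *c)
{
	model_check_irqs_disabled(c, "model_report_cpu_dead");
	c->rcu_watching = false;
}

/* Stand-in for a lock acquisition that lockdep checks against RCU state. */
static void model_lock_acquire(struct model_cpu *c)
{
	if (!c->rcu_watching) {
		c->splats++;
		printf("splat: RCU used from offline CPU\n");
	}
}

/*
 * Model of the dying-CPU path.  The CPU arrives with interrupts disabled
 * and already reported offline.  Returns the number of simulated splats.
 */
static int model_idle_dead(bool bracket_with_reports)
{
	struct model_cpu c = {
		.irqs_disabled = true,
		.rcu_watching = false,
		.splats = 0,
	};

	if (bracket_with_reports)
		model_report_cpu_starting(&c);
	model_lock_acquire(&c);	/* models the lock taken under idle_task_exit() */
	if (bracket_with_reports)
		model_report_cpu_dead(&c);

	return c.splats;
}
```

In this model, model_idle_dead(false) counts one complaint and
model_idle_dead(true) counts none.  It says nothing about the stall
caution above, which depends on how long the bracketed region runs.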