On 11/22/20 9:19 PM, Valentin Schneider wrote:
> Booting a recent PREEMPT_RT kernel (v5.10-rc3-rt7-rebase) on my arm64 Juno
> leads to the idle task blocking on an RT sleeping spinlock down some
> notifier path:
>
> [ 1.809101] BUG: scheduling while atomic: swapper/5/0/0x00000002
> [ 1.809116] Modules linked in:
> [ 1.809123] Preemption disabled at:
> [ 1.809125] secondary_start_kernel (arch/arm64/kernel/smp.c:227)
> [ 1.809146] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.10.0-rc3-rt7 #168
> [ 1.809153] Hardware name: ARM Juno development board (r0) (DT)
> [ 1.809158] Call trace:
> [ 1.809160] dump_backtrace (arch/arm64/kernel/stacktrace.c:100 (discriminator 1))
> [ 1.809170] show_stack (arch/arm64/kernel/stacktrace.c:198)
> [ 1.809178] dump_stack (lib/dump_stack.c:122)
> [ 1.809188] __schedule_bug (kernel/sched/core.c:4886)
> [ 1.809197] __schedule (./arch/arm64/include/asm/preempt.h:18 kernel/sched/core.c:4913 kernel/sched/core.c:5040)
> [ 1.809204] preempt_schedule_lock (kernel/sched/core.c:5365 (discriminator 1))
> [ 1.809210] rt_spin_lock_slowlock_locked (kernel/locking/rtmutex.c:1072)
> [ 1.809217] rt_spin_lock_slowlock (kernel/locking/rtmutex.c:1110)
> [ 1.809224] rt_spin_lock (./include/linux/rcupdate.h:647 kernel/locking/rtmutex.c:1139)
> [ 1.809231] atomic_notifier_call_chain_robust (kernel/notifier.c:71 kernel/notifier.c:118 kernel/notifier.c:186)
> [ 1.809240] cpu_pm_enter (kernel/cpu_pm.c:39 kernel/cpu_pm.c:93)
> [ 1.809249] psci_enter_idle_state (drivers/cpuidle/cpuidle-psci.c:52 drivers/cpuidle/cpuidle-psci.c:129)
> [ 1.809258] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:238)
> [ 1.809267] cpuidle_enter (drivers/cpuidle/cpuidle.c:353)
> [ 1.809275] do_idle (kernel/sched/idle.c:132 kernel/sched/idle.c:213 kernel/sched/idle.c:273)
> [ 1.809282] cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1))
> [ 1.809288] secondary_start_kernel (arch/arm64/kernel/smp.c:273)
>
> Two points worth noting:
>
> 1) That this is conceptually the same issue as pointed out in:
>    313c8c16ee62 ("PM / CPU: replace raw_notifier with atomic_notifier")
> 2) Only the _robust() variant of atomic_notifier callchains suffer from
>    this
>
> AFAICT only the cpu_pm_notifier_chain really needs to be changed, but
> singling it out would mean introducing a new (truly) non-blocking API. At
> the same time, callers that are fine with any blocking within the call
> chain should use blocking notifiers, so patching up all atomic_notifier's
> doesn't seem *too* crazy to me.
>
> Fixes: 70d932985757 ("notifier: Fix broken error handling pattern")
> Signed-off-by: Valentin Schneider <valentin.schneider@xxxxxxx>

Reviewed-by: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>

Thanks!
-- Daniel