On Wed, 15 Jun 2022 at 14:44, Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote: > > On Wed, 15 Jun 2022 at 13:36, Adrien Thierry <athierry@xxxxxxxxxx> wrote: > > > > We've been encountering a BUG: scheduling while atomic while running the > > 5.18.0-rt11 kernel on a Qualcomm SoC (see stacktrace below). > > > > It seems to occur because a spinlock is taken in the PSCI idle code path > > in the idle loop. With the RT patchset applied and CONFIG_PREEMPT_RT > > enabled, spinlocks can sleep, thus triggering the bug. > > > > In order to prevent this, replace the generic_pm_domain spinlock by a > > raw spinlock. > > > > [ 2.994433] BUG: scheduling while atomic: swapper/6/0/0x00000002 > > [ 2.994439] Modules linked in: > > [ 2.994447] [<ffff80000810b0ec>] migrate_enable+0x3c/0x160 > > [ 2.994461] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 5.18.0-rt11+ #1 > > [ 2.994464] Hardware name: Qualcomm SA8295P ADP (DT) > > [ 2.994466] Call trace: > > [ 2.994467] dump_backtrace+0xb0/0x120 > > [ 2.994473] show_stack+0x1c/0x6c > > [ 2.994477] dump_stack_lvl+0x64/0x7c > > [ 2.994483] dump_stack+0x14/0x2c > > [ 2.994487] __schedule_bug+0xa8/0xc0 > > [ 2.994489] schedule_debug.constprop.0+0x154/0x170 > > [ 2.994492] __schedule+0x58/0x520 > > [ 2.994496] schedule_rtlock+0x20/0x44 > > [ 2.994499] rtlock_slowlock_locked+0x110/0x260 > > [ 2.994503] rt_spin_lock+0x74/0x94 > > [ 2.994505] genpd_lock_nested_spin+0x20/0x30 > > [ 2.994509] genpd_power_off.part.0.isra.0+0x248/0x2cc > > [ 2.994512] genpd_runtime_suspend+0x1a0/0x300 > > [ 2.994515] __rpm_callback+0x4c/0x16c > > [ 2.994518] rpm_callback+0x6c/0x80 > > [ 2.994520] rpm_suspend+0x10c/0x63c > > [ 2.994523] __pm_runtime_suspend+0x54/0xa4 > > [ 2.994526] __psci_enter_domain_idle_state.constprop.0+0x64/0x10c > > [ 2.994532] psci_enter_domain_idle_state+0x1c/0x24 > > [ 2.994534] cpuidle_enter_state+0x8c/0x3f0 > > [ 2.994539] cpuidle_enter+0x3c/0x50 > > [ 2.994543] cpuidle_idle_call+0x158/0x1d4 > > [ 2.994548] do_idle+0xa8/0xfc > > [ 2.994551] cpu_startup_entry+0x28/0x30 > > [ 2.994556] secondary_start_kernel+0xe4/0x140 > > [ 2.994563] __secondary_switched+0x54/0x58 > > > > Signed-off-by: Adrien Thierry <athierry@xxxxxxxxxx> > > --- > > > > This patch fixes the bug but I'm not sure if this is the proper way to do > > so. Suggestions for other ways to fix this are very welcome. > > Honestly, I am not so sure about this either. > > Turning the lock into spinlock_t into a raw_spinlock_t, may have the > effect of spreading into constraints on the genpd providers. Thus > those may need to be converted to use raw_spinlock_t too (assuming > they use a spinlock_t and GENPD_FLAG_IRQ_SAFE). On the other hand, > there are a limited number of genpd providers that this can become a > problem for, if any, so maybe it would not be a big problem after all. Ohh, one more thing. For cpuidle-psci in the idle path, we call pm_runtime_get|put_sync(). Those calls end up using another spinlock_t (dev->power.lock). That seems like a similar problem, but may be harder to solve. [...] Kind regards Uffe