On Wed, 15 Jun 2022 at 13:36, Adrien Thierry <athierry@xxxxxxxxxx> wrote: > > We've been encountering a BUG: scheduling while atomic while running the > 5.18.0-rt11 kernel on a Qualcomm SoC (see stacktrace below). > > It seems to occur because a spinlock is taken in the PSCI idle code path > in the idle loop. With the RT patchset applied and CONFIG_PREEMPT_RT > enabled, spinlocks can sleep, thus triggering the bug. > > In order to prevent this, replace the generic_pm_domain spinlock by a > raw spinlock. > > [ 2.994433] BUG: scheduling while atomic: swapper/6/0/0x00000002 > [ 2.994439] Modules linked in: > [ 2.994447] [<ffff80000810b0ec>] migrate_enable+0x3c/0x160 > [ 2.994461] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 5.18.0-rt11+ #1 > [ 2.994464] Hardware name: Qualcomm SA8295P ADP (DT) > [ 2.994466] Call trace: > [ 2.994467] dump_backtrace+0xb0/0x120 > [ 2.994473] show_stack+0x1c/0x6c > [ 2.994477] dump_stack_lvl+0x64/0x7c > [ 2.994483] dump_stack+0x14/0x2c > [ 2.994487] __schedule_bug+0xa8/0xc0 > [ 2.994489] schedule_debug.constprop.0+0x154/0x170 > [ 2.994492] __schedule+0x58/0x520 > [ 2.994496] schedule_rtlock+0x20/0x44 > [ 2.994499] rtlock_slowlock_locked+0x110/0x260 > [ 2.994503] rt_spin_lock+0x74/0x94 > [ 2.994505] genpd_lock_nested_spin+0x20/0x30 > [ 2.994509] genpd_power_off.part.0.isra.0+0x248/0x2cc > [ 2.994512] genpd_runtime_suspend+0x1a0/0x300 > [ 2.994515] __rpm_callback+0x4c/0x16c > [ 2.994518] rpm_callback+0x6c/0x80 > [ 2.994520] rpm_suspend+0x10c/0x63c > [ 2.994523] __pm_runtime_suspend+0x54/0xa4 > [ 2.994526] __psci_enter_domain_idle_state.constprop.0+0x64/0x10c > [ 2.994532] psci_enter_domain_idle_state+0x1c/0x24 > [ 2.994534] cpuidle_enter_state+0x8c/0x3f0 > [ 2.994539] cpuidle_enter+0x3c/0x50 > [ 2.994543] cpuidle_idle_call+0x158/0x1d4 > [ 2.994548] do_idle+0xa8/0xfc > [ 2.994551] cpu_startup_entry+0x28/0x30 > [ 2.994556] secondary_start_kernel+0xe4/0x140 > [ 2.994563] __secondary_switched+0x54/0x58 > > Signed-off-by: Adrien Thierry <athierry@xxxxxxxxxx> > --- > > This patch fixes the bug but I'm not sure if this is the proper way to do > so. Suggestions for other ways to fix this are very welcome. Honestly, I am not so sure about this either. Turning the lock into spinlock_t into a raw_spinlock_t, may have the effect of spreading into constraints on the genpd providers. Thus those may need to be converted to use raw_spinlock_t too (assuming they use a spinlock_t and GENPD_FLAG_IRQ_SAFE). On the other hand, there are a limited number of genpd providers that this can become a problem for, if any, so maybe it would not be a big problem after all. Kind regards Uffe > > drivers/base/power/domain.c | 10 +++++----- > include/linux/pm_domain.h | 2 +- > 2 files changed, 6 insertions(+), 6 deletions(-) > > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c > index 739e52cd4aba..9378decb58cf 100644 > --- a/drivers/base/power/domain.c > +++ b/drivers/base/power/domain.c > @@ -82,7 +82,7 @@ static void genpd_lock_spin(struct generic_pm_domain *genpd) > { > unsigned long flags; > > - spin_lock_irqsave(&genpd->slock, flags); > + raw_spin_lock_irqsave(&genpd->slock, flags); > genpd->lock_flags = flags; > } > > @@ -92,7 +92,7 @@ static void genpd_lock_nested_spin(struct generic_pm_domain *genpd, > { > unsigned long flags; > > - spin_lock_irqsave_nested(&genpd->slock, flags, depth); > + raw_spin_lock_irqsave_nested(&genpd->slock, flags, depth); > genpd->lock_flags = flags; > } > > @@ -101,7 +101,7 @@ static int genpd_lock_interruptible_spin(struct generic_pm_domain *genpd) > { > unsigned long flags; > > - spin_lock_irqsave(&genpd->slock, flags); > + raw_spin_lock_irqsave(&genpd->slock, flags); > genpd->lock_flags = flags; > return 0; > } > @@ -109,7 +109,7 @@ static int genpd_lock_interruptible_spin(struct generic_pm_domain *genpd) > static void genpd_unlock_spin(struct generic_pm_domain *genpd) > __releases(&genpd->slock) > { > - spin_unlock_irqrestore(&genpd->slock, genpd->lock_flags); > + raw_spin_unlock_irqrestore(&genpd->slock, genpd->lock_flags); > } > > static const struct genpd_lock_ops genpd_spin_ops = { > @@ -2022,7 +2022,7 @@ static void genpd_free_data(struct generic_pm_domain *genpd) > static void genpd_lock_init(struct generic_pm_domain *genpd) > { > if (genpd->flags & GENPD_FLAG_IRQ_SAFE) { > - spin_lock_init(&genpd->slock); > + raw_spin_lock_init(&genpd->slock); > genpd->lock_ops = &genpd_spin_ops; > } else { > mutex_init(&genpd->mlock); > diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h > index ebc351698090..80166a915b0d 100644 > --- a/include/linux/pm_domain.h > +++ b/include/linux/pm_domain.h > @@ -159,7 +159,7 @@ struct generic_pm_domain { > union { > struct mutex mlock; > struct { > - spinlock_t slock; > + raw_spinlock_t slock; > unsigned long lock_flags; > }; > }; > > base-commit: 979086f5e0066b4eff66e1eee123da228489985c > -- > 2.35.3 >