On Tue, Jul 04, 2017 at 03:01:15PM +0200, Geert Uytterhoeven wrote: > Hi Krzysztof, Rafael, > > On Wed, Jun 28, 2017 at 4:56 PM, Krzysztof Kozlowski <krzk@xxxxxxxxxx> wrote: > > genpd_syscore_switch() had two problems: > > 1. It silently assumed that device, it is being called for, belongs to > > generic power domain and used container_of() on its power domain > > pointer. Such assumption might not be true always. > > > > 2. It iterated over list of generic power domains without holding > > gpd_list_lock mutex thus list could have been modified in the same > > time. > > > > Usage of genpd_lookup_dev() solves both problems as it is safe a call > > for non-generic power domains and uses mutex when iterating. > > > > Reported-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx> > > Signed-off-by: Krzysztof Kozlowski <krzk@xxxxxxxxxx> > > Acked-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx> > > This is commit 8b55e55ee44356d6 in pm/linux-next, also part of the pull > request "[GIT PULL] Power management updates for v4.13-rc1". > > > Not tested on real hardware. > > This causes the following BUG during s2ram on all my Renesas arm32 boards, > where the system timer is an IRQ safe device: > > PM: Syncing filesystems ... done. > PM: Preparing system for sleep (mem) > Freezing user space processes ... (elapsed 0.001 seconds) done. > OOM killer disabled. > Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > PM: Suspending system (mem) > PM: suspend of devices complete after 122.841 msecs > PM: suspend devices took 0.130 seconds > PM: late suspend of devices complete after 2.682 msecs > PM: noirq suspend of devices complete after 29.951 msecs > Disabling non-boot CPUs ... > BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238 Hi, Thanks for report! Damn it, although I couldn't find this in the code, but I was fearing that this ends up in atomic section. That would kind of explain why mutex was not there [1]. Anyway, the buggy code was there already. Instead of "sleeping in atomic section" there was no locking at all... In this context this was probably safe because it was executed *after* disabling non-boot CPUs but then the function cannot be called in other contexts. I am not sure I will be capable of developing the proper fix as I do not have the hardware and I do not know all stuff happening in sh suspend. Probably reverting this and living with non-locked path would be the safest choice. [1] https://patchwork.kernel.org/patch/9778903/ Best regards, Krzysztof > in_atomic(): 0, irqs_disabled(): 128, pid: 1730, name: s2ram > CPU: 0 PID: 1730 Comm: s2ram Not tainted > 4.12.0-koelsch-07061-g810fee9afeba15ef #3592 > Hardware name: Generic R8A7791 (Flattened Device Tree) > [<c020e9f4>] (unwind_backtrace) from [<c020a484>] (show_stack+0x10/0x14) > [<c020a484>] (show_stack) from [<c04017e8>] (dump_stack+0x7c/0x9c) > [<c04017e8>] (dump_stack) from [<c0240284>] (___might_sleep+0x124/0x160) > [<c0240284>] (___might_sleep) from [<c0717cfc>] (mutex_lock+0x18/0x60) > [<c0717cfc>] (mutex_lock) from [<c04de11c>] (genpd_lookup_dev+0x38/0x94) > [<c04de11c>] (genpd_lookup_dev) from [<c04dfd34>] > (pm_genpd_syscore_poweroff+0x8/0x2c) > [<c04dfd34>] (pm_genpd_syscore_poweroff) from [<c05fcb70>] > (sh_cmt_clock_event_suspend+0x18/0x28) > [<c05fcb70>] (sh_cmt_clock_event_suspend) from [<c027f174>] > (clockevents_suspend+0x40/0x54) > [<c027f174>] (clockevents_suspend) from [<c02762d8>] > (timekeeping_suspend+0x23c/0x278) > [<c02762d8>] (timekeeping_suspend) from [<c04ce028>] > (syscore_suspend+0x88/0x138) > [<c04ce028>] (syscore_suspend) from [<c025d740>] > (suspend_devices_and_enter+0x308/0x4fc) > [<c025d740>] (suspend_devices_and_enter) from [<c025db5c>] > (pm_suspend+0x228/0x280) > [<c025db5c>] (pm_suspend) from [<c025c6b8>] (state_store+0xac/0xcc) > [<c025c6b8>] (state_store) from [<c0342f9c>] (kernfs_fop_write+0x170/0x1b0) > [<c0342f9c>] (kernfs_fop_write) from [<c02e47dc>] (__vfs_write+0x20/0x108) > [<c02e47dc>] (__vfs_write) from [<c02e5b80>] (vfs_write+0xb8/0x144) > [<c02e5b80>] (vfs_write) from [<c02e6804>] (SyS_write+0x40/0x80) > [<c02e6804>] (SyS_write) from [<c0206c40>] (ret_fast_syscall+0x0/0x34) > > Reverting the commit fixes the issue. > > > --- > > drivers/base/power/domain.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c > > index 01e31d9f6c94..d31a4434b8b3 100644 > > --- a/drivers/base/power/domain.c > > +++ b/drivers/base/power/domain.c > > @@ -1098,8 +1098,8 @@ static void genpd_syscore_switch(struct device *dev, bool suspend) > > { > > struct generic_pm_domain *genpd; > > > > - genpd = dev_to_genpd(dev); > > - if (!pm_genpd_present(genpd)) > > + genpd = genpd_lookup_dev(dev); > > + if (!genpd) > > return; > > > > if (suspend) { > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds