Re: [RESEND PATCH 2/2] PM / Domains: Fix asynchronous execution of *noirq() callbacks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7 February 2017 at 14:23, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> Hi Ulf, Rafael,
>
> On Thu, Jan 12, 2017 at 6:17 PM, Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
>> As the PM core may invoke the *noirq() callbacks asynchronously, the
>> current lock-less approach in genpd doesn't work. The consequence is that
>> we may find concurrent operations racing to power on/off the PM domain.
>>
>> As of now, no immediate errors has been reported, but it's probably only a
>> matter time. Therefor let's fix the problem now before this becomes a real
>> issue, by deploying the locking scheme to the relevant functions.
>>
>> Reported-by: Brian Norris <briannorris@xxxxxxxxxxxx>
>> Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
>
> This is commit ef4f7e2c8335a56b in pm/linux-next.
>
>> ---
>>  drivers/base/power/domain.c | 62 ++++++++++++++++++++++++---------------------
>>  1 file changed, 33 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
>> index fd2e3e1..6b23d82 100644
>> --- a/drivers/base/power/domain.c
>> +++ b/drivers/base/power/domain.c
>> @@ -729,16 +729,17 @@ static bool genpd_dev_active_wakeup(struct generic_pm_domain *genpd,
>
>> @@ -1070,13 +1070,17 @@ static void genpd_syscore_switch(struct device *dev, bool suspend)
>>         if (!pm_genpd_present(genpd))
>>                 return;
>>
>> +       genpd_lock(genpd);
>> +
>>         if (suspend) {
>>                 genpd->suspended_count++;
>> -               genpd_sync_power_off(genpd);
>> +               genpd_sync_power_off(genpd, 0);
>>         } else {
>> -               genpd_sync_power_on(genpd);
>> +               genpd_sync_power_on(genpd, 0);
>>                 genpd->suspended_count--;
>>         }
>> +
>> +       genpd_unlock(genpd);
>>  }
>
> This causes the following BUG on all my Renesas arm32 boards, where the
> system timer is an IRQ safe device:
>
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:232
> in_atomic(): 0, irqs_disabled(): 128, pid: 1751, name: s2ram
> CPU: 0 PID: 1751 Comm: s2ram Not tainted
> 4.10.0-rc7-koelsch-05643-g27f4c73972a614fe #3354
> Hardware name: Generic R8A7791 (Flattened Device Tree)
> [<c020e9c4>] (unwind_backtrace) from [<c020a40c>] (show_stack+0x10/0x14)
> [<c020a40c>] (show_stack) from [<c03f9818>] (dump_stack+0x7c/0x9c)
> [<c03f9818>] (dump_stack) from [<c0240020>] (___might_sleep+0x124/0x160)
> [<c0240020>] (___might_sleep) from [<c06fedfc>] (mutex_lock+0x18/0x60)
> [<c06fedfc>] (mutex_lock) from [<c04de340>] (genpd_syscore_switch+0x2c/0x7c)
> [<c04de340>] (genpd_syscore_switch) from [<c05ec328>]
> (sh_cmt_clock_event_suspend+0x18/0x28)
> [<c05ec328>] (sh_cmt_clock_event_suspend) from [<c027f9a4>]
> (clockevents_suspend+0x40/0x54)
> [<c027f9a4>] (clockevents_suspend) from [<c0276d48>]
> (timekeeping_suspend+0x23c/0x278)
> [<c0276d48>] (timekeeping_suspend) from [<c04cbb88>]
> (syscore_suspend+0x88/0x138)
> [<c04cbb88>] (syscore_suspend) from [<c025f618>]
> (suspend_devices_and_enter+0x290/0x470)
> [<c025f618>] (suspend_devices_and_enter) from [<c025fa20>]
> (pm_suspend+0x228/0x280)
> [<c025fa20>] (pm_suspend) from [<c025e60c>] (state_store+0xac/0xcc)
> [<c025e60c>] (state_store) from [<c0340c4c>] (kernfs_fop_write+0x160/0x19c)
> [<c0340c4c>] (kernfs_fop_write) from [<c02e3054>] (__vfs_write+0x20/0x108)
> [<c02e3054>] (__vfs_write) from [<c02e4424>] (vfs_write+0xb8/0x144)
> [<c02e4424>] (vfs_write) from [<c02e5014>] (SyS_write+0x40/0x80)
> [<c02e5014>] (SyS_write) from [<c0206cc0>] (ret_fast_syscall+0x0/0x34)
>
> Reverting the commit fixes the issue.

Geert, thanks for reporting! Instead of a revert, I believe a better
option is to only partly revert the change.

What I suggest is to keep the locks for the *noirq() callbacks, but
remove it for the pm_genpd_syscore* functions.

Actually the change-log I wrote for the related commit, don't mention
the changes it introduces to the syscore API - my bad! I apologize for
the inconvenience.

Allow me to send a fix on top asap. Whether Rafael can fold it into
the existing commit or add on top, I leave to him to decide.

Kind regards
Uffe



[Index of Archives]     [Linux Samsung SOC]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux