On 02/02/2018 06:49 PM, Rafael J. Wysocki wrote: > On Fri, Feb 2, 2018 at 1:53 PM, Prateek Sood <prsood@xxxxxxxxxxxxxx> wrote: >> On 02/02/2018 05:18 PM, Rafael J. Wysocki wrote: >>> On Friday, February 2, 2018 12:41:58 PM CET Prateek Sood wrote: >>>> Hi Viresh, >>>> >>>> One scenario is there where a kernel panic is observed in >>>> cpufreq during suspend/resume. >>>> >>>> pm_suspend() >>>> suspend_devices_and_enter() >>>> dpm_suspend_start() >>>> dpm_prepare() >>>> >>>> Failure in dpm_prepare() happend with following dmesg: >>>> >>>> [ 3746.316062] PM: Device xyz not prepared for power transition: code -16 >>>> [ 3746.316071] PM: Some devices failed to suspend, or early wake event detected >>>> >>>> >>>> pm_suspend() >>>> suspend_devices_and_enter() >>>> dpm_suspend_start() >>>> dpm_prepare() //failed >>>> dpm_resume_end() >>>> dpm_resume() >>>> cpufreq_resume() >>>> cpufreq_start_governor() >>>> sugov_start() >>>> cpufreq_add_update_util_hook() >>>> >>>> After failure in dpm_prepare(), dpm_resume() called >>>> cpufreq_resume(). Corresponding cpufreq_suspend() was not >>>> called due to failure of dpm_prepare(). >>>> >>>> This resulted in WARN_ON(per_cpu(cpufreq_update_util_data, cpu)) >>>> in cpufreq_add_update_util_hook() and cpufreq_add_update_util_hook->func >>>> being inconsistent state. It caused crash in scheduler. >>>> >>>> Following are some of the ways to mitigate this issue. Could >>>> you please provide feedback on below two approaches or suugest >>>> a better way to fix this problem. >>>> >>>> -----------------------8<------------------------------ >>>> >>>> Co-developed-by: Gaurav Kohli <gkohli@xxxxxxxxxxxxxx> >>>> Signed-off-by: Gaurav Kohli <gkohli@xxxxxxxxxxxxxx> >>>> Signed-off-by: Prateek Sood <prsood@xxxxxxxxxxxxxx> >>>> >>>> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c >>>> index 02a497e..732e5a2 100644 >>>> --- a/drivers/base/power/main.c >>>> +++ b/drivers/base/power/main.c >>>> @@ -1038,6 +1038,7 @@ void dpm_resume(pm_message_t state) >>>> { >>>> struct device *dev; >>>> ktime_t starttime = ktime_get(); >>>> + bool valid_resume = false; >>>> >>>> trace_suspend_resume(TPS("dpm_resume"), state.event, true); >>>> might_sleep(); >>>> @@ -1055,6 +1056,7 @@ void dpm_resume(pm_message_t state) >>>> } >>>> >>>> while (!list_empty(&dpm_suspended_list)) { >>>> + valid_resume = true; >>>> dev = to_device(dpm_suspended_list.next); >>>> get_device(dev); >>>> if (!is_async(dev)) { >>>> @@ -1080,7 +1082,8 @@ void dpm_resume(pm_message_t state) >>>> async_synchronize_full(); >>>> dpm_show_time(starttime, state, 0, NULL); >>>> >>>> - cpufreq_resume(); >>>> + if (valid_resume) >>>> + cpufreq_resume(); >>>> trace_suspend_resume(TPS("dpm_resume"), state.event, false); >>>> } >>>> >>>> --------------------8<-------------------------------------- >>>> >>>> Signed-off-by: Prateek Sood <prsood@xxxxxxxxxxxxxx> >>>> >>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >>>> index 421f318..439eab8 100644 >>>> --- a/drivers/cpufreq/cpufreq.c >>>> +++ b/drivers/cpufreq/cpufreq.c >>>> @@ -1648,7 +1648,7 @@ void cpufreq_suspend(void) >>>> { >>>> struct cpufreq_policy *policy; >>>> >>>> - if (!cpufreq_driver) >>>> + if (!cpufreq_driver || cpufreq_suspended) >>>> return; >>>> >>>> if (!has_target() && !cpufreq_driver->suspend) >>>> @@ -1683,7 +1683,7 @@ void cpufreq_resume(void) >>>> struct cpufreq_policy *policy; >>>> int ret; >>>> >>>> - if (!cpufreq_driver) >>>> + if (!cpufreq_driver || !cpufreq_suspended) >>>> return; >>>> >>>> cpufreq_suspended = false; >>> >>> Since we have cpufreq_suspended already, the second one is better. >>> >> >> Thanks Rafael for the inputs, I will send a formal patch. > > Bo Yan has posted something really similar already, however: > > https://patchwork.kernel.org/patch/10181101/ > > so I would prefer to apply a new version of that one with the latest > comment taken into account: > > https://patchwork.kernel.org/patch/10183075/ > > for the credit to go to the first submitter. > Thanks for the information Rafael. I believe safety check in both cpufreq_suspend() and cpufreq_resume() would be a good thing to have. -- Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html