Re: Regression in 5.16-rc1 with suspend to idle

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 18, 2022 at 5:15 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Thu, Feb 17, 2022 at 9:16 PM Limonciello, Mario
> <Mario.Limonciello@xxxxxxx> wrote:
> >
> > [Public]
> >
> > > > I've found recently that on kernel 5.17-rc4 some OEM AMD laptops are
> > > shutting down when entering suspend to idle.
> > >
> > > Interesting.  Can you identify the exact point when the shutdown occurs?
> >
> > It looks like it's a platform firmware crash that causes the shutdown not a kernel
> > crash.
>
> It looks like we'll need to quirk the system in question then.
>
> > > > What would you suggest to be done in this case?  Revert both commits?  Or
> > > would you prefer to have a fixup on top
> > > > of that?
> > >
> > > I would prefer to fix the problem on top of the current 5.16-rc.
> > >
> >
> > Here is the minimal patch I can come up with on top of 5.17-rc4 that fixes it:
> >
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index 04ea92cbd9cf..f5bf46782043 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -32,6 +32,7 @@
> >  #include <linux/suspend.h>
> >  #include <trace/events/power.h>
> >  #include <linux/cpufreq.h>
> > +#include <linux/cpuidle.h>
> >  #include <linux/devfreq.h>
> >  #include <linux/timer.h>
> >
> > @@ -1350,6 +1351,8 @@ int dpm_suspend_noirq(pm_message_t state)
> >  {
> >         int ret;
> >
> > +       cpuidle_pause();
>
> Can you replace this with wake_up_all_idle_cpus() and remove the
> cpuidle_resume()/cpuidle_pause() calls from s2idle_enter() and see
> what happens?
>
> > +
> >         device_wakeup_arm_wake_irqs();
> >         suspend_device_irqs();
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 6fcdee7e87a5..1708a643ba5d 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -97,6 +97,7 @@ static void s2idle_enter(void)
> >         raw_spin_unlock_irq(&s2idle_lock);
> >
> >         cpus_read_lock();
> > +       cpuidle_resume();
> >
> >         /* Push all the CPUs into the idle loop. */
> >         wake_up_all_idle_cpus();
> > @@ -104,6 +105,7 @@ static void s2idle_enter(void)
> >         swait_event_exclusive(s2idle_wait_head,
> >                     s2idle_state == S2IDLE_STATE_WAKE);
> >
> > +       cpuidle_pause();
> >         cpus_read_unlock();
> >
> >         raw_spin_lock_irq(&s2idle_lock);
> >
> >
> > * Removing the cpuidle_pause call from s2idle_enter will fix the crash,
> >   but the system doesn't enter the deepest sleep state.
>
> I think you mean that removing it doesn't make a difference except
> that it prevents the deepest state from being entered.
>
> Pausing cpuidle here is kind of redundant, because it just flips the
> "initialized" flag that's going to be flipped again by
> cpuidle_resume() in the next iteration.  [BTW, your patch is missing
> one an additional cpuidle_resume() in the resume path to match this
> cpuidle_pause().]  Also calling  wake_up_all_idle_cpus() is not likely
> to make a difference, so I'm guessing that synchronize_rcu() does the
> trick.
>
> > * removing the cpuidle_pause call from dpm_suspend_noirq the firmware continues to
> >    crash.
>
> So the crash occurs while running dpm_suspend_noirq().
>
> > I also confirmed that reverting both of these commits together on top of 5.17-rc4 fixes it:
> >
> > 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path")
> > 23f62d7ab25b ("PM: sleep: Pause cpuidle later and resume it earlier during system transitions")
> >
> > The commit messages at least make it sound like it was just a rework for unification of the codepaths,
> > not supposed to be for anything to be actually fixed, so I would think it should be safe to revert.
>
> There is a deeper issue related to these commits and I'm not inclined
> to go back to the old code if there is only one system affected by
> this and the problem is related to the platform firmware on that
> system.
>
> > So please advise which way you want to go and I'll send up a patch (or if you want to write one
> > I'm happy to take it and test it since I can readily reproduce it).
>
> To start with, please test the attached debug patch on top of -rc4.

Attached is another patch to try, testing the hypothesis that the
observed crash is related to CPUs being in idle state that are too
deep for some reason during late suspend and early resume.
---
 kernel/power/suspend.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Index: linux-pm/kernel/power/suspend.c
===================================================================
--- linux-pm.orig/kernel/power/suspend.c
+++ linux-pm/kernel/power/suspend.c
@@ -23,6 +23,7 @@
 #include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/export.h>
+#include <linux/pm_qos.h>
 #include <linux/suspend.h>
 #include <linux/syscore_ops.h>
 #include <linux/swait.h>
@@ -367,6 +368,20 @@ static int suspend_prepare(suspend_state
 	return error;
 }
 
+static struct pm_qos_request pm_suspend_cpu_latency_qos_req;
+
+static void pm_suspend_cpu_latency_qos_set(void)
+{
+	cpu_latency_qos_add_request(&pm_suspend_cpu_latency_qos_req, 3);
+	wake_up_all_idle_cpus();
+}
+
+static void pm_suspend_cpu_latency_qos_clear(void)
+{
+	cpu_latency_qos_remove_request(&pm_suspend_cpu_latency_qos_req);
+	wake_up_all_idle_cpus();
+}
+
 /* default implementation */
 void __weak arch_suspend_disable_irqs(void)
 {
@@ -403,6 +418,8 @@ static int suspend_enter(suspend_state_t
 	if (error)
 		goto Devices_early_resume;
 
+	pm_suspend_cpu_latency_qos_set();
+
 	error = dpm_suspend_noirq(PMSG_SUSPEND);
 	if (error) {
 		pr_err("noirq suspend of devices failed\n");
@@ -457,6 +474,7 @@ static int suspend_enter(suspend_state_t
 	dpm_resume_noirq(PMSG_RESUME);
 
  Platform_early_resume:
+	pm_suspend_cpu_latency_qos_clear();
 	platform_resume_early(state);
 
  Devices_early_resume:

[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux