[Public] > it: > > > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > > > index 04ea92cbd9cf..f5bf46782043 100644 > > > --- a/drivers/base/power/main.c > > > +++ b/drivers/base/power/main.c > > > @@ -32,6 +32,7 @@ > > > #include <linux/suspend.h> > > > #include <trace/events/power.h> > > > #include <linux/cpufreq.h> > > > +#include <linux/cpuidle.h> > > > #include <linux/devfreq.h> > > > #include <linux/timer.h> > > > > > > @@ -1350,6 +1351,8 @@ int dpm_suspend_noirq(pm_message_t state) > > > { > > > int ret; > > > > > > + cpuidle_pause(); > > > > Can you replace this with wake_up_all_idle_cpus() and remove the > > cpuidle_resume()/cpuidle_pause() calls from s2idle_enter() and see > > what happens? > > > > > + > > > device_wakeup_arm_wake_irqs(); > > > suspend_device_irqs(); > > > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > > index 6fcdee7e87a5..1708a643ba5d 100644 > > > --- a/kernel/power/suspend.c > > > +++ b/kernel/power/suspend.c > > > @@ -97,6 +97,7 @@ static void s2idle_enter(void) > > > raw_spin_unlock_irq(&s2idle_lock); > > > > > > cpus_read_lock(); > > > + cpuidle_resume(); > > > > > > /* Push all the CPUs into the idle loop. */ > > > wake_up_all_idle_cpus(); > > > @@ -104,6 +105,7 @@ static void s2idle_enter(void) > > > swait_event_exclusive(s2idle_wait_head, > > > s2idle_state == S2IDLE_STATE_WAKE); > > > > > > + cpuidle_pause(); > > > cpus_read_unlock(); > > > > > > raw_spin_lock_irq(&s2idle_lock); > > > > > > > > > * Removing the cpuidle_pause call from s2idle_enter will fix the crash, > > > but the system doesn't enter the deepest sleep state. > > > > I think you mean that removing it doesn't make a difference except > > that it prevents the deepest state from being entered. > > > > Pausing cpuidle here is kind of redundant, because it just flips the > > "initialized" flag that's going to be flipped again by > > cpuidle_resume() in the next iteration. [BTW, your patch is missing > > one an additional cpuidle_resume() in the resume path to match this > > cpuidle_pause().] Also calling wake_up_all_idle_cpus() is not likely > > to make a difference, so I'm guessing that synchronize_rcu() does the > > trick. > > > > > * removing the cpuidle_pause call from dpm_suspend_noirq the > firmware continues to > > > crash. > > > > So the crash occurs while running dpm_suspend_noirq(). > > > > > I also confirmed that reverting both of these commits together on top of > 5.17-rc4 fixes it: > > > > > > 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to- > idle path") > > > 23f62d7ab25b ("PM: sleep: Pause cpuidle later and resume it earlier > during system transitions") > > > > > > The commit messages at least make it sound like it was just a rework for > unification of the codepaths, > > > not supposed to be for anything to be actually fixed, so I would think it > should be safe to revert. > > > > There is a deeper issue related to these commits and I'm not inclined > > to go back to the old code if there is only one system affected by > > this and the problem is related to the platform firmware on that > > system. > > > > > So please advise which way you want to go and I'll send up a patch (or if > you want to write one > > > I'm happy to take it and test it since I can readily reproduce it). > > > > To start with, please test the attached debug patch on top of -rc4. > > Attached is another patch to try, testing the hypothesis that the > observed crash is related to CPUs being in idle state that are too > deep for some reason during late suspend and early resume. I tried 3 test kernels: * 5.17-rc4 + Your second debugging patch * 5.17-rc4+ Your first debugging patch * 5.17-rc4 + A hack I wrote that pushed amd-pmc into "later" in the suspend using a global symbol called after LPS0 instead of letting it run in noirq stage It works properly on all of those, tried about 5x time in each. Then I confirmed I could still crash it on 5.17-rc4 with my control kernel.