Re: [PATCH 0/7] PM: Solution for S0ix failure caused by PCH overheating

On Thu, May 5, 2022 at 3:58 AM Zhang Rui <rui.zhang@xxxxxxxxx> wrote:
>
> On some Intel client platforms like SKL/KBL/CNL/CML, there is a
> PCH thermal sensor that monitors the PCH temperature and blocks the system
> from entering S0ix in case it overheats.
>
> Commit ef63b043ac86 ("thermal: intel: pch: fix S0ix failure due to PCH
> temperature above threshold") introduced a delay loop that waits for the
> PCH to cool down for this purpose.
>
> However, in practice, we found that the time it takes to cool the PCH down
> below the threshold depends heavily on the initial PCH temperature when the
> delay starts, as well as on the ambient temperature.
>
> For example, on a Dell XPS 9360 laptop, the problem can be triggered:
> 1. when it is suspended with a heavy workload running, or
> 2. when it is moved from New Hampshire to Florida.
>
> In these cases, the 1-second delay is not sufficient. As a result, the
> system stays in a shallower power state, such as PCx, instead of S0ix, and
> drains the battery without the user noticing.
>
> In this patch series, we first fix the problem in patches 1/7 ~ 3/7 by:
> 1. extending the default overall cooling delay timeout to 60 seconds;
> 2. making sure the temperature is below the threshold rather than equal to it;
> 3. moving the delay to the .suspend_noirq phase instead, in order to
>    a) do the cooling when the system is in a more quiescent state, and
>    b) be aware of wakeup events during the long delay, because some wakeup
>       events (ACPI power button press, USB mouse, etc.) become valid only
>       in the .suspend_noirq phase and later.
>
> However, this potentially long delay introduces a problem for our suspend
> stress automation test, because the delay makes it hard to predict how
> much time it takes to suspend the system.
> As we want to do as many suspend iterations as possible in limited time,
> setting a 60+ second RTC alarm for a suspend that usually takes less
> than 1 second is overkill.
>
> Thus, in patches 4/7 ~ 7/7, an RTC driver hook is introduced, which cancels
> the armed RTC alarm at the beginning of suspend and then rearms the RTC
> alarm with a short interval (say, 2 seconds) right before the system is
> suspended.
>
> By running
>  # echo 2 > /sys/module/rtc_cmos/parameters/rtc_wake_override_sec
> before suspend, the system can be resumed by the RTC alarm right after it is
> suspended, no matter how long the suspend actually takes.
>
> This patch series has been tested on the same Dell XPS 9360 laptop, and
> S0ix was achieved in 100% of 1000+ s2idle iterations.

Overall, the first three patches in the series can go in without the
rest, so let's put them into a separate series.

Patch [4/7] doesn't depend on the first three ones, so it can go in by itself.

Patch [5/7] is to be dropped anyway as per the earlier discussion.

Patch [6/7] is only needed to apply patch [7/7] which is controversial.

I think that we can drop or defer patches [6-7/7] for now.


