Re: Occasional (too common) suspend problem

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Fri, 21 Jan 2011 08:28:41 -0800

On Thu, Jan 20, 2011 at 9:26 PM, Lin Ming <ming.m.lin@xxxxxxxxx> wrote:
> On Fri, 2011-01-21 at 12:50, Linus Torvalds wrote:
>>
>>  ...
>>  [   54.628375] PM: Saving platform NVS memory
>>  [   54.628387] Disabling non-boot CPUs ...
>>  [   63.554966] ACPI Exception: AE_BAD_PARAMETER, Returned by Handler
>> for [EmbeddedControl] (20110112/evregion-474)
>>  [   63.554992] ACPI Error: Method parse/execution failed
>> [\_SB_.PCI0.SBRG.EC0_.RCTP] (Node f5c2dea0), AE_BAD_PARAMETER
>> (20110112/psparse-536)
>>  [   63.555022] ACPI Error: Method parse/execution failed
>> [\_TZ_.RTMP] (Node f5c32fa8), AE_BAD_PARAMETER (20110112/psparse-536)
>>  [   63.555047] ACPI Error: Method parse/execution failed
>> [\_TZ_.TZ00._TMP] (Node f5c34018), AE_BAD_PARAMETER
>> (20110112/psparse-536)
>>  [   63.555079] Thermal: failed to read out thermal zone 0
>>  [   63.556361] CPU 1 is now offline
>>  [   63.556944] PM: Restoring platform NVS memory
>>  [   63.556944] Enabling non-boot CPUs ...
>>  [   63.556944] Booting Node 0 Processor 1 APIC 0x1
>>  [   63.556279] Initializing CPU#1
>>  ...
>>
>> which really doesn't tell me much, except that clearly something in
>> ACPI-land is unhappy, and it looks thermal-related (that last error
>> message comes from thermal_zone_device_update()).
>>
>> Any ideas?
>
> Does reverting bba63a29 (ACPICA: Implicit notify support) help?

It seemed to, but on the third boot (with several suspend cycles per
boot) I ended up seeing it. This is why I can't bisect it - it really
isn't reliable enough to bisect sanely.

One thing I've noticed: if the system suspends successfully once, it
seems to keep suspending fine several times. At least I think that
every time I've seen this problem, it's happened on the first suspend
(and if it comes back after a keypress, the second suspend will hang
hard). But I've been booting this machine so much during all the
testing that I haven't ever done a really _long_ run of many
suspend/resume cycles, so my evidence for that is weakish.
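
A long run of that kind could be scripted roughly as below. This is
only a sketch under assumptions: it relies on rtcwake(8) from
util-linux being installed and on the RTC alarm actually waking this
machine, it has to run as root, and the names rtcwake_command and
run_cycles plus the 30s/10s timings are made up for illustration. A
hard hang will of course never be recorded; only a suspend that
returns an error shows up in the result.

```python
import subprocess
import time


def rtcwake_command(seconds):
    # rtcwake (util-linux): enter suspend-to-RAM and arm the RTC alarm
    # so the machine wakes up by itself after `seconds`.
    return ["rtcwake", "-m", "mem", "-s", str(seconds)]


def run_cycles(count, wake_after=30, settle=10,
               runner=subprocess.call, sleep=time.sleep):
    """Run `count` suspend/resume cycles.

    Returns the 1-based indices of the cycles whose command exited
    with a non-zero status. `runner` and `sleep` are injectable so the
    loop can be exercised without suspending anything.
    """
    failed = []
    for i in range(1, count + 1):
        rc = runner(rtcwake_command(wake_after))
        if rc != 0:
            failed.append(i)
        # Give the system a moment to settle before the next cycle.
        sleep(settle)
    return failed
```

Calling run_cycles(50) as root would attempt fifty cycles; passing a
dummy runner makes it a dry run.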

It also seems to be easier to trigger under some moderate load.  But
that may be a total red herring, and maybe it's my expectations that
color that impression (ie the fact that I saw the "thermal" thing, and
started thinking that it happens more easily if I run some system
stresser on it).

Finally, it may well be that the thermal problem from ACPI is harmless
- I've seen the suspend end up hanging on a keypress (but coming back)
even without that particular message being in the logs.
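
To check a saved dmesg for both symptoms at once, something like the
following sketch could be used. It looks for the "failed to read out
thermal zone" message and for the largest jump between consecutive
timestamps (in the excerpt quoted above, roughly nine seconds pass
between "Disabling non-boot CPUs" and the first ACPI exception). The
function names are hypothetical, and it assumes the usual
"[ seconds.micros] message" dmesg line format.

```python
import re

# Matches a dmesg timestamp such as "[   54.628375] " anywhere in the
# line, so mail-quoted lines ("> > [ ...") are handled as well.
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")


def parse(log_text):
    """Return (timestamp, message) pairs for lines carrying a timestamp."""
    entries = []
    for line in log_text.splitlines():
        m = STAMP.search(line)
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    return entries


def largest_gap(entries):
    """Return (gap_seconds, message_before, message_after) of the biggest stall."""
    best = (0.0, None, None)
    for (t0, m0), (t1, m1) in zip(entries, entries[1:]):
        if t1 - t0 > best[0]:
            best = (t1 - t0, m0, m1)
    return best


def has_thermal_failure(entries):
    """True if the thermal zone read-out error appears in the log."""
    return any("failed to read out thermal zone" in msg for _, msg in entries)
```

Running these over logs from good and bad suspends would show whether
the stall ever appears without the thermal message, which is the case
described above.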

Zhang Rui wrote:
>
> is this a 2.6.38-rc1 regression?

I'm fairly certain it is. I don't use that machine much (it's slow as
molasses and the screen is not very good), and I did a full re-install
of Fedora 14 on it just this Monday to take it with me to LCA. But I
have bisected three different suspend problems - not including this
one - on this machine since then, so I have been booting lots of
kernels on it, and suspending/resuming it a lot.

And I have never seen a suspend/resume failure (so far - knock wood)
when running plain 2.6.37.

That said, it is NOT reliable, so I can't guarantee anything. Many of
the suspends I've been doing have been by ssh'ing in, and just doing
"echo mem > /sys/power/state". And I get the feeling that this problem
happens more when I suspend by closing the lid.

(Again, that "closing the lid" may be about me being physically at the
machine and starting programs too, so my other theory that it is
load-dependent might account for that).

But it might well be some race, and have nothing to do with load or
lid or anything, and just be entirely bad luck. The patterns I've seen
are not strong enough to really make any judgement.

> can you attach the acpidump output of this machine?

Attached. I also attach the full dmesg of a boot and then later a
successful suspend/resume.

                            Linus
Attachment: acpi.dump (binary data)
Attachment: dmesg (binary data)