On Thu, Jan 20, 2011 at 9:26 PM, Lin Ming <ming.m.lin@xxxxxxxxx> wrote:
> On Fri, 2011-01-21 at 12:50, Linus Torvalds wrote:
>>
>> ...
>> [ 54.628375] PM: Saving platform NVS memory
>> [ 54.628387] Disabling non-boot CPUs ...
>> [ 63.554966] ACPI Exception: AE_BAD_PARAMETER, Returned by Handler
>> for [EmbeddedControl] (20110112/evregion-474)
>> [ 63.554992] ACPI Error: Method parse/execution failed
>> [\_SB_.PCI0.SBRG.EC0_.RCTP] (Node f5c2dea0), AE_BAD_PARAMETER
>> (20110112/psparse-536)
>> [ 63.555022] ACPI Error: Method parse/execution failed
>> [\_TZ_.RTMP] (Node f5c32fa8), AE_BAD_PARAMETER (20110112/psparse-536)
>> [ 63.555047] ACPI Error: Method parse/execution failed
>> [\_TZ_.TZ00._TMP] (Node f5c34018), AE_BAD_PARAMETER
>> (20110112/psparse-536)
>> [ 63.555079] Thermal: failed to read out thermal zone 0
>> [ 63.556361] CPU 1 is now offline
>> [ 63.556944] PM: Restoring platform NVS memory
>> [ 63.556944] Enabling non-boot CPUs ...
>> [ 63.556944] Booting Node 0 Processor 1 APIC 0x1
>> [ 63.556279] Initializing CPU#1
>> ...
>>
>> which really doesn't tell me much, except that clearly something in
>> ACPI-land is unhappy, and it looks thermal-related (that last error
>> message comes from thermal_zone_device_update()).
>>
>> Any ideas?
>
> Does revert bba63a29(ACPICA: Implicit notify support) help?

It seemed to, but on the third boot (with several suspend cycles per
boot) I ended up seeing it.

This is why I can't bisect it - it really isn't reliable enough to
bisect sanely.

One thing I've noticed: if the system suspends once, it seems to
suspend several times. At least I think that every time I've seen this
problem, it's happened on the first suspend (and if it comes back after
a keypress, the second suspend will hang hard). But I've been booting
this machine so much during all the testing, that I haven't ever done a
really _long_ run of many suspend/resume cycles, so my evidence for
that is weakish.

It also seems to be easier to trigger under some moderate load. But
that may be a total red herring, and maybe it's my expectations that
color that impression (ie the fact that I saw the "thermal" thing, and
started thinking that it happens more easily if I run some system
stresser on it).

Finally, it may well be that the thermal problem from ACPI is harmless
- I've seen the suspend end up hanging on a keypress (but coming back)
even without that particular message being in the logs.

Zhang Rui wrote:
>
> is this a 2.6.38-rc1 regression?

I'm fairly certain it is. I don't use that machine much (it's slow as
molasses and the screen is not very good), and I did a full re-install
of Fedora 14 on it just this Monday to take it with me to LCA.

But I have bisected three different suspend problems - not including
this one - on this machine since then, so I have been booting lots of
kernels on it, and suspending/resuming it a lot. And I have never seen
a suspend/resume failure (so far - knock wood) when running plain
2.6.37.

That said, it is NOT reliable, so I can't guarantee anything. Many of
the suspends I've been doing have been by ssh'ing in, and just doing
"echo mem > /sys/power/state". And I get the feeling that this problem
happens more when I suspend by closing the lid.

(Again, that "closing the lid" may be about me being physically at the
machine and starting programs too, so my other theory that it is
load-dependent might account for that).

But it might well be some race, and have nothing to do with load or lid
or anything, and just be entirely bad luck. The patterns I've seen are
not strong enough to really make any judgement.

> can you attach the acpidump output of this machine?

Attached. I also attach the full dmesg of a boot and then later a
successful suspend/resume.

Linus
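
Since the failure is intermittent and the thermal message is not always
present, it can help to check each saved dmesg mechanically rather than
by eye. The following is a minimal sketch in Python, not anything from
the kernel tree: the function name scan_dmesg, the two regular
expressions, and the 5-second gap threshold are all assumptions made for
illustration. It flags ACPI/thermal error lines and any large jump
between consecutive timestamps, such as the roughly 9-second jump after
"Disabling non-boot CPUs" in the log quoted above.

```python
import re

# Assumed line shape: "[   63.554966] message text" (standard printk timestamps).
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# Assumed patterns of interest, taken from the messages quoted in this thread.
ACPI_RE = re.compile(r"ACPI (?:Exception|Error)|Thermal: failed")


def scan_dmesg(lines, gap_threshold=5.0):
    """Return (hits, gaps) for an iterable of dmesg lines.

    hits: list of (timestamp, message) for lines matching ACPI_RE.
    gaps: list of (previous_ts, current_ts) where the timestamp jumped
          forward by at least gap_threshold seconds (hypothetical cutoff).
    """
    hits = []
    gaps = []
    prev = None
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip wrapped continuation lines without a timestamp
        ts = float(m.group(1))
        msg = m.group(2)
        if ACPI_RE.search(msg):
            hits.append((ts, msg))
        if prev is not None and ts - prev >= gap_threshold:
            gaps.append((prev, ts))
        prev = ts
    return hits, gaps


# A few lines from the log quoted in this thread, with wraps rejoined.
sample = [
    "[ 54.628375] PM: Saving platform NVS memory",
    "[ 54.628387] Disabling non-boot CPUs ...",
    "[ 63.554966] ACPI Exception: AE_BAD_PARAMETER, Returned by Handler "
    "for [EmbeddedControl] (20110112/evregion-474)",
    "[ 63.555079] Thermal: failed to read out thermal zone 0",
    "[ 63.556361] CPU 1 is now offline",
]
hits, gaps = scan_dmesg(sample)
```

Running the same function over the dmesg from every suspend attempt
would give a rough count of how often the hang coincides with the ACPI
errors, which is the correlation that is still unclear here.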

Attachment: acpi.dump (binary data)
Attachment: dmesg (binary data)
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/linux-pm