ThinkPad: random failure of ACPI thermal zone THM0 on resume

I just debugged a long-standing problem with my ThinkPad X1 Carbon 4th generation, model 20FB002UMC: the fan randomly runs at 100% speed after resume. After the resume, the fan slowly raises its speed to the maximum. A reboot does not fix the problem; a shutdown and power-on does. But after the next boot and suspend, the problem randomly appears again.

The reproducibility varies, depending on unknown conditions, between ~5% and ~99%. Judging from the mailing lists, this is a common problem for many users with different ThinkPad models and different kernels. It first appeared at least 3 years ago.

I made dumps of /sys and /proc in the correct and the failed state, and narrowed the problem down to a failed THM0 readout:

A resume with 100% fan speed gives the following readout:
x1:~ # cat /proc/acpi/ibm/thermal
temperatures:    -128 -128 0 0 0 0 0 0

But a later resume returned with a properly working fan:
x1:~ # cat /proc/acpi/ibm/thermal
temperatures:    44 -128 0 0 0 0 0 0

This implies that the automatic ACPI fan regulation reads a bad temperature (-128 in the first slot) and raises the fan speed to the maximum.
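
A minimal sketch of how the failed readout could be detected from a script, assuming (as in the two readouts above) that THM0 is the first value on the "temperatures:" line and that -128 there marks the failure. The function names thm0_value and thm0_failed are made up for illustration.

```shell
# thm0_value: print the first temperature field of a
# "temperatures: ..." line read from stdin.
thm0_value() {
    awk '$1 == "temperatures:" { print $2; exit }'
}

# thm0_failed: succeed (exit status 0) if the first sensor reads -128,
# which is the value seen after a resume with 100% fan speed.
thm0_failed() {
    [ "$(thm0_value)" = "-128" ]
}
```

After a resume it could be used like this:
x1:~ # thm0_failed < /proc/acpi/ibm/thermal && echo "THM0 readout failed"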

Watching dmesg in the failed and the successful state, I see no significant differences. The suspend/resume log is not always consistent, but the standard log looks like this, independently of whether the fan regulation succeeds or fails:

x1:~ # dmesg -c | grep -i '\(acpi\|thermal\|ibm\|thinkpad\)'
[12198.794282] ACPI: EC: interrupt blocked
[12198.833773] ACPI: Preparing to enter system sleep state S3
[12198.839765] ACPI: EC: event blocked
[12198.839767] ACPI: EC: EC stopped
[12198.849325] ACPI: Low-level resume complete
[12198.849414] ACPI: EC: EC started
[12198.854035] ACPI: Waking up from system sleep state S3
[12198.866982] ACPI: EC: interrupt unblocked
[12198.936739] ACPI: EC: event unblocked
[12199.080732] thinkpad_acpi: docked into hotplug port replicator

If I compare the whole suspend/resume logs, there are small differences in particular kernel messages, but again, I found no significant difference. If I exclude USB and network devices, I see changes only here:
bad to good:
-IRQ 16: no longer affine to CPU3
 IRQ 122: no longer affine to CPU3
-IRQ 124: no longer affine to CPU3
+IRQ 123: no longer affine to CPU3
But another bad to good diff looks different:
 IRQ 122: no longer affine to CPU3
 IRQ 123: no longer affine to CPU3
 IRQ 131: no longer affine to CPU3
+IRQ 137: no longer affine to CPU3
So even this does not give any indication of the error source. (This comparison is from the openSUSE Leap 15.3 kernel 5.3.18-59.5, as I have not yet gotten a successful fan resume on the recent 5.13~rc7-1.1.g0a4a430.)
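
For anyone who wants to repeat such a comparison, here is a rough sketch of how two saved resume logs could be prepared for diffing. It strips the dmesg timestamps (which always differ) and drops USB and network lines. The function name normalize_log and the exclusion pattern in EXCLUDE_RE are assumptions for illustration only; the pattern has to be adapted to the drivers actually present on the machine.

```shell
# Hypothetical pattern for lines to ignore (USB and network drivers).
EXCLUDE_RE='usb|wlan|iwlwifi|e1000e'

# normalize_log: read a dmesg log on stdin, remove the leading
# "[12198.794282] " timestamp and drop lines matching EXCLUDE_RE.
normalize_log() {
    sed -e 's/^\[ *[0-9]*\.[0-9]*\] //' | grep -E -i -v "$EXCLUDE_RE"
}
```

With the logs of a failed and a good resume saved as bad.log and good.log (hypothetical file names), the comparison would be:
x1:~ # normalize_log < bad.log > bad.txt
x1:~ # normalize_log < good.log > good.txt
x1:~ # diff -u bad.txt good.txt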

Everything looks like a race condition in ACPI on resume, but as far as the logs show, there is no difference between the successful and the failed state. Does anybody have any ideas, patches, or additional debug messages to suggest?


Reproducibility conditions:

Reproduced on 5.13~rc7-1.1.g0a4a430 at (maybe) 100%. Some older kernels or builds have a lower reproduction rate, so I downgraded to openSUSE Leap 15.3's kernel 5.3.18-59.5, which made it possible to capture the log of a successful resume.
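
To put a number on the reproduction rate of a given kernel, the thermal readout can be appended to a file after every resume and counted afterwards. This is only a sketch: the function name failure_rate and the file name readouts.log are made up, and it assumes that a first value of -128 marks a failed resume.

```shell
# failure_rate: read recorded "temperatures: ..." lines on stdin and
# report how many of them have -128 as the first (THM0) value.
failure_rate() {
    awk '$1 == "temperatures:" { total++; if ($2 == -128) failed++ }
         END {
             if (total)
                 printf "%d/%d resumes failed (%d%%)\n",
                        failed, total, 100 * failed / total
         }'
}
```

Recording and evaluation would then look like:
x1:~ # cat /proc/acpi/ibm/thermal >> readouts.log    (after each resume)
x1:~ # failure_rate < readouts.log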

The problem appears only with automatic fan speed regulation. If I turn off the automatic (ACPI regulated) fan speed, the fan does exactly what is expected; if I return to automatic mode, the fan raises its speed again.
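
One way to switch between manual and automatic regulation is the thinkpad_acpi procfs interface, sketched below. This assumes that thinkpad_acpi is loaded with fan_control=1, which is needed before the driver accepts writes to /proc/acpi/ibm/fan; it is not necessarily how the switching was done for the tests above. The function name set_fan_level is made up, and the optional second argument exists only so that the function can be tried against an ordinary file.

```shell
# set_fan_level: write a "level ..." command to the fan control file.
#   $1: level ("auto", "0".."7", "full-speed" or "disengaged")
#   $2: control file (default: /proc/acpi/ibm/fan)
set_fan_level() {
    level=$1
    fan_file=${2:-/proc/acpi/ibm/fan}
    case $level in
        auto|[0-7]|full-speed|disengaged) ;;
        *) echo "unsupported fan level: $level" >&2; return 1 ;;
    esac
    printf 'level %s\n' "$level" > "$fan_file"
}
```

For example, "set_fan_level 3" selects a fixed level and "set_fan_level auto" hands control back to the automatic regulation, which is the mode where the fan goes to the maximum again.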

The problem was never seen in Windows. But the problem appears even if I set acpi_os_name and acpi_osi to Windows. (Note that I do not know whether Windows uses the automatic fan speed regulation provided by ACPI.)

The problem appears both with and without the dock, and with and without peripherals attached.

Here is a complete boot log from 5.13~rc7-1.1.g0a4a430:
https://drive.google.com/file/d/1-Ijs9Z-fg6LQqjH6iHMpyuEyOWjtWrbs/view?usp=sharing
I have not yet managed to get a successful fan resume with this kernel.

--
Best Regards / S pozdravem,

Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.                         e-mail: sbrabec@xxxxxxxx
Křižíkova 148/34 (Corso IIa)                    tel: +420 284 084 060
186 00 Praha 8-Karlín                          fax:  +420 284 084 001
Czech Republic                                    http://www.suse.cz/
PGP: 830B 40D5 9E05 35D8 5E27 6FA3 717C 209F A04F CD76



_______________________________________________
ibm-acpi-devel mailing list
ibm-acpi-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/ibm-acpi-devel



