On Tue, Aug 09, 2022 at 02:28:20PM PDT, Zoltán Kővágó wrote:
On 2022-08-09 22:56, Zev Weiss wrote:
On Tue, Aug 09, 2022 at 01:27:48PM PDT, Zoltán Kővágó wrote:
Hi,
[1.] One line summary of the problem:
NCT6775: suspend doesn't work after updating to Linux 5.19
[2.] Full description of the problem/report:
After updating my kernel from 5.18.11 to 5.19, I've noticed that
resuming after suspend no longer works: fans start up, then about
a second later, the computer just shuts down, leaving the USB
ports powered up (normally it turns them off on shutdown). The
screens don't turn on during this timeframe, so I can't see any
useful log messages.
Bisecting between 5.18 (where it still worked) and 5.19 lead me to
commit c3963bc0a0cf9ecb205a9d4976eb92b6df2fa3fd "hwmon: (nct6775)
Split core and platform driver" which looks like a refactor
commit, but apparently it broke something.
Hi Zoltán,
Thanks for the thorough bug report. You're right that that commit
was essentially just a refactor, though there was one slight change
to the nct6775_suspend() function introduced during the review
process that may perhaps have had some subtle unintended
side-effects.
Can you test the following patch and report if it resolves the problem?
Thanks,
Zev
Hi Zev,
Thanks for the quick reply. Yes, it looks like your patch does solve
the problem (I've applied it on top of 5.19 (after fighting with my
mail client for a while) and suspended a few times, it's working so
far).
Regards,
Zoltan
Great, thanks.
Guenter, it looks like nct6775_suspend() really does in fact need to use
nct6775_update_device() instead of dev_get_drvdata(), though it's not
immediately obvious to me why. Though given that the bulk of of the
body of nct6775_update_device() is inside an 'if' block that might not
necessarily execute every time, I also wonder if it might be vulnerable
to exhibiting the same problem depending on timing.
Zoltán, if you could try another experiment to try to gather some data
on that -- with the patch from my previous email still applied, could
you try suspending via:
$ cat /sys/bus/platform/drivers/nct6775/nct6775.*/hwmon/hwmon*/*_input && echo mem > /sys/power/state
and see if suspend is broken again? The idea being to read from the
hwmon device's sensors and then immediately suspend so that the
nct6775_update_device() call from nct6775_suspend() falls within the
driver's 1.5 second cache window and hence skips most of the work that
function does. If the same problem starts occurring again that way, it
seems we've had essentially the same bug lurking for a while and the
change in the patch you bisected to just made it happen consistently
instead of unpredictably. If suspend/resume continues working however,
then I'm quite mystified, because the only other thing that's happening
in that function is a mutex lock/unlock.
Thanks,
Zev