On Mon, Jun 3, 2024 at 8:20 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: > > Hi, > > On Fri, May 31, 2024 at 1:35 PM Laura Nao <laura.nao@xxxxxxxxxxxxx> wrote: > > > > Hello, > > > > On 5/30/24 17:37, Laura Nao wrote: > > > Hello, > > > > > > We have identified a regression in the acpi-fan driver probe between > > > v6.9-rc7 and v6.10-rc1 on some Intel Chromebooks in the Collabora LAVA > > > lab. > > > > > > For the Acer Chromebook Spin 514 (CP514-2H), the following error is > > > reported in the logs: > > > > > > [ 0.651202] acpi-fan INTC1044:00: probe with driver acpi-fan failed with error -22 > > > > > > Similar errors are reported on other devices with fans compatible with > > > the same driver. > > > > > > On Acer Chromebox CXI4, ASUS Chromebook Flip C436FA and > > > HP Chromebook x360 14 G1: > > > > > > [ 0.488001] acpi-fan INT3404:00: probe with driver acpi-fan failed with error -22 > > > > > > On ASUS Chromebook Vero 514 CBV514-1H: > > > > > > [ 1.168905] acpi-fan INTC1048:00: probe with driver acpi-fan failed with error -22 > > > > > > The issue is still present on next-20240529. > > > > > > I'm sending this report to track the regression while a fix is > > > identified. I'll investigate the issue/run a bisection and report back > > > with the results. > > > > > > This regression was discovered during some preliminary tests with the > > > ACPI probe kselftest [1] in KernelCI. The config used was the upstream > > > x86_64 defconfig with a fragment applied on top [2]. > > > > > > Best, > > > > > > Laura > > > > > > [1] https://lore.kernel.org/all/20240308144933.337107-1-laura.nao@xxxxxxxxxxxxx/ > > > [2] https://pastebin.com/raw/0tFM0Zyg > > > > > > #regzbot introduced: v6.9-rc7..v6.10-rc1 > > > > The issue started happening after: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/thermal/thermal_core.c?h=v6.10-rc1&id=31a0fa0019b022024cc082ae292951a596b06f8c > > > > Before this commit, get_cur_state() was not called by > > __thermal_cooling_device_register, so the error was not triggered. > > > > After enabling debugging for the acpi-fan driver, I noticed these errors > > in the logs: > > > > [ 0.682224] acpi INTC1044:00: Invalid control value returned > > [ 0.682635] acpi INTC1044:00: Invalid control value returned > > > > The value stored in fst.control is 255, which is indeed not a valid > > value. > > > > I suspect this might be a firmware issue that is now manifesting due to > > the addition of the extra get_cur_state() call. > > > > I'll dig a bit more and report back. > > It looks like _FST returns all ones if it is evaluated before _FSL on > the affected platforms. > > It shouldn't do that, but then it is not particularly useful to fail > cdev registration for this reason. > > The attached patch should work around this issue, please try it and report back. A !ret check is missing in that patch, so please try the attached new version of it instead. Thanks!
--- drivers/thermal/thermal_core.c | 11 +++++++---- drivers/thermal/thermal_debugfs.c | 7 ++++++- 2 files changed, 13 insertions(+), 5 deletions(-) Index: linux-pm/drivers/thermal/thermal_core.c =================================================================== --- linux-pm.orig/drivers/thermal/thermal_core.c +++ linux-pm/drivers/thermal/thermal_core.c @@ -964,7 +964,8 @@ __thermal_cooling_device_register(struct { struct thermal_cooling_device *cdev; struct thermal_zone_device *pos = NULL; - unsigned long current_state; + unsigned long val; + int current_state; int id, ret; if (!ops || !ops->get_max_state || !ops->get_cur_state || @@ -1002,9 +1003,11 @@ __thermal_cooling_device_register(struct if (ret) goto out_cdev_type; - ret = cdev->ops->get_cur_state(cdev, ¤t_state); - if (ret) - goto out_cdev_type; + ret = cdev->ops->get_cur_state(cdev, &val); + if (!ret && val >= 0 && val <= INT_MAX) + current_state = val; + else + current_state = -1; thermal_cooling_device_setup_sysfs(cdev); Index: linux-pm/drivers/thermal/thermal_debugfs.c =================================================================== --- linux-pm.orig/drivers/thermal/thermal_debugfs.c +++ linux-pm/drivers/thermal/thermal_debugfs.c @@ -421,6 +421,8 @@ void thermal_debug_cdev_state_update(con cdev_dbg = &thermal_dbg->cdev_dbg; old_state = cdev_dbg->current_state; + if (old_state < 0) + goto unlock; /* * Get the old state information in the durations list. If @@ -463,6 +465,7 @@ void thermal_debug_cdev_state_update(con cdev_dbg->total++; +unlock: mutex_unlock(&thermal_dbg->lock); } @@ -499,7 +502,9 @@ void thermal_debug_cdev_add(struct therm * duration will be printed by cdev_dt_seq_show() as expected if it * runs before the first state transition. */ - thermal_debugfs_cdev_record_get(thermal_dbg, cdev_dbg->durations, state); + if (state >= 0) + thermal_debugfs_cdev_record_get(thermal_dbg, cdev_dbg->durations, + state); debugfs_create_file("trans_table", 0400, thermal_dbg->d_top, thermal_dbg, &tt_fops);