On Thu, Jun 6, 2024 at 3:07 PM Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
>
> On 05/06/2024 21:17, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >
> > It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling
> > device state to thermal_debug_cdev_add()") causes the ACPI fan driver
> > to fail probing on some systems which turns out to be due to the _FST
> > control method returning an invalid value until _FSL is first evaluated
> > for the given fan. If this happens, the .get_cur_state() cooling device
> > callback returns an error and __thermal_cooling_device_register() fails
> > as it uses that callback after commit 31a0fa0019b0.
> >
> > Arguably, _FST should not return an invalid value even if it is
> > evaluated before _FSL, so this may be regarded as a platform firmware
> > issue, but at the same time it is not a good enough reason for failing
> > the cooling device registration where the initial cooling device state
> > is only needed to initialize a thermal debug facility.
> >
> > Accordingly, modify __thermal_cooling_device_register() to pass a
> > negative state value to thermal_debug_cdev_add() instead of failing
> > if the initial .get_cur_state() callback invocation fails and adjust
> > the thermal debug code to ignore negative cooling device state values.
> >
> > Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
> > Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@xxxxxxxxxxxxx
> > Reported-by: Laura Nao <laura.nao@xxxxxxxxxxxxx>
> > Tested-by: Laura Nao <laura.nao@xxxxxxxxxxxxx>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> As it is a driver issue, it should be fixed in the driver, not in the
> core code. The resulting code logic in the core is trying to deal with
> bad driver behavior, it does not really seem appropriate.
>
> The core code has been cleaned up from the high friction it had with the
> legacy ACPI code. It would be nice to continue in this direction.

Essentially, you are saying that .get_cur_state() should not return an
error even if it gets an utterly invalid value from the platform
firmware.

What value should it return then?
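
For reference, the behavior described in the quoted changelog can be sketched in plain user-space C roughly as below. This is only an illustration under stated assumptions, not the actual patch: struct cdev_sketch, debug_cdev_add_sketch(), register_cdev_sketch() and the two example callbacks are hypothetical stand-ins for the kernel structures and functions, and the real signatures may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a cooling device (not the kernel struct). */
struct cdev_sketch {
	/* Returns 0 and fills *state on success, a negative errno on failure. */
	int (*get_cur_state)(struct cdev_sketch *cdev, unsigned long *state);
	int debug_has_initial_state;       /* nonzero if the debug code recorded a state */
	unsigned long debug_initial_state; /* valid only if the flag above is set */
};

/* Debug facility sketch: a negative state means "unknown" and is ignored. */
void debug_cdev_add_sketch(struct cdev_sketch *cdev, int state)
{
	assert(cdev != NULL);

	cdev->debug_has_initial_state = 0;
	if (state < 0)
		return;

	cdev->debug_initial_state = (unsigned long)state;
	cdev->debug_has_initial_state = 1;
}

/* Registration sketch: a failing callback does not fail the registration. */
int register_cdev_sketch(struct cdev_sketch *cdev)
{
	unsigned long current_state = 0;
	int ret;

	if (!cdev || !cdev->get_cur_state)
		return -EINVAL;

	ret = cdev->get_cur_state(cdev, &current_state);

	/* Pass a negative value instead of bailing out when the callback fails. */
	debug_cdev_add_sketch(cdev, ret ? -1 : (int)current_state);
	return 0;
}

/* Example callback modeling firmware that has no valid state to report yet. */
int broken_get_cur_state(struct cdev_sketch *cdev, unsigned long *state)
{
	(void)cdev;
	(void)state;
	return -ENODEV;
}

/* Example callback modeling a fan that reports state 2. */
int working_get_cur_state(struct cdev_sketch *cdev, unsigned long *state)
{
	(void)cdev;
	*state = 2;
	return 0;
}
```

With this shape, the callback is still free to report an error for an invalid firmware value, and only the debug bookkeeping is skipped for that device.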