21.02.2022 15:56, Jon Hunter пишет: > > On 21/02/2022 12:36, Dmitry Osipenko wrote: >> 21.02.2022 15:01, Jon Hunter пишет: >>> Hi Dmitry, >>> >>> On 18/06/2021 22:54, Dmitry Osipenko wrote: >>>> Use hwmon_notify_event() to notify userspace and thermal core about >>>> temperature changes. >>>> >>>> Suggested-by: Guenter Roeck <linux@xxxxxxxxxxxx> >>>> Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx> >>>> --- >>>> drivers/hwmon/lm90.c | 44 >>>> +++++++++++++++++++++++++++++++++----------- >>>> 1 file changed, 33 insertions(+), 11 deletions(-) >>>> >>>> diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c >>>> index 2e057fad05b4..e7b678a40b39 100644 >>>> --- a/drivers/hwmon/lm90.c >>>> +++ b/drivers/hwmon/lm90.c >>>> @@ -465,6 +465,7 @@ enum lm90_temp11_reg_index { >>>> struct lm90_data { >>>> struct i2c_client *client; >>>> + struct device *hwmon_dev; >>>> u32 channel_config[4]; >>>> struct hwmon_channel_info temp_info; >>>> const struct hwmon_channel_info *info[3]; >>>> @@ -1731,22 +1732,41 @@ static bool lm90_is_tripped(struct i2c_client >>>> *client, u16 *status) >>>> if ((st & (LM90_STATUS_LLOW | LM90_STATUS_LHIGH | >>>> LM90_STATUS_LTHRM)) || >>>> (st2 & MAX6696_STATUS2_LOT2)) >>>> - dev_warn(&client->dev, >>>> - "temp%d out of range, please check!\n", 1); >>>> + dev_dbg(&client->dev, >>>> + "temp%d out of range, please check!\n", 1); >>>> if ((st & (LM90_STATUS_RLOW | LM90_STATUS_RHIGH | >>>> LM90_STATUS_RTHRM)) || >>>> (st2 & MAX6696_STATUS2_ROT2)) >>>> - dev_warn(&client->dev, >>>> - "temp%d out of range, please check!\n", 2); >>>> + dev_dbg(&client->dev, >>>> + "temp%d out of range, please check!\n", 2); >>>> if (st & LM90_STATUS_ROPEN) >>>> - dev_warn(&client->dev, >>>> - "temp%d diode open, please check!\n", 2); >>>> + dev_dbg(&client->dev, >>>> + "temp%d diode open, please check!\n", 2); >>>> if (st2 & (MAX6696_STATUS2_R2LOW | MAX6696_STATUS2_R2HIGH | >>>> MAX6696_STATUS2_R2THRM | MAX6696_STATUS2_R2OT2)) >>>> - dev_warn(&client->dev, >>>> - "temp%d out of range, please check!\n", 3); >>>> + dev_dbg(&client->dev, >>>> + "temp%d out of range, please check!\n", 3); >>>> if (st2 & MAX6696_STATUS2_R2OPEN) >>>> - dev_warn(&client->dev, >>>> - "temp%d diode open, please check!\n", 3); >>>> + dev_dbg(&client->dev, >>>> + "temp%d diode open, please check!\n", 3); >>>> + >>>> + if (st & LM90_STATUS_LLOW) >>>> + hwmon_notify_event(data->hwmon_dev, hwmon_temp, >>>> + hwmon_temp_min, 0); >>>> + if (st & LM90_STATUS_RLOW) >>>> + hwmon_notify_event(data->hwmon_dev, hwmon_temp, >>>> + hwmon_temp_min, 1); >>>> + if (st2 & MAX6696_STATUS2_R2LOW) >>>> + hwmon_notify_event(data->hwmon_dev, hwmon_temp, >>>> + hwmon_temp_min, 2); >>>> + if (st & LM90_STATUS_LHIGH) >>>> + hwmon_notify_event(data->hwmon_dev, hwmon_temp, >>>> + hwmon_temp_max, 0); >>>> + if (st & LM90_STATUS_RHIGH) >>>> + hwmon_notify_event(data->hwmon_dev, hwmon_temp, >>>> + hwmon_temp_max, 1); >>>> + if (st2 & MAX6696_STATUS2_R2HIGH) >>>> + hwmon_notify_event(data->hwmon_dev, hwmon_temp, >>>> + hwmon_temp_max, 2); >>> >>> >>> We observed a random null pointer deference crash somewhere in the >>> thermal core (crash log below is not very helpful) when calling >>> mutex_lock(). It looks like we get an interrupt when this crash >>> happens. >>> >>> Looking at the lm90 driver, per the above, I now see we are calling >>> hwmon_notify_event() from the lm90 interrupt handler. Looking at >>> hwmon_notify_event() I see that ... >>> >>> hwmon_notify_event() >>> --> hwmon_thermal_notify() >>> --> thermal_zone_device_update() >>> --> update_temperature() >>> --> mutex_lock() >>> >>> So although I don't completely understand the crash, it does seem >>> that we should not be calling hwmon_notify_event() from the >>> interrupt handler. >>> >>> BTW I have not reproduced this myself yet, so I have just been >>> reviewing the code to try and understand this. >> >> Matt Merhar was experiencing a similar issue on T30 Ouya, but I never >> managed to reproduce it on Nexus 7 and Acer A500 tablets, and couldn't >> spot any problem in the code. IIRC, it was a NULL dereference of another >> pointer within that code. > > > OK. From looking at the above I don't think we can call > hwmon_notify_event() from an interrupt handler because this is going to > try and request a mutex. So we need to fix that. The interrupt is threaded, so it can take a mutex.