21.02.2022 19:02, Guenter Roeck wrote:
> On 2/21/22 07:49, Jon Hunter wrote:
>>
>> On 21/02/2022 15:43, Guenter Roeck wrote:
>>
>> ...
>>
>>>> We observed a random null pointer dereference crash somewhere in the
>>>> thermal core (crash log below is not very helpful) when calling
>>>> mutex_lock(). It looks like we get an interrupt when this crash
>>>> happens.
>>>>
>>>> Looking at the lm90 driver, per the above, I now see we are calling
>>>> hwmon_notify_event() from the lm90 interrupt handler. Looking at
>>>> hwmon_notify_event() I see that ...
>>>>
>>>> hwmon_notify_event()
>>>>   --> hwmon_thermal_notify()
>>>>     --> thermal_zone_device_update()
>>>>       --> update_temperature()
>>>>         --> mutex_lock()
>>>>
>>>> So although I don't completely understand the crash, it does seem
>>>> that we should not be calling hwmon_notify_event() from the
>>>> interrupt handler.
>>>>
>>> As mentioned separately, this is not the problem.
>>
>> Yes I can see that now.
>>
>>> I think the problem may be that this is not a devicetree system
>>> (or the lm90 device does not have a devicetree node), but thermal
>>> notification currently only works in such systems because the hwmon
>>> subsystem uses the devicetree registration method. At the same time,
>>> CONFIG_THERMAL_OF is obviously enabled. Unfortunately, the hwmon code
>>> does not bail out in that situation due to another bug.
>>
>> The platform I see this on does use device-tree and it does have a
>> node for the ti,tmp451 device which uses the lm90 device. This
>> platform uses the device-tree source
>> arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts and the tmp451 node
>> is in arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi.
>>
>
> Interesting. It appears that the call to
> devm_thermal_zone_of_sensor_register() in the hwmon core nevertheless
> returns -ENODEV, which is not handled properly in the hwmon core.
> I can see a number of reasons for this to happen:
> - there is no devicetree node for the lm90 device
> - there is no thermal-zones devicetree node
> - there is no thermal zone entry in the thermal-zones node which matches
>   the sensor
>
> We'll have to revert the lm90 changes until this is sorted out.

Oh, yeah. There does seem to be a problem there: the tzd pointer could be
-ENODEV. But that is a hwmon core problem, which apparently has existed for
a long time, not an lm90 problem.
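
To illustrate why an error-encoded tzd is dangerous, here is a minimal
userspace sketch. It is not the hwmon core code: err_ptr(), ptr_err() and
is_err() only mimic the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() helpers,
and struct fake_zone and the fake_zone_*() functions are hypothetical names
made up for this example. The point is that a pointer carrying -ENODEV is
not NULL, so a plain NULL check would let it through to code that
dereferences it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
	/* Error codes live in the last MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical zone type, NOT the kernel's struct thermal_zone_device. */
struct fake_zone {
	int updates;
};

/*
 * Hypothetical registration helper: returns the matching zone, or an
 * error-encoded pointer carrying -ENODEV when no zone matches the sensor.
 */
static struct fake_zone *fake_zone_register(struct fake_zone *match)
{
	return match ? match : err_ptr(-ENODEV);
}

/*
 * Hypothetical notify path: checks for both NULL and an error-encoded
 * pointer before touching the zone.
 */
static int fake_zone_notify(struct fake_zone *tzd)
{
	if (!tzd)
		return -ENODEV;
	if (is_err(tzd))
		return (int)ptr_err(tzd);

	tzd->updates++;
	return 0;
}
```

With this sketch, fake_zone_register(NULL) yields a non-NULL pointer for
which is_err() is true, and fake_zone_notify() returns -ENODEV instead of
dereferencing it.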