On 9/17/21 1:06 PM, Subbaraman Narayanamurthy wrote: > On 9/17/21 2:31 AM, Daniel Lezcano wrote: >> On 07/09/2021 21:01, Subbaraman Narayanamurthy wrote: >>> of_parse_thermal_zones() parses the thermal-zones node and registers a >>> thermal_zone device for each subnode. However, if a thermal zone is >>> consuming a thermal sensor and that thermal sensor device hasn't probed >>> yet, an attempt to set trip_point_*_temp for that thermal zone device >>> can cause a NULL pointer dereference. Fix it. >>> >>> console:/sys/class/thermal/thermal_zone87 # echo 120000 > trip_point_0_temp >>> ... >>> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 >>> ... >>> Call trace: >>> of_thermal_set_trip_temp+0x40/0xc4 >>> trip_point_temp_store+0xc0/0x1dc >>> dev_attr_store+0x38/0x88 >>> sysfs_kf_write+0x64/0xc0 >>> kernfs_fop_write_iter+0x108/0x1d0 >>> vfs_write+0x2f4/0x368 >>> ksys_write+0x7c/0xec >>> __arm64_sys_write+0x20/0x30 >>> el0_svc_common.llvm.7279915941325364641+0xbc/0x1bc >>> do_el0_svc+0x28/0xa0 >>> el0_svc+0x14/0x24 >>> el0_sync_handler+0x88/0xec >>> el0_sync+0x1c0/0x200 >>> >>> While at it, fix the possible NULL pointer dereference in other >>> functions as well: of_thermal_get_temp(), of_thermal_set_emul_temp(), >>> of_thermal_get_trend(). >>> >>> Cc: stable@xxxxxxxxxxxxxxx >>> Suggested-by: David Collins <quic_collinsd@xxxxxxxxxxx> >>> Signed-off-by: Subbaraman Narayanamurthy <quic_subbaram@xxxxxxxxxxx> >>> --- >>> Changes for v2: >>> - Added checks in of_thermal_get_temp(), of_thermal_set_emul_temp(), of_thermal_get_trend(). >>> >>> drivers/thermal/thermal_of.c | 9 ++++++--- >>> 1 file changed, 6 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c >>> index 6379f26..9233f7e 100644 >>> --- a/drivers/thermal/thermal_of.c >>> +++ b/drivers/thermal/thermal_of.c >>> @@ -89,7 +89,7 @@ static int of_thermal_get_temp(struct thermal_zone_device *tz, >>> { >>> struct __thermal_zone *data = tz->devdata; >>> >>> - if (!data->ops->get_temp) >>> + if (!data->ops || !data->ops->get_temp) >> comment (1) >> >> AFAICT, if data->ops != NULL then data->ops->get_temp is also != NULL >> because of the code allocating/freeing the ops structure. >> >> The tests can be replaced by (!data->ops), no ? > Thanks Daniel for reviewing the patch. > > I agree that even if a sensor module is unregistered, that would call > "thermal_zone_of_sensor_unregister" which would eventually set NULL on > get_temp() and get_trend() and "tzd->ops" as well. > > However, of_thermal_get_temp() is trying to call "data->ops->get_temp" > which comes from a sensor driver when it registers. There is no > guarantee that it would be non-NULL right? > > Thinking of which, I think having both checks would be valid. Hi Daniel, Do you still have any concerns with this change? Thanks, Subbaraman >>> return -EINVAL; >>> >>> return data->ops->get_temp(data->sensor_data, temp); >>> @@ -186,6 +186,9 @@ static int of_thermal_set_emul_temp(struct thermal_zone_device *tz, >>> { >>> struct __thermal_zone *data = tz->devdata; >>> >>> + if (!data->ops || !data->ops->set_emul_temp) >>> + return -EINVAL; >>> + >> comment (2) >> >> The test looks pointless here (I mean both of them). >> >> If of_thermal_set_emul_temp() is called it is because the callback was >> set in thermal_zone_of_add_sensor(). >> >> This one does: >> >> tz->ops = ops; >> >> and >> if (ops->set_emul_temp) >> tzd->ops->set_emul_temp = of_thermal_set_emul_temp; >> >> If I'm not wrong if we are called, then data->ops && >> data->ops->set_emul_temp is always true, right ? >> > I've not exercised this condition yet. However, the original problem > we've observed was when thermal HAL was trying to set trip thresholds > on a thermal zone device for which the sensor device is not probed yet. > This had happened randomly because of vendor modules taking time to be > loaded and probed. I don't know if there would be any userspace entity > that can try to set emulated temperature for a thermal zone even before > a sensor device is probed. > > Without a sensor driver probed, "tz->ops" would not have a valid pointer > right? So, I think checking for "data->ops" should be good. > > Another possibility is, a sensor might not have "set_emul_temp" callback. > So checking for "ops->set_emul_temp" should be still valid. > >>> return data->ops->set_emul_temp(data->sensor_data, temp); >>> } >>> >>> @@ -194,7 +197,7 @@ static int of_thermal_get_trend(struct thermal_zone_device *tz, int trip, >>> { >>> struct __thermal_zone *data = tz->devdata; >>> >>> - if (!data->ops->get_trend) >>> + if (!data->ops || !data->ops->get_trend) >>> return -EINVAL; >> Same comment as (1) > of_thermal_get_trend() is trying to call "data->ops->get_trend" which > comes from a sensor driver when it registers. From what I can see, > there are lot of drivers which don't pass "get_trend" in their ops. > So there is no guarantee that it would be non-NULL right? > > Thinking of which, I think having both checks would be valid. > >>> return data->ops->get_trend(data->sensor_data, trip, trend); >>> @@ -301,7 +304,7 @@ static int of_thermal_set_trip_temp(struct thermal_zone_device *tz, int trip, >>> if (trip >= data->ntrips || trip < 0) >>> return -EDOM; >>> >>> - if (data->ops->set_trip_temp) { >>> + if (data->ops && data->ops->set_trip_temp) { >> Same comment as (2) > Without a sensor driver probed, "tz->ops" would not have a valid pointer right? > So, I think checking for "data->ops" should be good. Another possibility is, a > sensor device might not have "set_trip_temp" callback but just "set_trips". > > So checking for "data->ops->set_trip_temp" might be still valid. > >>> int ret; >>> >>> ret = data->ops->set_trip_temp(data->sensor_data, trip, temp); >>>