On Tue, Nov 22, 2016 at 03:52:25PM +0800, Zhang Rui wrote: > Hi, Brian, > > On Fri, 2016-11-18 at 21:30 -0800, Brian Norris wrote: > > Hi, > > > > On Fri, Nov 18, 2016 at 07:41:59PM -0800, Eduardo Valentin wrote: > > > > > > On Fri, Nov 18, 2016 at 03:52:55PM -0800, Brian Norris wrote: > > > > > > > > If using CONFIG_THERMAL_EMULATION, there's a corner case where we > > > > might > > > > get an error from the zone's get_temp() callback, but we'll > > > > ignore that > > > > and keep using its value. Let's just error out properly instead. > > > > > > > > Signed-off-by: Brian Norris <briannorris at chromium.org> > > > > --- > > > > ?drivers/thermal/thermal_core.c | 3 +++ > > > > ?1 file changed, 3 insertions(+) > > > > > > > > diff --git a/drivers/thermal/thermal_core.c > > > > b/drivers/thermal/thermal_core.c > > > > index 911fd964c742..0fa497f10d25 100644 > > > > --- a/drivers/thermal/thermal_core.c > > > > +++ b/drivers/thermal/thermal_core.c > > > > @@ -494,6 +494,8 @@ int thermal_zone_get_temp(struct > > > > thermal_zone_device *tz, int *temp) > > > > ? mutex_lock(&tz->lock); > > > > ? > > > > ? ret = tz->ops->get_temp(tz, temp); > > > > + if (ret) > > > > + goto exit_unlock; > > > Yeah, but the follow through is intentional, if I am not mistaken. > > OK...but it has a bug. It potentially utilizes an uninitialized value > > for *temp. > > > Agreed. I also agree that this section of current get_temp is buggy. That is why I sent the patch some time ago. > > > > > > > > > > > ? > > > > ? if (IS_ENABLED(CONFIG_THERMAL_EMULATION) && tz- > > > > >emul_temperature) { > > > Even if the driver is not able to read real temperature, but emul > > > temp > > > is configured, then there is still opportunity to report the > > > emulated > > > temperature. > > OK, maybe, but you should avoid doing this comparison then: > > > > 513?????????????????if (!ret && *temp < crit_temp) > > 514?????????????????????????*temp = tz->emul_temperature; > > > > Note that 'ret' might be 0 (from the calls to ->get_trip_type()), and > > then > > you're comparing with the uninitialized value of *temp. So you need > > some > > solution that accounts for this and decides to ignore the real > > temperature properly. > > > right. > > > > > > > > > > > ? for (count = 0; count < tz->trips; count++) { > > > > @@ -514,6 +516,7 @@ int thermal_zone_get_temp(struct > > > > thermal_zone_device *tz, int *temp) > > > > ? *temp = tz->emul_temperature; > > > And if you check the lines at the bottom of the loop, you will see > > > that, > > > in the fail case, we will stil compare to what is the content of > > > temp, > > > which might be problematic. > > Yes...are you saying the same thing I am above? Yes, Brian, we are concerned about the same bug. > > > > > > > > I would prefer we consider the patch I sent > > > some time ago: > > > https://patchwork.kernel.org/patch/7876381/ > > Honestly I didn't look that deeply into the framework here (and I > > also > > don't use CONFIG_THERMAL_EMULATION), I was just fixing something that > > was obviously wrong. Yeah, but that is why we need people to look the code considering all features. :-) > > > > But on first read, that patch looks good to me -- although it'd be > > good > > to note the uninitialized value fix in the comit log. Any reason that > > didn't end up getting merged? It looks like it got reviewed, and > > you're > > a thermal subsystem maintainer... > > I do not remember why Rui postponed it. A note of clarification, for things that touch thermal core, I agree with Rui that they go through his tree. Besides, I tend to avoid acking and sending my own patches without proper review, which was not the case of that patch, that was just postponed and fell into the cracks somehow. > hmmm, I forgot why I missed this one in the end. > Eduardo, > would you mind refresh and resend the patch? Yeah sure. I have at least three extra patch sets on thermal core on my queue. But I would like to get first the thermal sysfs reorg in first. This fix is one of the changes that will go on top of the thermal sysfs reorg. BR, Eduardo