From: Guenter Roeck <groeck7@xxxxxxxxx> on behalf of Guenter Roeck <linux@xxxxxxxxxxxx> Sent: Tuesday, March 4, 2025 4:04 PM >On 3/4/25 06:10, Maud Spierings via B4 Relay wrote: >> From: Maud Spierings <maudspierings@xxxxxxxxxxxxxx> >> >> When the ntc is reading Out Of Bounds instead of clipping to the nearest >> limit (min/max) return -ENODATA. This prevents malfunctioning sensors >> from sending a device into a shutdown loop due to a critical trip. >> >> This implementation will only work for ntc type thermistors if a ptc >> type is to be implemented the min/max ohm calculation must be adjusted >> to take that into account. >> >> Signed-off-by: Maud Spierings <maudspierings@xxxxxxxxxxxxxx> >> --- >> This patch is a continuation of another discussion [1]. I felt like it >> should be a new patch, not a v2 as this is a very different change. thoughts about the theoretic code comment here. >> I have left the clamping of n to INT_MAX in the code with a comment, but >> it may be possible to find a better solution to it. One I thought of is >> to make the ohm field of the ntc_compensation struct a signed int as >> well. It would get rid of this weird edge case, but it doesn't make >> sense to allow for negative resistances to be entered into the sensor >> table. >> >> Currently the only feedback this provides to the user is when they >> manually try to read the temperature and it returns the error. I have >> added a simple printk to these error points to see how spammy it gets >> and this is the result: >> >> dmesg | grep hwmon >> [ 4.982682] hwmon: sensor out of bounds >> [ 5.249758] hwmon: sensor out of bounds >> [ 5.633729] hwmon: sensor out of bounds >> [ 6.215285] hwmon: sensor out of bounds >> [ 7.073882] hwmon: sensor out of bounds >> [ 7.486620] hwmon: sensor out of bounds >> [ 8.833765] hwmon: sensor out of bounds >> [ 10.785969] hwmon: sensor out of bounds >> [ 13.793722] hwmon: sensor out of bounds >> [ 16.761124] hwmon: sensor out of bounds >> [ 17.889706] hwmon: sensor out of bounds >> [ 25.057715] hwmon: sensor out of bounds >> [ 35.041725] hwmon: sensor out of bounds >> [ 50.110346] hwmon: sensor out of bounds >> [ 72.945283] hwmon: sensor out of bounds >> [ 105.712619] hwmon: sensor out of bounds >> [ 154.863976] hwmon: sensor out of bounds >> [ 164.937104] hwmon: sensor out of bounds >> [ 228.590909] hwmon: sensor out of bounds >> [ 315.365777] hwmon: sensor out of bounds >> [ 464.718403] hwmon: sensor out of bounds >> [ 615.079123] hwmon: sensor out of bounds >> [ 764.496780] hwmon: sensor out of bounds >> >> This is with polling-delay set to 1000, it seems to rate-limit itself? >> But I feel there should be a better way to communicate the potential >> sensor failure to the user, but I can't think of anything. >> > >Hmm ... we could add "fault" attributes and report a sensor fault >if uv == 0 or uv >= puv, together with the -ENODATA temperature reading. >That could distinguish the fault from the "resistor value out of range" case. I've done some working around that fault attribute, but I can't seem to find a satisfactory way to implement it. This fault also is not triggered if anything is disconnected, maybe the 0 case but I can't even get that. I think leaving it at this current solution is good for now. >> [1]: https://lore.kernel.org/all/20250304-ntc_min_max-v1-1-b08e70e56459@xxxxxxxxxxxxxx/ > >Most of the above should be after "---" since it is irrelevant for the commit log. I believe my cover letter seperated it correctly after my signed-off-by tag. >> --- >> drivers/hwmon/ntc_thermistor.c | 19 +++++++++++++------ >> 1 file changed, 13 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/hwmon/ntc_thermistor.c b/drivers/hwmon/ntc_thermistor.c >> index 0d29c8f97ba7c2f264588b6309b91ca494012ad6..311a60498d409ea068a18590415b2d5b43e73eb1 100644 >> --- a/drivers/hwmon/ntc_thermistor.c >> +++ b/drivers/hwmon/ntc_thermistor.c >> @@ -387,12 +387,9 @@ static int get_ohm_of_thermistor(struct ntc_data *data, unsigned int uv) >> puo = data->pullup_ohm; >> pdo = data->pulldown_ohm; >> >> - if (uv == 0) >> - return (data->connect == NTC_CONNECTED_POSITIVE) ? >> - INT_MAX : 0; >> - if (uv >= puv) >> - return (data->connect == NTC_CONNECTED_POSITIVE) ? >> - 0 : INT_MAX; >> + /* faulty adc value */ >> + if (uv == 0 || uv >= puv) >> + return -ENODATA; >> >> if (data->connect == NTC_CONNECTED_POSITIVE && puo == 0) >> n = div_u64(pdo * (puv - uv), uv); >> @@ -404,6 +401,16 @@ static int get_ohm_of_thermistor(struct ntc_data *data, unsigned int uv) >> else >> n = div64_u64_safe(pdo * puo * uv, pdo * (puv - uv) - puo * uv); >> >> + /* sensor out of bounds */ >> + if (n > data->comp[0].ohm || n < data->comp[data->n_comp-1].ohm) >> + return -ENODATA; >> + >> + /* >> + * theoretically data->comp[0].ohm can be greater than INT_MAX as it is an >> + * unsigned integer, but it doesn't make any sense for it to be so as the >> + * maximum return value of this function is INT_MAX, so it will never be >> + * able to properly calculate that temperature. >> + */ >> if (n > INT_MAX) >> n = INT_MAX; > >I am not a friend of theoretic code or comments like this. Please drop. >The original code was intended to catch out-of-bounds values which would >otherwise have been reported as error, not to catch unrealistic resistor >values in the compensation tables. So, drop the check and comment? Or just drop the comment? I was thinking to fully remove that check in an earlier comment in my cover letter. Kind regards, Maud