RE: [PATCH] hwmon: (k10temp) Report negative temperatures

[AMD Official Use Only - General]

To avoid introducing new problems, we can go ahead with option 2, i.e., "do not apply it to processors which are known to _not_ be affected by the problem."

Thanks
- Baski

-----Original Message-----
From: Guenter Roeck <groeck7@xxxxxxxxx> On Behalf Of Guenter Roeck
Sent: Thursday, June 8, 2023 1:03 PM
To: Kannan, Baski <Baski.Kannan@xxxxxxx>
Cc: Moger, Babu <Babu.Moger@xxxxxxx>; clemens@xxxxxxxxxx; jdelvare@xxxxxxxx; linux-hwmon@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Ramayanam, Pavan <Pavan.Ramayanam@xxxxxxx>
Subject: Re: [PATCH] hwmon: (k10temp) Report negative temperatures

Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.


On 6/8/23 10:09, Kannan, Baski wrote:
> The patch you have mentioned, aef17ca12719, sounds like a work-around for a problem found in some Ryzen Threadripper processors.
> If I understand correctly, this work-around (aef17ca12719) has been provided as a blanket fix for all the processors.
>

Due to lack of better knowledge and understanding, yes. See https://github.com/lm-sensors/lm-sensors/issues/70. That doesn't mean that a blanket revert would be appropriate.

> The industrial processor in question is the Epyc3k i3255:
> AMD Family 17h (boot_cpu_data.x86)
> AMD model 00h - 0fh (boot_cpu_data.x86_model)
> Model name contains the string "3255"
>
> It supports temperatures ranging from -40 to 105 degrees Celsius.
> We have customers' machines running at -20 degrees Celsius. They require that the correct temperature be passed to their tools.
>

We have two options: Either limit the workaround to the list of processors which may be affected by the original problem, or do not apply it to processors which are known to _not_ be affected by the problem. Either can easily be implemented by adding a flag to struct k10temp_data and setting it in the probe function.
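A minimal user-space sketch of option 2 (flag set at probe time), assuming a hypothetical `disable_temp_clamp` field and a hypothetical `is_known_unaffected()` helper; neither exists in k10temp, and `struct k10temp_data_sketch` is only a stand-in for `struct k10temp_data`. The match criteria are the ones given above for the Epyc3k i3255 (family 17h, model 00h-0fh, model name containing "3255"). In the driver these values would come from `boot_cpu_data`; here they are plain parameters so the logic can be exercised outside the kernel.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for struct k10temp_data: only the fields this
 * sketch needs. disable_temp_clamp does not exist in the real driver.
 */
struct k10temp_data_sketch {
	int temp_offset;          /* Tctl-to-Tdie offset, millidegrees Celsius */
	bool disable_temp_clamp;  /* true: report temperatures below zero */
};

/*
 * Hypothetical helper for option 2: identify a processor known NOT to be
 * affected by the problem that commit aef17ca12719 works around. The
 * criteria are those given for the Epyc3k i3255 in this thread.
 */
static bool is_known_unaffected(unsigned int family, unsigned int model,
				const char *model_name)
{
	return family == 0x17 && model <= 0x0f &&
	       strstr(model_name, "3255") != NULL;
}

/*
 * Probe-time step: decide once and store the result in the flag, so the
 * read path only has to test a bool. In the driver, family, model and
 * model name would come from boot_cpu_data.
 */
static void sketch_probe(struct k10temp_data_sketch *data, int temp_offset,
			 unsigned int family, unsigned int model,
			 const char *model_name)
{
	data->temp_offset = temp_offset;
	data->disable_temp_clamp =
		is_known_unaffected(family, model, model_name);
}
```

For example, `is_known_unaffected(0x17, 0x01, "AMD EPYC Embedded 3255 8-Core Processor")` returns true, while the same family and model with a name string lacking "3255" returns false (the name strings are illustrative placeholders, not verified CPUID brand strings). The name check narrows the match because the family/model range alone is not specific to the i3255.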

No one outside AMD knows which processors may or may not be affected by the original problem. It was reported on 1950X at the time, but it may exist on all processors with the ability to set Sense MI Skew (and possibly Sense MI Offset), whatever that is. With that in mind, the fix will have to be provided by AMD.

Guenter

> -----Original Message-----
> From: Guenter Roeck <groeck7@xxxxxxxxx> On Behalf Of Guenter Roeck
> Sent: Thursday, June 8, 2023 8:52 AM
> To: Kannan, Baski <Baski.Kannan@xxxxxxx>
> Cc: Moger, Babu <Babu.Moger@xxxxxxx>; clemens@xxxxxxxxxx;
> jdelvare@xxxxxxxx; linux-hwmon@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] hwmon: (k10temp) Report negative temperatures
>
> On Tue, May 23, 2023 at 02:46:46PM -0700, Guenter Roeck wrote:
>> On Tue, May 23, 2023 at 03:49:32PM -0500, Baskaran Kannan wrote:
>>> Currently, the Tctl and Tdie temperatures are clamped to zero if
>>> they are less than 0. There are industrial processors which work
>>> below zero.
>>
>> This was introduced with commit aef17ca12719 ("hwmon: (k10temp) Only
>> apply temperature offset if result is positive"). This patch would
>> effectively revert that change. Given the reason for introducing it, I
>> am not convinced that it is a good idea to unconditionally revert it.
>>
>
> Any comments? I am not inclined to accept this patch as-is. What are the industrial processors? Is there a means to detect them?
>
> Guenter
>
>> Guenter
>>
>>>
>>> To display the correct temperature, remove the clamping.
>>>
>>> Signed-off-by: Baskaran Kannan <Baski.Kannan@xxxxxxx>
>>> ---
>>>   drivers/hwmon/k10temp.c | 4 ----
>>>   1 file changed, 4 deletions(-)
>>>
>>> diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
>>> index 7b177b9fbb09..489ad0b1bc74 100644
>>> --- a/drivers/hwmon/k10temp.c
>>> +++ b/drivers/hwmon/k10temp.c
>>> @@ -204,13 +204,9 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
>>>              switch (channel) {
>>>              case 0:         /* Tctl */
>>>                      *val = get_raw_temp(data);
>>> -                   if (*val < 0)
>>> -                           *val = 0;
>>>                      break;
>>>              case 1:         /* Tdie */
>>>                      *val = get_raw_temp(data) - data->temp_offset;
>>> -                   if (*val < 0)
>>> -                           *val = 0;
>>>                      break;
>>>              case 2 ... 13:          /* Tccd{1-12} */
>>>                      amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
>>> --
>>> 2.25.1
>>>
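Instead of removing the clamp unconditionally, the Tctl and Tdie cases could keep it and skip it only when a per-device flag says so. A minimal sketch, assuming a hypothetical `allow_negative` flag set at probe time (not an existing k10temp field); `raw_temp` stands in for `get_raw_temp(data)`, and all values are millidegrees Celsius, the unit hwmon uses for temperature attributes.

```c
#include <stdbool.h>

/*
 * Sketch of the Tctl case of k10temp_read_temp() with the clamp made
 * conditional. allow_negative is a hypothetical per-device flag.
 */
static int sketch_read_tctl(int raw_temp, bool allow_negative)
{
	int val = raw_temp;

	/* Clamp only where the workaround from aef17ca12719 still applies. */
	if (!allow_negative && val < 0)
		val = 0;
	return val;
}

/* Sketch of the Tdie case: subtract the offset, then clamp conditionally. */
static int sketch_read_tdie(int raw_temp, int temp_offset, bool allow_negative)
{
	int val = raw_temp - temp_offset;

	if (!allow_negative && val < 0)
		val = 0;
	return val;
}
```

With the flag set, a part at -20 degrees Celsius reports -20000 rather than 0, while processors without the flag keep the existing behavior.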




