Re: [PATCHv4 1/1] Hwmon: Add core/pkg Threshold Support to Coretemp

On Fri, 16 Sep 2011 10:48:55 -0700, Guenter Roeck wrote:
> On Fri, 2011-09-16 at 13:00 -0400, Jean Delvare wrote:
> > On Mon, 12 Sep 2011 18:18:00 +0200, Jean Delvare wrote:
> > > On my Core Duo T2600, the output looks like:
> > > 
> > > coretemp-isa-0000
> > > Core 0:      +54.0°C  (high = +47.0°C, hyst = +64.0°C)  ALARM
> > >                       (crit = +100.0°C)
> > > Core 1:      +55.0°C  (high = +47.0°C, hyst = +64.0°C)  ALARM
> > >                       (crit = +100.0°C)
> > > 
> > > High at 47°C by default seems unreasonable, and hyst > high even more
> > > so. But at least the ALARM flags are consistent with these limits.
> > 
> > BTW, the high and hyst values on this CPU appear to be random. Today
> > they are 89°C and 96°C. Very odd.
> > 
> Not initialized, maybe ? Still odd, though.

Worse than this. Reloading the driver changes the values.

I think I finally understand what is going on. The threshold values are
adjusted dynamically based on the measured temperature. This is on
kernel 2.6.32, where the kernel has no clue about these thresholds, so
the only possibility I can think of is that the BIOS is doing it. And
"hyst" is consistently higher than "high", which means that the BIOS has
chosen the opposite convention to the one the coretemp driver uses.

So our driver makes many assumptions which aren't verified in my case:

* The driver shouldn't assume that the threshold values are under its
  sole control. Reading the values once at initialization time and
  never again after that is not correct.

* The driver assumes that threshold0 is higher than threshold1. Looking
  at the SDM, there is no such asymmetry; both thresholds are
  equivalent. So my laptop's BIOS is within its rights when deciding
  that threshold1 is high and threshold0 is low. Given that 0 < 1, their
  decision makes even more sense than ours. It's an IBM/Lenovo Thinkpad
  T60p, a pretty popular series, so we can't just ignore this problem.
  A lot of users will be affected.

* The driver artificially binds the two thresholds by making one the
  _hyst of the other. I see no such relation in the datasheet though;
  both thresholds appear to be completely independent. I know that this
  wasn't Durgadoss' original implementation and we had him change to
  that, but in retrospect this seems to have been a mistake.

I presume that my BIOS leverages the interrupts associated with the
thresholds to do dynamic thermal management, either by fan speed control
or by CPU throttling, or anything else, or a mix of all these.

Durgadoss, please speak up if anything I wrote above isn't correct.

This brings up a question I asked before but never got an answer to,
and it seems I can't find the answer in the SDM either: where are the
interrupts going? Are these by any chance SMIs which the kernel has no
way to deal with?

The first two wrong assumptions listed above can easily be fixed. The
first one is fixed by always reading the values from the MSR instead of
the cache. The second one is fixed by testing the threshold values at
initialization time to determine which direction the BIOS went with
(might be racy though.)
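
A rough sketch of these two fixes, written as plain C rather than as
driver code. It assumes the IA32_THERM_INTERRUPT layout as I read it in
the SDM (threshold0 in bits 14:8, threshold1 in bits 22:16, each an
offset in degrees C below TjMax). The names decode_thresholds and
struct thresholds are made up for illustration and do not exist in
coretemp. The caller is expected to pass a value freshly read from the
MSR on every access, and the ordering is recomputed each time instead of
being latched at initialization, which is one possible way around the
race.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed field layout of IA32_THERM_INTERRUPT (check against the SDM). */
#define THRESH0_SHIFT 8
#define THRESH1_SHIFT 16
#define THRESH_MASK   0x7fULL

/* Hypothetical result type; temperatures are in millidegrees Celsius. */
struct thresholds {
	long t0_mc;      /* threshold0 */
	long t1_mc;      /* threshold1 */
	int  t1_is_high; /* nonzero if threshold1 is the higher of the two */
};

/*
 * Decode both thresholds from a raw register value. Nothing is cached:
 * the caller re-reads the MSR and calls this again for every access.
 */
void decode_thresholds(uint64_t raw, long tjmax_mc, struct thresholds *out)
{
	assert(out != NULL);

	long off0 = (long)((raw >> THRESH0_SHIFT) & THRESH_MASK);
	long off1 = (long)((raw >> THRESH1_SHIFT) & THRESH_MASK);

	/* Each field is an offset below TjMax, in whole degrees. */
	out->t0_mc = tjmax_mc - off0 * 1000;
	out->t1_mc = tjmax_mc - off1 * 1000;

	/* Do not assume an ordering; derive it from the current values. */
	out->t1_is_high = out->t1_mc > out->t0_mc;
}
```

With a TjMax of 100°C (an assumption based on the crit value above),
offsets of 53 and 36 decode to 47°C and 64°C, matching the first
sensors output quoted earlier, with threshold1 being the higher one.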

The last assumption however seems very difficult to fix. It would be
valid to use one of the thresholds as a real low limit (e.g. to enable
a heating system if the system is about to freeze, or more
realistically, to enable turbo mode on low temperatures). In a way
that's what my laptop's BIOS is doing, although the threshold value and
presumably its effect change dynamically.

The fact that each threshold can be used for anything makes it very
difficult to make them fit in our standard hwmon interface. On one
machine the BIOS may expect the temperature to be below both thresholds
when the system is idle, while on others it will expect that the
current temperature is between the thresholds (as is the case on my
laptop.) This means that no single semantics is attached to these
thresholds, while our standard interface always expects fixed
semantics.
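
To make the three possible situations concrete, here is a minimal
sketch that only reports where the current temperature sits relative to
two unordered thresholds, without attaching any meaning to either one.
The enum, the function name and the boundary handling (a temperature
equal to the upper threshold counts as above it) are assumptions for
illustration, not something taken from the driver or the SDM.

```c
/* Hypothetical classification of a reading against two thresholds. */
enum temp_position {
	TEMP_BELOW_BOTH,
	TEMP_BETWEEN,
	TEMP_ABOVE_BOTH
};

/* All values in millidegrees Celsius; a_mc and b_mc may be in any order. */
enum temp_position classify_temp(long temp_mc, long a_mc, long b_mc)
{
	long lo = a_mc < b_mc ? a_mc : b_mc;
	long hi = a_mc < b_mc ? b_mc : a_mc;

	if (temp_mc < lo)
		return TEMP_BELOW_BOTH;
	if (temp_mc >= hi)
		return TEMP_ABOVE_BOTH;
	return TEMP_BETWEEN;
}
```

For the Core 0 reading quoted at the top (54°C with thresholds at 47°C
and 64°C), this yields TEMP_BETWEEN, which is the idle state my laptop's
BIOS seems to aim for.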

I admit I am not sure how to deal with all this. Suggestions are
welcome. What I'm sure of is that we don't want to leave the coretemp
driver in its current state... We will get a flood of user complaints,
or at least questions, if we do.

-- 
Jean Delvare
