Re: Core2Quad and very high temperature

Hello Markus,

On Wed, 26 Sep 2012 12:41:52 +0200 (CEST), rupprecht-admin2, Markus wrote:
> I'm the admin of a school. Our server has an Intel DG33FB mainboard
> (it's Intel G33) with an Intel Core2Quad processor.
> 
> We are running Debian Lenny 2.6.35.9
> 
> While doing a backup with clonezilla there was the message, that the cpu
> temperature is too hot and that the cpu will work slower.
> After restarting the server I plotted the four cores in the KDE system
> monitor (in German it's called KDE Systemueberwachung).
> It shows -0,77 and -0,77 and  100 and 95 °C. I guess the decimal number is above
> 100°C.

It was more likely an error code which wasn't caught by the software.
Given the output of "sensors" below, this seems more plausible. The
coretemp driver can't report a temperature value above the critical
limit; it is technically not possible.
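
To illustrate why, here is a minimal sketch of how such a reading is
decoded, assuming the IA32_THERM_STATUS layout from Intel's
documentation (bit 31 = reading valid, bits 22:16 = digital readout in
degrees below TjMax). The function name and the default TjMax of 100 C
are made up for illustration; the real driver is C code in the kernel.

```python
from typing import Optional

READING_VALID = 1 << 31   # bit 31: the digital readout is valid
READOUT_SHIFT = 16        # bits 22:16: degrees C below TjMax
READOUT_MASK = 0x7F


def decode_therm_status(eax: int, tjmax_c: int = 100) -> Optional[int]:
    """Return the core temperature in degrees C, or None if invalid.

    eax is the low 32 bits of IA32_THERM_STATUS. tjmax_c is an assumed
    TjMax; the real value depends on the CPU model.
    """
    if not eax & READING_VALID:
        # No valid reading: report "no value", never a number.
        return None
    readout = (eax >> READOUT_SHIFT) & READOUT_MASK
    # The readout is unsigned, so the result can never exceed TjMax.
    return tjmax_c - readout
```

With a readout of 0 this yields TjMax itself, and any other readout
yields something lower. A cleared valid bit could be one way to end up
with a read error instead of a temperature.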

> Now I'm unsure, if this could be right.
> 
> So I did sensors-detect
> 12:26/0 server ~ # sensors-detect
> # sensors-detect revision 5249 (2008-05-11 22:56:25 +0200)

You do realize this is 4.5 years old, right? Not as old as your board,
but still, using a more recent version may help:
  http://dl.lm-sensors.org/lm-sensors/files/sensors-detect

(Seems to be down right now, you'll have to check later.)

> Now follows a summary of the probes I have just done.
> Just press ENTER to continue:
> 
> Driver `coretemp' (should be inserted):
>   Detects correctly:
>   * Chip `Intel Core family thermal sensor' (confidence: 9)
> (...)
> I checked: coretemp is built in.
> I googled "PC8374L" and found that I have to use lm85.

You would if the chip had monitoring enabled, but that's not the case.
So you can forget about the lm85 driver. This is unfortunate because
this would have given us a point of comparison. On several Intel boards
of that era, monitoring was implemented in a way Linux doesn't support.

Does the BIOS display temperatures and/or other monitoring values?

> 12:28/1 server ~ # sensors
> coretemp-isa-0000
> Adapter: ISA adapter
> ERROR: Can't get value of subfeature temp1_input: Can't read
> Core 0:       +0.0 C  (high = +84.0 C, crit = +100.0 C)  ALARM
> 
> coretemp-isa-0001
> Adapter: ISA adapter
> Core 2:     +100.0 C  (high = +84.0 C, crit = +100.0 C)  ALARM
> 
> coretemp-isa-0002
> Adapter: ISA adapter
> ERROR: Can't get value of subfeature temp1_input: Can't read
> Core 1:       +0.0 C  (high = +84.0 C, crit = +100.0 C)  ALARM
> 
> coretemp-isa-0003
> Adapter: ISA adapter
> Core 3:      +96.0 C  (high = +84.0 C, crit = +100.0 C)

The errors for cores 0 and 1 are worrisome. We've seen these a couple
of times in the past, but could never explain or fix them.
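
A monitoring tool should treat such a failed read as "no value" rather
than plot whatever it got back. A rough sketch of that, assuming the
coretemp attributes live under /sys/devices/platform/coretemp.N/ as
they did on kernels of that era (the path and the function name are
assumptions, newer kernels lay things out differently):

```python
import glob
import os
from typing import Dict, Optional


def read_coretemp(root: str = "/sys/devices/platform") -> Dict[str, Optional[float]]:
    """Map each coretemp temp*_input file to degrees C, or None on error.

    The default root is an assumption; the location of the coretemp
    attributes differs between kernel versions.
    """
    readings: Dict[str, Optional[float]] = {}
    pattern = os.path.join(root, "coretemp.*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                # sysfs reports millidegrees Celsius.
                readings[path] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            # The read failed or returned nothing usable: keep the
            # failure visible instead of showing a made-up number.
            readings[path] = None
    return readings
```

Anything that comes back as None should be displayed as an error, not
drawn on a graph.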

> Can this be true? On the CPU is a very large cooler with heatpipes and a
> large fan. When I touch it, it is not hot. It seems to be mounted ok.

You did not tell us what exact CPU model your machine has. Different
models can have very different max TDP values.

The fact that the heatsink is not hot isn't necessarily a good thing.
The heat is generated by the CPU and is then expected to dissipate to
the heatsink, where the fan will extract it, and if the case is
properly designed, the heat goes outside of the system.

A cold heatsink can mean that the fan is doing a very good job. But it
can also mean that the dissipation from the CPU to the heatsink doesn't
happen, either because of insufficient or bad thermal paste, or because
the heatsink is improperly mounted.

The fact that you got error messages related to CPU throttling suggests
the problem is "real", i.e. not a coretemp driver issue. That being
said, the CPU throttling code is reading its values from the same
model-specific registers as the coretemp driver, so if these registers
are somehow busted in your CPU, both will misbehave.
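
If you want to check whether the kernel keeps counting throttle events,
the per-CPU counters in sysfs are one place to look. The sketch below
assumes they sit under /sys/devices/system/cpu/cpuN/thermal_throttle/
and that the file is named "core_throttle_count" on newer kernels and
plain "count" on older ones; both names and the function name are
assumptions to verify on your system. Keep in mind these counters are
fed from the same registers, so they are not an independent
measurement.

```python
import glob
import os
from typing import Dict


def read_throttle_counts(root: str = "/sys/devices/system/cpu") -> Dict[str, int]:
    """Return the thermal throttle event count per CPU directory name."""
    counts: Dict[str, int] = {}
    for cpu_dir in sorted(glob.glob(os.path.join(root, "cpu[0-9]*"))):
        # Assumed file names: "core_throttle_count" on newer kernels,
        # plain "count" on older ones. Use the first one that reads.
        for name in ("core_throttle_count", "count"):
            path = os.path.join(cpu_dir, "thermal_throttle", name)
            try:
                with open(path) as f:
                    counts[os.path.basename(cpu_dir)] = int(f.read().strip())
                break
            except (OSError, ValueError):
                continue
    return counts
```

A count that keeps growing while the machine is idle would point
towards a real thermal problem or a misbehaving register rather than a
one-off event during the backup.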

You may want to try the latest coretemp driver:
  http://khali.linux-fr.org/devel/misc/coretemp/
I'm not holding my breath though. Another thing worth trying is a live
DVD using a more recent kernel.

But I think that either you have a real overheating problem (check your
thermal paste and heatsink mounting) or your CPU got somehow damaged.

-- 
Jean Delvare
http://khali.linux-fr.org/wishlist.html



