Mondo: problem with as99127f-i2c-2d

> > I've discovered another bug, this one really problematic.  I
> > set up my mondo.conf file and then after a while the system
> > shut down claiming the CPU had overheated.  But I was sitting
> > right there and got straight into the BIOS hardware monitor
> > and it showed the temp was ok. So I changed the mondo.conf to
> > just emit warnings instead of shutting down.  After an hour or
> > so it claimed the fan was running too slow, but again, I was
> > sitting right there, and within seconds had run sensors, which
> > showed the fan speed to be ok.
> >
> >
> > Then I looked at /var/log/messages, and it said that the temp
> > when it had overheated had been 255.40 and the fan speed
> > had been 0.0.   That's obviously nonsense.  I think maybe
> > what's happening here is that mondo is reading the proc files
> > just as they're being updated, and so reads an invalid value.
> > Is there any file locking employed?  Have you seen this
> > before?
> 
> Yikes!  Now that's something I've never seen before...
> 

I followed Kelledin's advice and ran mondo with logging turned 
on for a long time.  After about 1.5 hours it had logged 3 cases
of fan1 at 0.0; these were single instances roughly 4-6 minutes
apart.  I also had mondo do:

  sensors | logger

on fan warnings, and these showed that sensors also read
the fan as 0.0 at that point in time.  I have not yet seen the
CPU temp glitch again.

In any case, this indicates that the problem I'm seeing is not
in mondo, but is a bug either in i2c/lm_sensors or, conceivably,
at the BIOS or hardware level in the A7V266E motherboard.
However, I don't think it's hardware - motherboard monitor
runs on these machines when they are booted to XP and I've
not seen any oddball shutdowns under that OS.  In one instance
there was an xlock authentication failure 4 seconds before
fan1 read 0.0.  Other than that, there was nothing obvious to indicate
a problem in /var/log/messages.  Note though that xlock was running
continuously most of this time (due to the idle console), and
cycling through whichever programs it runs by default.  This is
xlockmore-4.17.2.

This presents somewhat of a problem since it means running mondo
in a protective mode will trigger unwanted shutdowns
at 1-2 hour intervals (more or less).  Unfortunately, for this class
of software, 99.99% correct doesn't quite do it.
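
Since the bad readings logged so far were single isolated samples,
one possible workaround is to act only after several consecutive
out-of-range readings.  The following is a minimal sketch of that
idea and is not taken from mondo; the class name DebouncedAlarm,
the limits, and the count of 3 are all hypothetical values chosen
for illustration.

```python
class DebouncedAlarm:
    """Trip only after `needed` consecutive out-of-range samples.

    Hypothetical helper for illustration; not part of mondo or lm_sensors.
    """

    def __init__(self, low, high, needed=3):
        self.low = low          # lowest acceptable reading (assumed limit)
        self.high = high        # highest acceptable reading (assumed limit)
        self.needed = needed    # consecutive bad samples required to trip
        self.bad_streak = 0     # current run of out-of-range samples

    def update(self, value):
        """Feed one reading; return True once the alarm should fire."""
        if self.low <= value <= self.high:
            self.bad_streak = 0         # a good sample clears the run
        else:
            self.bad_streak += 1        # extend the run of bad samples
        return self.bad_streak >= self.needed


if __name__ == "__main__":
    # Made-up fan readings in RPM: one isolated 0.0, then a real stall.
    fan = DebouncedAlarm(low=3000.0, high=10000.0, needed=3)
    readings = [4500.0, 0.0, 4480.0, 4510.0, 0.0, 0.0, 0.0]
    for rpm in readings:
        print(rpm, fan.update(rpm))
```

With these made-up readings the isolated 0.0 is ignored, and the
alarm fires only on the third consecutive 0.0.  The cost is that a
real failure is detected a few polling intervals later, so the
count has to be weighed against how fast the CPU can overheat.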

Any suggestions?  This is with RH 7.3 and the current distribution
of i2c/lm_sensors.  The device is as99127f-i2c-0-2d.

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


