Re: Questions about adt7470 driver

Jean Delvare <jdelvare@xxxxxxx> · Fri, 29 May 2020 15:41:57 +0200

On Thu, 28 May 2020 17:18:58 -0700, Darrick J. Wong wrote:
> I vaguely remember that the adt7470 temperature inputs were connected to
> the CPU, and the PWM outputs were connected to the CPU heatsink fans.
> The BIOS appeared to set up the adt7470 for automatic thermal management,
> i.e. when you cranked all four cores of the machine to maximum, it
> would gradually raise the CPU fan speed, like you'd expect.
> 
> The reality (again, vaguely remembered) was that the chip wouldn't run
> its pwm control loop unless *something* poked it to reread the
> temperature sensors.  A different model of the same machine had a BMC
> which would talk to the adt7470 over i2c and take care of that.

That I understand, and while it is poor design in my opinion, it makes
sense to some degree.

> The other problem was that /some/ of the machines for whatever reason
> would adjust the pwm value that you could read out over i2c, but
> wouldn't actually change the fan speed unless you whacked the adt into
> manual mode.

Ah. That would be the reason for the extra code. Automatic fan speed
control that needs to be refreshed manually. Oh my.

> Neither of those two behaviors was listed in the datasheet, and we
> (IBM) could never get an answer out of either Analog or our own hardware
> group about whether or not this was the expected behavior.  I
> disassembled the BMC code to figure out what the other model computer
> was doing, and (clumsily) wrote that into the driver.  For all I know we
> got a bad batch of adt7470s and all these weird gymnastics aren't
> supposed to be necessary.
> 
> The next generation switched to a totally different chip and supplier,
> so I surmise they weren't happy with the results either.  Those machines
> tended to overheat if you were in Windows.
> 
> > > 4* Why are you calling msleep_interruptible() in
> > > adt7470_read_temperatures() to wait for the temperature conversions? We
> > > return -EAGAIN if that happens, but then ignore that error code, and we
> > > log a cryptic error message. Do I understand correctly that the only
> > > case where this should happen is when the user unloads the kernel
> > > driver, in which case we do not care about having been interrupted? I
> > > can't actually get the error message to be logged when rmmod'ing the
> > > module so I don't know what it would take to trigger it.  
> 
> Urrk, what a doof who wrote that.  /me smacks 2009-era djwong. :P
> 
> kthread_stop blocks until the thread exits...

My experiments seem to confirm this.

> but strangely we don't
> even try to interrupt the msleep_interruptible call.

How would we do that if we wanted to? Later you say this is not
possible?

> That's fine,
> though device removal will take longer than it needs to.

Yes, up to 2 seconds in my tests. Not pleasant, but also not
necessarily something to worry about, as rmmod is usually not needed.
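For reference, the usual way to make removal fast is to sleep in something the stop path can actually wake. As far as I understand it, msleep_interruptible() only returns early when a signal is pending, so the wakeup done by kthread_stop() does not end it. A sleep such as schedule_timeout_interruptible() followed by a kthread_should_stop() check would. The sketch below is only a userspace analogue, not driver code: it uses POSIX threads, a condition variable as the wakeable sleep, and a callback standing in for "poke the chip to reread the temperatures". All names in it (struct updater, updater_start(), updater_stop(), demo_count_refresh()) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the driver's update kthread. */
struct updater {
	pthread_mutex_t lock;
	pthread_cond_t wake;     /* signaled by updater_stop() */
	bool stop;               /* protected by lock */
	long interval_ms;        /* delay between refreshes */
	void (*refresh)(void *ctx);
	void *ctx;
	pthread_t thread;
};

/* Refresh, then sleep interval_ms; the sleep ends early on stop. */
static void *updater_main(void *arg)
{
	struct updater *u = arg;

	pthread_mutex_lock(&u->lock);
	while (!u->stop) {
		pthread_mutex_unlock(&u->lock);
		u->refresh(u->ctx);          /* e.g. trigger a temperature read */
		pthread_mutex_lock(&u->lock);

		/* Absolute deadline (CLOCK_REALTIME) for the next refresh. */
		struct timespec deadline;
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += u->interval_ms / 1000;
		deadline.tv_nsec += (u->interval_ms % 1000) * 1000000L;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}

		/* Wait until timeout; a signal (or spurious wakeup) re-checks stop. */
		while (!u->stop &&
		       pthread_cond_timedwait(&u->wake, &u->lock, &deadline) != ETIMEDOUT)
			;
	}
	pthread_mutex_unlock(&u->lock);
	return NULL;
}

int updater_start(struct updater *u, long interval_ms,
		  void (*refresh)(void *), void *ctx)
{
	assert(interval_ms > 0 && refresh != NULL);
	pthread_mutex_init(&u->lock, NULL);
	pthread_cond_init(&u->wake, NULL);
	u->stop = false;
	u->interval_ms = interval_ms;
	u->refresh = refresh;
	u->ctx = ctx;
	int rc = pthread_create(&u->thread, NULL, updater_main, u);
	if (rc != 0) {
		pthread_cond_destroy(&u->wake);
		pthread_mutex_destroy(&u->lock);
	}
	return rc;
}

/* Analogue of kthread_stop(): request stop, wake the sleeper, join. */
void updater_stop(struct updater *u)
{
	pthread_mutex_lock(&u->lock);
	u->stop = true;
	pthread_cond_signal(&u->wake);
	pthread_mutex_unlock(&u->lock);
	pthread_join(u->thread, NULL);
	pthread_cond_destroy(&u->wake);
	pthread_mutex_destroy(&u->lock);
}

/* Demo callback: counts refreshes in the int pointed to by ctx. */
static void demo_count_refresh(void *ctx)
{
	++*(int *)ctx;
}
```

With a 2000 ms interval, updater_stop() returns almost immediately instead of waiting out the remaining sleep, which is the behavior one would want from rmmod.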

> We also don't
> care about the return value of msleep_interruptible at all since one
> cannot interrupt the kthread.
> 
> I probably picked interruptible sleep to avoid triggering the hangcheck
> timer.

I don't understand that part. Is a 2-second uninterruptible sleep in a
kthread considered bad somehow?

> > > 5* Is there any reason why the update thread is being started
> > > unconditionally? As I understand it, it is only needed if at least one
> > > PWM output is configured in automatic mode, which (I think) is not the
> > > default. It is odd that the bug reporter hits a problem with the  
> 
> Yes, the driver should only start the kthread loop if someone wants
> automatic temp control.

OK, I'll give it a try. I don't want to add too much complexity though.
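A minimal sketch of the check for item 5, assuming the hwmon sysfs convention for pwmN_enable (0 = no fan speed control, i.e. full speed; 1 = manual; 2 and above = automatic) and assuming four PWM channels. needs_update_thread() and update_thread_action() are hypothetical helpers, not existing driver functions; the idea would be to evaluate them at probe time and whenever pwmN_enable is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* hwmon sysfs ABI values for pwmN_enable. */
enum pwm_enable {
	PWM_ENABLE_FULL = 0,   /* no fan speed control (full speed) */
	PWM_ENABLE_MANUAL = 1, /* manual control via pwmN */
	PWM_ENABLE_AUTO = 2,   /* 2 and above: automatic control */
};

#define ADT7470_PWM_COUNT 4 /* assumption: four PWM outputs */

/* True if any channel is automatic, i.e. the refresh thread is needed. */
static bool needs_update_thread(const int pwm_enable[], size_t n)
{
	assert(pwm_enable != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (pwm_enable[i] >= PWM_ENABLE_AUTO)
			return true;
	return false;
}

enum thread_action { THREAD_KEEP, THREAD_START, THREAD_STOP };

/* Decide what to do after a pwmN_enable change, given the thread state. */
static enum thread_action update_thread_action(bool running,
					       const int pwm_enable[], size_t n)
{
	bool needed = needs_update_thread(pwm_enable, n);

	if (needed && !running)
		return THREAD_START;
	if (!needed && running)
		return THREAD_STOP;
	return THREAD_KEEP;
}
```

Keeping the decision in one small function keeps the added complexity low: the pwmN_enable store handler would only have to act on THREAD_START or THREAD_STOP.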

Thanks,
-- 
Jean Delvare
SUSE L3 Support