Re: Questions about adt7470 driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 28, 2020 at 06:52:37AM -0700, Guenter Roeck wrote:
> On 5/28/20 3:02 AM, Jean Delvare wrote:
> > On Wed, 27 May 2020 16:33:34 -0700, Darrick J. Wong wrote:
> >> On Wed, May 27, 2020 at 03:42:52PM -0700, Guenter Roeck wrote:
> >>> On Tue, May 26, 2020 at 11:22:59AM +0200, Jean Delvare wrote:  
> >>>> Hi all,
> >>>>
> >>>> In the context of bug #207771, I got to look into the adt7470 driver.
> >>>> I'm slowing understanding the logic of the background temperature
> >>>> registers update thread, still there are 2 things I do not understand:
> >>>>
> >>>> 1* Function adt7470_read_temperatures() sets data->num_temp_sensors,
> >>>> however this value seems to be only used to limit the wait time of
> >>>> future calls to the same function. In the general update function we
> >>>> still read ALL temperature sensors regardless of its value:
> >>>>
> >>>> 		for (i = 0; i < ADT7470_TEMP_COUNT; i++)
> >>>> 			data->temp[i] = i2c_smbus_read_byte_data(client,
> >>>> 						ADT7470_TEMP_REG(i));
> >>>>
> >>>> Shouldn't this loop be bounded with data->num_temp_sensors instead of
> >>>> ADT7470_TEMP_COUNT?
> >>>>   
> >>> Guess so.
> >>>   
> >>>> 2* Function adt7470_read_temperatures() also sets
> >>>> data->temperatures_probed to 1, and this boolean is then used to skip
> >>>> further calls to that function. But do we really need a separate
> >>>> variable for this, given that num_temp_sensors >= 0 matches the same
> >>>> condition as far as I can see?
> >>>
> >>> Agreed. On the other side, those are optimizations. I am not really sure
> >>> if that driver is worth spending time on, given the age of the chip.
> > 
> > Well it's still in use and apparently at least one user cares enough to
> > report a bug and propose a candidate fix.
> > 
> ... but at the same time that user doesn't have any temperature sensors
> installed and won't be able to test any changes.
> 
> >> I /think/ the answer to both questions is yes, but I don't have the
> >> hardware anymore so I have no way to QA that... :/
> > 
> > Thanks for both of you for your answers.
> > 
> > Darrick, I have 3 more questions for you if you remember...
> > 
> > 3* I understand that the temperatures need to be read periodically in
> > order for automatic fan speed control to work. What I do not understand
> > is why you bother switching PWM outputs to manual mode each time? What
> > bad would happen if we did not do that? I see nothing in the datasheet
> > justifying it.

I don't recall the specifics very well, sorry in advance. :/

I vaguely remember that the adt7470 temperature inputs were connected to
the CPU, and the PWM outputs were connected to the CPU heatsink fans.
The BIOS appeared to set up the adt7470 for automatic thermal management
(i.e. when you cranked all four cores of the machine to maximum) it
would gradually raise the CPU fan speed, like you'd expect.

The reality (again, vaguely remembered) was that the chip wouldn't run
its pwm control loop unless *something* poked it to reread the
temperature sensors.  A different model of the same machine had a BMC
which would talk to the adt7470 over i2c and take care of that.

The other problem was that /some/ of the machines for whatever reason
would adjust the pwm value that you could read out over i2c, but
wouldn't actually change the fan speed unless you whacked the adt into
manual modem.

Neither of those two behaviors were listed in the datasheet, and we
(IBM) could never get an answer out of either Analog or our own hardware
group about whether or not this was the expected behavior.  I
disassembled the BMC code to figure out what the other model computer
was doing, and (clumsily) wrote that into the driver.  For all I know we
got a bad batch of adt7470s and all these weird gymnastics aren't
supposed to be necessary.

The next generation switched to a totally different chip and supplier,
so I surmise they weren't happy with the results either.  Those machines
tended to overheat if you were in Windows.

> > 4* Why are you calling msleep_interruptible() in
> > adt7470_read_temperatures() to wait for the temperature conversions? We
> > return -EAGAIN if that happens, but then ignore that error code, and we
> > log a cryptic error message. Do I understand correctly that the only
> > case where this should happen is when the user unloads the kernel
> > driver, in which case we do not care about having been interrupted? I
> > can't actually get the error message to be logged when rmmod'ing the
> > module so I don't know what it would take to trigger it.

Urrk, what a doof who wrote that.  /me smacks 2009-era djwong. :P

kthread_stop blocks until the thread exits... but strangely we don't
even try to interrupt the msleep_interruptible call.  That's fine,
though device removal will take longer than it needs to.  We also don't
care about the return value of msleep_interruptible at all since one
cannot interrupt the kthread.

I probably picked interruptible sleep to avoid triggering the hangcheck
timer.

> Not sure if rmmod generates a signal. In theory you could possibly
> keep writing -1 into the num_temp_sensors attribute, execute the
> sensors command (or just read a temperature), and interrupt the
> sequence.
> 
> > 5* Is there any reason why the update thread is being started
> > unconditionally? As I understand it, it is only needed if at least one
> > PWM output is configured in automatic mode, which (I think) is not the
> > default. It is odd that the bug reporter hits a problem with the

Yes, the driver should only start the kthread loop if someone wants
automatic temp control.

> > polling thread when they are not using automatic fan speed control in
> > the first place so they do not need the polling thread to run. I was
> > wondering if it would be possible to start and stop the polling thread
> > depending on whether automatic PWM is enabled or not.
> >
> 
> 
> The datasheet says nothing about the need to run such a thread for
> automatic mode either. As I understand it, the chip is supposed to
> repeat temperature measurements automatically once configuration
> register 1 bit 7 is set. Of course that is difficult if not
> impossible to confirm without access to the chip.

IIRC my experiences with that chip were that the one I was using didn't
conform to the datasheet, i.e. the automatic temp management bit /was/
set by the BIOS, but the chip didn't do adjust the pwm control until
something came along and poked the adt7470 to re-query the temperature
sensors.  Unfortunately it was being used to control the CPU fan on a
socket 771 Xeon, which meant that we really /did/ need it to cycle up
the fans when the CPU went to full load.

Needless to say I don't have that Xeon 5080 system anymore. :/

--D

> Guenter



[Index of Archives]     [LM Sensors]     [Linux Sound]     [ALSA Users]     [ALSA Devel]     [Linux Audio Users]     [Linux Media]     [Kernel]     [Gimp]     [Yosemite News]     [Linux Media]

  Powered by Linux