On Thu, 28 May 2020 17:18:58 -0700, Darrick J. Wong wrote:
> I vaguely remember that the adt7470 temperature inputs were connected to
> the CPU, and the PWM outputs were connected to the CPU heatsink fans.
> The BIOS appeared to set up the adt7470 for automatic thermal management
> (i.e. when you cranked all four cores of the machine to maximum) it
> would gradually raise the CPU fan speed, like you'd expect.
>
> The reality (again, vaguely remembered) was that the chip wouldn't run
> its pwm control loop unless *something* poked it to reread the
> temperature sensors. A different model of the same machine had a BMC
> which would talk to the adt7470 over i2c and take care of that.

That I understand, and while it is poor design in my opinion, it makes
sense to some degree.

> The other problem was that /some/ of the machines for whatever reason
> would adjust the pwm value that you could read out over i2c, but
> wouldn't actually change the fan speed unless you whacked the adt into
> manual mode.

Ah. That would be the reason for the extra code. Automatic fan speed
control that needs to be refreshed manually. Oh my.

> Neither of those two behaviors were listed in the datasheet, and we
> (IBM) could never get an answer out of either Analog or our own hardware
> group about whether or not this was the expected behavior. I
> disassembled the BMC code to figure out what the other model computer
> was doing, and (clumsily) wrote that into the driver. For all I know we
> got a bad batch of adt7470s and all these weird gymnastics aren't
> supposed to be necessary.
>
> The next generation switched to a totally different chip and supplier,
> so I surmise they weren't happy with the results either. Those machines
> tended to overheat if you were in Windows.
>
> > > 4* Why are you calling msleep_interruptible() in
> > > adt7470_read_temperatures() to wait for the temperature conversions? We
> > > return -EAGAIN if that happens, but then ignore that error code, and we
> > > log a cryptic error message. Do I understand correctly that the only
> > > case where this should happen is when the user unloads the kernel
> > > driver, in which case we do not care about having been interrupted? I
> > > can't actually get the error message to be logged when rmmod'ing the
> > > module so I don't know what it would take to trigger it.
>
> Urrk, what a doof who wrote that. /me smacks 2009-era djwong. :P
>
> kthread_stop blocks until the thread exits...

My experiments seem to confirm this.

> but strangely we don't
> even try to interrupt the msleep_interruptible call.

How would we do that if we wanted to? Later you say this is not
possible?

> That's fine,
> though device removal will take longer than it needs to.

Yes, up to 2 seconds in my tests. Not pleasant, but also not necessarily
something to worry about, as rmmod is usually not needed.

> We also don't
> care about the return value of msleep_interruptible at all since one
> cannot interrupt the kthread.
>
> I probably picked interruptible sleep to avoid triggering the hangcheck
> timer.

I don't understand that part. Is a 2 second uninterruptible sleep in a
kthread considered bad somehow?

> > > 5* Is there any reason why the update thread is being started
> > > unconditionally? As I understand it, it is only needed if at least one
> > > PWM output is configured in automatic mode, which (I think) is not the
> > > default. It is odd that the bug reporter hits a problem with the
>
> Yes, the driver should only start the kthread loop if someone wants
> automatic temp control.

OK, I'll give it a try. I don't want to add too much complexity though.

Thanks,
-- 
Jean Delvare
SUSE L3 Support
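For points 4* and 5* above, here is a rough user-space sketch of the two changes being discussed. The update thread is started only while at least one PWM channel is in automatic mode, and a stop request cuts the sleep between refreshes short instead of waiting out the full period. It models the kthread with POSIX threads and a condition variable; struct fake_adt7470, fake_init(), update_thread(), set_pwm_auto() and the refreshes counter are hypothetical names, not the driver's code. On the kernel side, msleep_interruptible() only returns early when a signal is pending, which a kthread normally never sees. The wake-up sent by kthread_stop() is therefore absorbed and the sleep resumes. A rough kernel equivalent of the pattern below would be to sleep with schedule_timeout_interruptible() and check kthread_should_stop() after each wake-up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define NUM_PWM 4

/* Hypothetical stand-in for the driver's private data. */
struct fake_adt7470 {
	pthread_mutex_t cfg_lock;  /* serializes set_pwm_auto() callers */
	pthread_mutex_t lock;      /* protects the fields below */
	pthread_cond_t wake;       /* signalled to cut the sleep short */
	bool auto_mode[NUM_PWM];
	bool stop, running;
	pthread_t thread;
	unsigned long refreshes;   /* stand-in for "poke the chip to reread temps" */
	long period_ms;
};

static void fake_init(struct fake_adt7470 *d, long period_ms)
{
	pthread_mutex_init(&d->cfg_lock, NULL);
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->wake, NULL);
	for (int i = 0; i < NUM_PWM; i++)
		d->auto_mode[i] = false;  /* all manual by default: no thread */
	d->stop = d->running = false;
	d->refreshes = 0;
	d->period_ms = period_ms;
}

static void *update_thread(void *arg)
{
	struct fake_adt7470 *d = arg;

	pthread_mutex_lock(&d->lock);
	while (!d->stop) {
		d->refreshes++;

		/* Absolute deadline on CLOCK_REALTIME, timedwait's default clock. */
		struct timespec deadline;
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += d->period_ms / 1000;
		deadline.tv_nsec += (d->period_ms % 1000) * 1000000L;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}
		/* Sleep until the period ends or a stop request arrives;
		 * re-checking the flag also absorbs spurious wake-ups. */
		while (!d->stop) {
			if (pthread_cond_timedwait(&d->wake, &d->lock, &deadline) == ETIMEDOUT)
				break;
		}
	}
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

/* Start the thread when the first channel goes automatic, stop it when the
 * last one leaves automatic mode. Returns 0, or -1 if the thread can't start. */
static int set_pwm_auto(struct fake_adt7470 *d, int ch, bool enable)
{
	int ret = 0;
	bool want = false;

	assert(ch >= 0 && ch < NUM_PWM);
	pthread_mutex_lock(&d->cfg_lock);
	pthread_mutex_lock(&d->lock);
	d->auto_mode[ch] = enable;
	for (int i = 0; i < NUM_PWM; i++)
		want = want || d->auto_mode[i];

	if (want && !d->running) {
		d->stop = false;
		if (pthread_create(&d->thread, NULL, update_thread, d) == 0) {
			d->running = true;
		} else {
			d->auto_mode[ch] = false;  /* keep state consistent */
			ret = -1;
		}
		pthread_mutex_unlock(&d->lock);
	} else if (!want && d->running) {
		d->stop = true;
		d->running = false;
		pthread_cond_signal(&d->wake);  /* wake the sleeper right away */
		pthread_mutex_unlock(&d->lock);
		pthread_join(d->thread, NULL);  /* must not hold d->lock here */
	} else {
		pthread_mutex_unlock(&d->lock);
	}
	pthread_mutex_unlock(&d->cfg_lock);
	return ret;
}
```

With fake_init(&d, 2000), switching channel 0 to automatic starts the thread. Switching it back to manual makes set_pwm_auto() return almost immediately rather than after up to 2 seconds. cfg_lock is held across the join so that a re-enable racing with a disable cannot reset the stop flag before the old thread has seen it.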