RE: [RFC PATCH 3/6] dt-bindings: axi-fan-control: add tacho properties

> From: Guenter Roeck <groeck7@xxxxxxxxx> On Behalf Of Guenter Roeck
> Sent: Wednesday, July 21, 2021 5:00 PM
> To: Sa, Nuno <Nuno.Sa@xxxxxxxxxx>
> Cc: Rob Herring <robh@xxxxxxxxxx>; linux-hwmon@xxxxxxxxxxxxxxx;
> devicetree@xxxxxxxxxxxxxxx; Jean Delvare <jdelvare@xxxxxxxx>
> Subject: Re: [RFC PATCH 3/6] dt-bindings: axi-fan-control: add tacho properties
> 
> 
> On Mon, Jul 19, 2021 at 07:46:41AM +0000, Sa, Nuno wrote:
> >
> >
> > > -----Original Message-----
> > > From: Guenter Roeck <groeck7@xxxxxxxxx> On Behalf Of Guenter Roeck
> > > Sent: Friday, July 16, 2021 5:04 PM
> > > To: Sa, Nuno <Nuno.Sa@xxxxxxxxxx>
> > > Cc: Rob Herring <robh@xxxxxxxxxx>; linux-hwmon@xxxxxxxxxxxxxxx;
> > > devicetree@xxxxxxxxxxxxxxx; Jean Delvare <jdelvare@xxxxxxxx>
> > > Subject: Re: [RFC PATCH 3/6] dt-bindings: axi-fan-control: add tacho properties
> > >
> > > [External]
> > >
> > > On 7/16/21 12:44 AM, Sa, Nuno wrote:
> > > [ ... ]
> > > >>
> > > >> Are you sure you can ever get this stable? Each fan has its own
> > > >> properties and tolerances. If you replace a fan in a given system,
> > > >> you might get different RPM numbers. The RPM will differ widely from
> > > >> system to system and from fan to fan. Anything that assumes a
> > > >> specific RPM in devicetree data seems to be quite vulnerable to
> > > >> failures. I have experienced that recently with a different chip
> > > >> which also tries to correlate RPM and PWM and fails quite miserably.
> > > >>
> > > >> In my experience, anything other than minimum fan speed is really a
> > > >> recipe for instability and sporadic false failures. Even setting a
> > > >> minimum fan speed is tricky because it depends a lot on the fan.
> > > >
> > > > I see what you mean. So, I had to go through this process when
> > > > testing these changes because the fan I'm using is different from
> > > > the default one used to develop and establish the default values in
> > > > the IP core. The core
> > >
> > > Exactly my point.
> > >
> > > > provides you with a register which contains the tacho measurements
> > > > in clock cycles. You can read that for all the PWM points of interest
> > > > (with devmem2, for example) and make your own "calibration". I assume
> > > > that people have to go through this process before putting some
> > > > values in the devicetree. I'm aware this is not the neatest process,
> > > > but I guess it's acceptable...
> > > >
> > >
> > > Do you really expect everyone using a system with this chip to go
> > > through this process and update its devicetree configuration, and then
> > > repeat it whenever a fan is changed? Given how dynamic this is, I really
> > > wonder if that information should be in devicetree in the first place.
> > >
> >
> > My naive assumption was that we would only do this work at evaluation
> > time. After that, once we had settled on a fan for a given system, I
> > expected that changing to a different fan would not be that likely. From
> > your inputs, I guess this is not really the case, which makes this
> > process more cumbersome (as it also implies recompiling the devicetree
> > for your system).
> >
> > However, even if we export these as runtime parameters, services/daemons
> > will also have a hard time doing this "calibration" dynamically. The
> > reason is that the controller only accepts a new PWM request if it is
> > higher than whatever value the HW has at that moment. Thus, going
> > through the calibration points might be very cumbersome. I can see some
> > ways of handling this, but none of them is very neat...
> >
> > Since this is an FPGA core, we might have some flexibility here. One
> > idea would be to have a calibration mode in the HW that allows us to
> > freely control the PWM values, so that we could sweep freely over the
> > calibration points. For safety reasons, I guess this calibration mode
> > would expire after some reasonable time (enough to complete the whole
> > procedure). The best place for doing the calibration would probably be
> > directly in the driver, since we receive interrupts about new tacho
> > measurements, which makes things easier to synchronize and handle.
> > However, given the time it takes for a new PWM value to settle and for
> > new tacho measurements to arrive, doing this during probe would not be
> > acceptable, so we would have to defer it to a worker/timer, which is
> > definitely not ideal either.
> >
> > I'm not sure if the above makes much sense to you, and it also depends
> > on the HW guys being on board with this mechanism.
> >
> 
> I don't really know what to say or recommend here. Personally I think any
> attempt to tie PWM values to RPM is doomed to fail. Here are a couple of
> examples:
> 
> Take your test system and move the fan to a restricted place (e.g. close
> to a wall). You'll see the fan RPM change, potentially significantly. Put
> it into some place with airflow towards or away from the system (e.g. blow
> air into the system from another place, which may happen if the system is
> installed in a lab), and again you'll see fan speed changes. Open the
> chassis, and the fan speed will change. I have seen fan speeds vary by up
> to 50% when changing airflow.

Here we can at least control the tolerance for each PWM vs RPM point, but
I can imagine getting these values right being a very painful process, and no
one will think of setting tolerances of 50%...
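
To make the tolerance trade-off concrete, here is a minimal sketch of a per-point check. It assumes the expected value is a tacho period in clock cycles and the tolerance is a percentage of it; the function name and parameters are hypothetical and not taken from the driver or the IP core documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if a measured tacho period (in clock
 * cycles) lies within +/- tol_pct percent of the expected period.
 * The band is computed in 64 bits so large periods cannot overflow.
 */
static bool tacho_within_tolerance(uint32_t expected, uint32_t measured,
				   uint32_t tol_pct)
{
	uint64_t margin = (uint64_t)expected * tol_pct / 100;
	uint64_t low = expected > margin ? expected - margin : 0;
	uint64_t high = (uint64_t)expected + margin;

	return measured >= low && measured <= high;
}
```

With an expected period of 100000 cycles and a 20% band, a fan that airflow has slowed to half its calibrated speed (period doubled to 200000 cycles) is reported as out of tolerance even though it is still spinning, while a reading of 110000 cycles passes.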

> That doesn't even take into account that replacing a fan, even with a
> similar model (e.g. after a fan failed), will likely result in potentially
> significant RPM changes.
> 
> Ultimately, anything that does more than determine if a fan is still
> running is potentially unstable.

Yeah, I understand your points. The HW does the evaluation and, of
course, it also looks for the presence of a signal... So, in your opinion,
even setting a minimum fan speed is unlikely to be stable?
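
A presence check with an optional minimum speed, the weakest criterion discussed in this thread, only needs the tacho reading converted to RPM. The sketch below assumes the register holds the time between two tacho pulses in cycles of a clock running at `clk_hz`, that the fan emits `ppr` pulses per revolution, and that a reading of 0 means no pulses were seen; these assumptions and all names are hypothetical, not the documented behaviour of the IP core.

```c
#include <stdint.h>

/* Hypothetical classification of a single tacho reading. */
enum fan_state {
	FAN_NO_SIGNAL,	/* no tacho pulses: fan stopped or disconnected */
	FAN_TOO_SLOW,	/* spinning, but below the configured minimum */
	FAN_OK,
};

/*
 * One revolution takes ppr * period_cycles / clk_hz seconds, so
 *   rpm = 60 * clk_hz / (ppr * period_cycles)
 * Returns 0 for invalid input.
 */
static uint32_t tacho_cycles_to_rpm(uint64_t clk_hz, uint32_t ppr,
				    uint32_t period_cycles)
{
	if (period_cycles == 0 || ppr == 0)
		return 0;
	return (uint32_t)(60 * clk_hz / ((uint64_t)ppr * period_cycles));
}

/* Presence check plus an optional minimum speed (min_rpm == 0 disables it). */
static enum fan_state fan_check(uint64_t clk_hz, uint32_t ppr,
				uint32_t period_cycles, uint32_t min_rpm)
{
	/* Assumption: a period of 0 means no tacho pulses were measured. */
	if (period_cycles == 0)
		return FAN_NO_SIGNAL;
	if (min_rpm && tacho_cycles_to_rpm(clk_hz, ppr, period_cycles) < min_rpm)
		return FAN_TOO_SLOW;
	return FAN_OK;
}
```

For example, with a 100 MHz clock and 2 pulses per revolution, a period of 1000000 cycles corresponds to 3000 RPM. Leaving `min_rpm` at 0 reduces the check to "is the fan still running", which is the part least affected by airflow or fan replacement.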

> Having said all that, it is really your call to decide how you want to
> detect fan failures.
> 
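
The calibration problem raised earlier in the thread stems from the controller only accepting a PWM request that is higher than its current value. The following is a minimal software model of that rule together with the proposed time-limited calibration mode; the struct, its fields and the deadline mechanism are hypothetical and do not describe the real IP core interface.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model, not the IP core's register interface.
 * calib_deadline_ms == 0 means calibration mode was never entered.
 */
struct pwm_model {
	uint32_t duty_pct;
	uint64_t calib_deadline_ms;
};

/*
 * Normal operation: a request is only accepted if it raises the duty
 * cycle. While calibration mode is active (now_ms before the deadline),
 * any value is accepted so that a sweep can also step downwards.
 * Returns true if the request was applied.
 */
static bool pwm_request(struct pwm_model *m, uint32_t duty_pct,
			uint64_t now_ms)
{
	bool calibrating = now_ms < m->calib_deadline_ms;

	if (!calibrating && duty_pct <= m->duty_pct)
		return false;	/* ignored: not higher than the current value */

	m->duty_pct = duty_pct;
	return true;
}
```

Outside calibration mode, a downward sweep such as 100%, 75%, 50%, 25% stops at 100%, since every later request is ignored. The expiring deadline is what keeps the relaxed rule from outliving the calibration run.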

Well, my hands are also tied here. The core is supposed to work without
any SW interaction, in which case the tacho evaluation is always done. The
only thing I could do is completely ignore fan faults, which is also bad...

I can try to persuade the HW guy to completely remove the evaluation and
just report fan faults when there's no signal, but I'm not really sure he will
go for it. In that case, I'm tempted to just leave this as-is (with the extra
bindings for the tolerance, and turning these bindings into a map), if you're
willing to take it... The reason is that, as you said, this is likely to be
unstable anyway, so the added complexity in the SW does not really pay off
(better to keep at least the SW simple)...
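
One possible in-driver shape for such a map is sketched below, assuming each entry pairs a PWM duty cycle (percent) with the expected tacho period (clock cycles) and its own tolerance (percent). The struct and lookup function are hypothetical illustrations, not the binding format proposed in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map entry parsed from devicetree. */
struct tacho_point {
	uint32_t pwm_pct;	/* PWM duty cycle, percent */
	uint32_t tacho_cycles;	/* expected tacho period, clock cycles */
	uint32_t tol_pct;	/* allowed deviation, percent */
};

/*
 * Return the entry whose duty cycle is closest to pwm_pct, or NULL for
 * an empty map. On a tie, the earlier entry wins.
 */
static const struct tacho_point *
tacho_point_lookup(const struct tacho_point *map, size_t n, uint32_t pwm_pct)
{
	const struct tacho_point *best = NULL;
	uint32_t best_dist = UINT32_MAX;

	for (size_t i = 0; i < n; i++) {
		uint32_t d = map[i].pwm_pct > pwm_pct ?
			     map[i].pwm_pct - pwm_pct : pwm_pct - map[i].pwm_pct;

		if (d < best_dist) {
			best = &map[i];
			best_dist = d;
		}
	}
	return best;
}
```

A linear scan is enough for the handful of points such a map would hold, which keeps the SW side as simple as intended; the selected entry's `tacho_cycles` and `tol_pct` would then feed a per-point tolerance check.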

- Nuno Sá



