Re: [PATCH v3 2/2] hwmon: ltc4282: add support for the LTC4282 chip

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2023-12-12 at 09:59 -0800, Guenter Roeck wrote:
> On 12/12/23 07:30, Nuno Sá wrote:
> > On Tue, 2023-12-12 at 07:14 -0800, Guenter Roeck wrote:
> > > On 12/12/23 06:28, Nuno Sá wrote:
> > > > On Mon, 2023-12-11 at 07:36 -0800, Guenter Roeck wrote:
> > > > > On Tue, Dec 05, 2023 at 04:22:56PM +0100, Nuno Sa via B4 Relay wrote:
> > > > > > From: Nuno Sa <nuno.sa@xxxxxxxxxx>
> > > > > > 
> > > > > > The LTC4282 hot swap controller allows a board to be safely inserted and
> > > > > > removed from a live backplane. Using one or more external N-channel pass
> > > > > > transistors, board supply voltage and inrush current are ramped up at an
> > > > > > adjustable rate. An I2C interface and onboard ADC allows for monitoring
> > > > > > of board current, voltage, power, energy and fault status.
> > > > > > 
> > > > > > Signed-off-by: Nuno Sa <nuno.sa@xxxxxxxxxx>
> > > > > > ---
> > > > > > +
> > > > > 
> > > > > > +power1_good            Power considered good
> > > > > 
> > > > > I really don't like this attribute. Like the ones below it is non-standard
> > > > > and invisible for standard applications. On top of that, I think it isn't
> > > > > really related to "power" but to the output voltage. What does it actually
> > > > > report that isn't included in the FET faults ?
> > > > > 
> > > > 
> > > > This is detected with the FB pin and a voltage divider (from the output
> > > > voltage). Basically depending on the level of that pin, the chip indicate
> > > > power
> > > > good or power bad. I was also very reluctant with this attribute (I mention
> > > > it
> > > > in the v1 cover). This might not even indicate any misbehave. We also
> > > > support
> > > > reporting this using the gpio1 pin (if we set it that way). So, I guess I
> > > > can
> > > > just drop this one and add support for it if we ever have a real usecase
> > > > where I
> > > > can actually justify having it :).
> > > > 
> > > > We already have the power_bad fault log in debugfs so I'm not sure if adding
> > > > this one there adds much value.
> > > > 
> > > > > 
> > > > > > +fet_short_fault                FET short alarm
> > > > > > +fet_bad_fault          FET bad alarm
> > > > > 
> > > > > Those attributes have little value since they are not standard attributes
> > > > > and won't be seen by standard applications. On top of that, it is not
> > > > > clear
> > > > > (not documented) what the attribute actually reports. I assume it is
> > > > > associated with the output voltage, i.e., in0, but that is just an
> > > > > assumption.
> > > > > 
> > > > 
> > > > fet_short - This is one is detected if the ADC measures a current sense
> > > > voltage
> > > > > 0.25mv while the fet gate is off.
> > > > 
> > > > fet_bad - Is set by monitoring the voltage at the gate and the drain to
> > > > source
> > > > voltage.
> > > > 
> > > > These ones might indicate real issues with the HW so I thought they could be
> > > > important...
> > > >    
> > > > > What do you think about introducing a standard inX_fault attribute ?
> > > > > It would not be as specific as short/bad, but I think it would be more
> > > > > useful and we could add it to the ABI.
> > > > > 
> > > > 
> > > > It would be better than nothing. And we do have fault logs for both these
> > > > failures so userspace could also use that to know exactly what was the
> > > > issue. If
> > > > that's ok with you, I would then report this in inX_fault? Did you had in
> > > > mind
> > > > putting this in in0 (vsource) or adding a new channel?
> > > > 
> > > > In my first draft I had another voltage channel (label: VFET) to report the
> > > > fet_bad condition. I was using the inX_crit or inX_lcrit but it felt bad so
> > > > I
> > > > removed it...
> > > > 
> > > 
> > > This isn't an extra voltage channel. It should be associated with the output
> > > voltage because that is what is affected, and that would be in0.
> > > 
> > 
> > Noted... will go with that and add an introductory patch for having inX_fault in
> > hwmon.
> > 
> > > > > > +fault_logs_reset       Clears all the Logged Faults
> > > > 
> > > > > What exactly does that do that is user visible ?
> > > > 
> > > > Well, this one is because in some configurations the chip won't enable the
> > > > output load until you reset/clear the fault log keeping it from enabling the
> > > > output. This is the comment I have in the code:
> > > > 
> > > > "Fault log failures. These faults might be important in systems where auto-
> > > > retry
> > > > is not enabled since they will cause the part to latch off until they are
> > > > cleared. Typically that happens when the system admin is close enough so he
> > > > can
> > > > check what happened and manually clear the faults. Moreover, manually
> > > > clearing
> > > > the faults might only matter when ON_FAULT_MASK in the CONTROL register is
> > > > set
> > > > (which is the default) as in that case, a turn off signal from the ON pin
> > > > won't
> > > > clear them."
> > > > 
> > > > In v1 I was allowing to clear fauls log individually and you recommended to
> > > > have
> > > > an attribute to clear them all at once as that would simplify things.
> > > > 
> > > > I just kept it in here because this might be important for the chip to work
> > > > as
> > > > expected again so having it in debugfs might be weird.
> > > > 
> > > 
> > > How about using a write to in0_fault to clear the log ?
> > > 
> > 
> > But that would be just related with the output voltage. You can also have
> > failures on VDD (over and undervolatge fault - I'm using in1_crit and in1_lcrit)
> > and if you have the auto retry bit disabled, then clearing fault logs might be
> > important. This attribute is nice because makes it clear what will be done.
> > OTOH, I do understand your worries with non standard ABI...
> > 
> 
> Depends on the meaning of "fault". PMBus, for example, uses the "fault" terminology
> to indicate that critical limits have been exceeded. That isn't a "fault" in the
> terminology used by the hwmon subsystem. Violating a voltage/current/power range
> isn't a fault. A shorted FET or temperature sensor, however, is a fault.
> 
> In that sense, if the output is turned off because a voltage / current / power
> limit was violated, a write into the respective alarm register would be appropriate
> to reset it if the condition doesn't auto-reset. It really all depends on the
> context.
> 

Yeah, maybe I'm also failing in giving you all the "context". The device has kind of
three registers for failures/status/logs:

1. The ADC_ALERT_LOG. This is the typical one for violated min/max limits. They don't
auto-reset so I'm already doing that in the driver (after reading the alarm bit).

2. The STATUS register. In this one, we have the fet bad/short failures and critical
limits for VDD (under/over voltage detection). We also have overcurrent detection and
the power1_good stuff we already talked about. This is one auto clears and it's RO.
So we kind of have a mix of faults and critical limits in here.

3. The FAULT_LOG register. This one is more about logging and is very similar (if not
identical) in the reported bits with the STATUS register. The differences is that
this one does not auto-clears as it's about logging so we want to know about historic
failures and in some configurations (as mentioned before) we do need to clear the
fault log bit so the chip will be able to enable again the output voltage. And
regarding that we have these main bits:

 * Under voltage on VDD
 * Over voltage on VDD
 * Over current on the sense resistor
 * FET bad
 * FET short 
 * Power bad

I can say that the under/over voltage and current will prevent the output to be
enabled. I could force that state in my test environment and could see that
happening. For the FET faults I'm not sure as I cannot force those faults. The power
stuff won't affect the output.

> > I'm not sure how usual is for fault logs to be around that would justify for a
> > global standard attribute.
> > 
> 
> We could also use in[01]_reset_history. While that is originally intended to reset
> min/max voltage history, I think it makes sense to extend the meaning to include
> fault history (even more so if that history includes over/undervoltage events).
> Plus, there are other _reset_history attributes which could be used to reset
> power/current/temperature history separately if that is supported by the chip.
> 

Well, I'm already supporting _reset_history in the voltage/power/current channels so
I can easily extend that for clearing fault history if that is fine with you. I just
need to document it because it's a bit of an "hidden" thing. The question would also
be, should I just document this for this chip docs or in the general hwmon docs?

- Nuno Sá





[Index of Archives]     [LM Sensors]     [Linux Sound]     [ALSA Users]     [ALSA Devel]     [Linux Audio Users]     [Linux Media]     [Kernel]     [Gimp]     [Yosemite News]     [Linux Media]

  Powered by Linux