Re: [PATCH 2/3] hwmon: Add support for ltc2947

"Sa, Nuno" <Nuno.Sa@xxxxxxxxxx> · Mon, 7 Oct 2019 12:25:49 +0000

On Fri, 2019-10-04 at 08:06 -0700, Guenter Roeck wrote:
> 
> On Fri, Oct 04, 2019 at 07:45:07AM +0000, Sa, Nuno wrote:
> [ ... ]
> > > > +static int ltc2947_val_read(struct ltc2947_data *st, const u8
> > > > reg,
> > > > +			    const u8 page, const size_t size,
> > > > s64 *val)
> > > > +{
> > > > +	int ret;
> > > > +	u64 __val = 0;
> > > > +
> > > > +	mutex_lock(&st->lock);
> > > > +
> 
> On a side note, suspend code is supposed to be atomic,
> If this lock is held, the process or thread holding it
> will likely stall suspend since it won't run.
> At the very least, this would have to be a trylock,
> with suspend failing if the lock can not be acquired.

Got it. Even more, Now I don't see the point of having the mutex in the
PM callbacks (though I saw other drivers doing this). As you said, at
the very least the trylock should be used since a frozen task might
have the mutex locked. Either way, I will drop both the flag and the
call to mutex_* functions is suspend()/resume().

> > > > +	if (st->reset) {
> > > > +		mutex_unlock(&st->lock);
> > > > +		return -EPERM;
> > > 
> > > Not sure what the error here should be, but EPERM is not correct.
> > > 
> > > Under which conditions would this function be called while in
> > > suspend ?
> > 
> > Honestly, this is more like a sanity check. I'm not sure if we can
> > get
> > here in suspend mode. Don't userland apps can still run in suspend?
> > I
> > guess so but I'm not 100% sure on this. Do you have any
> > recommendation
> > for the error here?
> > 
> Sorry, I won't accept "just in case" code.
> 
> Having said that, please inform yourself how suspend works. Userland
> code
> has to be stopped before any hardware can be disabled. Similar,
> hardware
> has to be re-enabled before userland code can resume.
> Otherwise the kernel would crash all over the place. In many cases,
> disabling the hardware means that trying to access hardware registers
> will cause an acess fault.

Yes, you are right. I did checked the suspend code and all userland
tasks and kthreads are stopped before the suspend() callback is called
for the HW devices.

> [...]
> 
> > > > +
> > > > +static struct attribute *ltc2947_attrs[] = {
> > > > +	&sensor_dev_attr_in0_fault.dev_attr.attr,
> > > > +	&sensor_dev_attr_in1_fault.dev_attr.attr,
> > > > +	&sensor_dev_attr_curr1_fault.dev_attr.attr,
> > > > +	&sensor_dev_attr_temp1_fault.dev_attr.attr,
> > > > +	&sensor_dev_attr_power1_input.dev_attr.attr,
> > > > +	&sensor_dev_attr_power1_max.dev_attr.attr,
> > > > +	&sensor_dev_attr_power1_min.dev_attr.attr,
> > > > +	&sensor_dev_attr_power1_input_highest.dev_attr.attr,
> > > > +	&sensor_dev_attr_power1_input_lowest.dev_attr.attr,
> > > > +	&sensor_dev_attr_power1_fault.dev_attr.attr,
> > > > +	&sensor_dev_attr_energy1_input.dev_attr.attr,
> > > > +	&sensor_dev_attr_energy1_max.dev_attr.attr,
> > > > +	&sensor_dev_attr_energy1_min.dev_attr.attr,
> > > > +	&sensor_dev_attr_energy1_max_alarm.dev_attr.attr,
> > > > +	&sensor_dev_attr_energy1_min_alarm.dev_attr.attr,
> > > > +	&sensor_dev_attr_energy2_input.dev_attr.attr,
> > > > +	&sensor_dev_attr_energy2_max.dev_attr.attr,
> > > > +	&sensor_dev_attr_energy2_min.dev_attr.attr,
> > > > +	&sensor_dev_attr_energy2_max_alarm.dev_attr.attr,
> > > > +	&sensor_dev_attr_energy2_min_alarm.dev_attr.attr,
> > > > +	&sensor_dev_attr_energy1_overflow_alarm.dev_attr.attr,
> > > > +	&sensor_dev_attr_energy2_overflow_alarm.dev_attr.attr,
> > 
> > These overflow attributes are kind of an alarm for the energy ones.
> > It
> > tells that the energy registers are about to overflow. I guess that
> > some application can easily find out the maximum values supported
> > on
> > these registers and implement whatever logic they want in the app
> > itself. So, if you prefer I can just drop this ones?
> > 
> I understand the overflow use case. However, the mere presence
> of min/max attributes for energy attributes suggests that this
> is not the min/max use case for hwmon attributes. There is no
> "minimum"
> or "maximum" energy for an accumulating value. Such attributes
> only make sense in an application abler to measure available 
> energy (eg a battery controller). I'll have to read the chip
> specification to understand the intended use case.

Should I just drop the overflow attributes? I think the part can be
used to check battery charging efficiency for example (though I guess
we would need to also support the Charge register's).

> > > > +	&sensor_dev_attr_energy1_fault.dev_attr.attr,
> > > > +	&sensor_dev_attr_energy2_fault.dev_attr.attr,
> > > 
> > > Some of those are non-standard attributes. You'll have
> > > to explain each in detail, especially why it makes sense
> > > to provide such attributes to the user and why you can't
> > > use standard attributes instead. Also, for the _fault
> > > attributes, I don't entirely see the point. If the fault bit
> > > is set, ADC readings are not valid because supply voltage
> > > is low. This means that ADC register reads will be invalid.
> > > What is the point of having a non-standard attribute - which
> > > likely will be ignored - instead of returning an error when
> > > an attempt is made to read an ADC value ?
> > 
> > I was also not sure on this *_fault attributes. They are there to
> > tell
> > that the readings are invalid. Now that I think about it, I'm not
> > sure
> > if it even makes sense to return error if this bit is set. The part
> > is
> > in continuous mode so, it might happen that we have the fault bit
> > set
> > for a short time but afterwards things go normal and the bit will
> > still
> > be set until we read it. So my point is, we might be returning
> > error
> > for a conversion that happened way before our current reading. Any
> > suggestion here? Would you be fine if I just drop this attributes?
> >   
> 
> It sounds like "fault" means something like "one of the past readings
> was wrong, but I don't know which one and I don't know if the wrong
> value was ever read by user space". Sorry, I fail to see what value
> those attributes are supposed to have for the user. At best it could
> mean "please re-read the associated attrribute", but that could as
> well
> be accomplished without userspace action if it is really needed.
> Also, per datasheet, it looks like the fault bit is set of the chip
> voltage is too low. If that happens, the system has a severe problem
> which can not be resolved with an attribute visible to userspace.

I will drop the fault attributes.

> > > Others, like energy1_input, or most of the power attributes,
> > > are standard attributes. Please explain the reasoning for
> > > not using the standard API for those. 
> > 
> > This ones were because we need 64bit variables. For energy, the
> > part
> > also supports the alarms, max and min attributes so I included
> > them.
> >  
> I can to some degree the logic for energy attributes, but do you
> really
> have an application where the chip is used on a 32-bit system and
> monitors power larger than 2kW ?

Hmm, I looked again at the chip specification and unless I'm missing
something obvious the part can only measure +/- 30A and 0-15V giving us
+/- 450W which definitely fits in a long variable. The only thing that
will be truncated is the min/max values. The part, by default, has this
value to 0x7fff and 0x8000 which times 200000uW (part scale) will be
truncated. Now, we can argue that this max/min values are not real and
the user is expected to write this attributes with some meaningful
values? How do you suggest to proceed? Should I just use standard
attributes for power?

> If you do have such a use case, we'll need to enhance the API
> to reflect it.
> 
> Thanks,
> Guenter