Re: [PATCH 2/3] hwmon: Add support for ltc2947

Guenter Roeck <linux@xxxxxxxxxxxx> · Mon, 7 Oct 2019 05:44:53 -0700

On 10/7/19 5:25 AM, Sa, Nuno wrote:
On Fri, 2019-10-04 at 08:06 -0700, Guenter Roeck wrote:

On Fri, Oct 04, 2019 at 07:45:07AM +0000, Sa, Nuno wrote:
[ ... ]
+static int ltc2947_val_read(struct ltc2947_data *st, const u8
reg,
+			    const u8 page, const size_t size,
s64 *val)
+{
+	int ret;
+	u64 __val = 0;
+
+	mutex_lock(&st->lock);
+

On a side note, suspend code is supposed to be atomic,
If this lock is held, the process or thread holding it
will likely stall suspend since it won't run.
At the very least, this would have to be a trylock,
with suspend failing if the lock can not be acquired.

Got it. Even more, Now I don't see the point of having the mutex in the
PM callbacks (though I saw other drivers doing this). As you said, at
the very least the trylock should be used since a frozen task might
have the mutex locked. Either way, I will drop both the flag and the
call to mutex_* functions is suspend()/resume().

+	if (st->reset) {
+		mutex_unlock(&st->lock);
+		return -EPERM;

Not sure what the error here should be, but EPERM is not correct.

Under which conditions would this function be called while in
suspend ?

Honestly, this is more like a sanity check. I'm not sure if we can
get
here in suspend mode. Don't userland apps can still run in suspend?
I
guess so but I'm not 100% sure on this. Do you have any
recommendation
for the error here?

Sorry, I won't accept "just in case" code.

Having said that, please inform yourself how suspend works. Userland
code
has to be stopped before any hardware can be disabled. Similar,
hardware
has to be re-enabled before userland code can resume.
Otherwise the kernel would crash all over the place. In many cases,
disabling the hardware means that trying to access hardware registers
will cause an acess fault.

Yes, you are right. I did checked the suspend code and all userland
tasks and kthreads are stopped before the suspend() callback is called
for the HW devices.

[...]

+
+static struct attribute *ltc2947_attrs[] = {
+	&sensor_dev_attr_in0_fault.dev_attr.attr,
+	&sensor_dev_attr_in1_fault.dev_attr.attr,
+	&sensor_dev_attr_curr1_fault.dev_attr.attr,
+	&sensor_dev_attr_temp1_fault.dev_attr.attr,
+	&sensor_dev_attr_power1_input.dev_attr.attr,
+	&sensor_dev_attr_power1_max.dev_attr.attr,
+	&sensor_dev_attr_power1_min.dev_attr.attr,
+	&sensor_dev_attr_power1_input_highest.dev_attr.attr,
+	&sensor_dev_attr_power1_input_lowest.dev_attr.attr,
+	&sensor_dev_attr_power1_fault.dev_attr.attr,
+	&sensor_dev_attr_energy1_input.dev_attr.attr,
+	&sensor_dev_attr_energy1_max.dev_attr.attr,
+	&sensor_dev_attr_energy1_min.dev_attr.attr,
+	&sensor_dev_attr_energy1_max_alarm.dev_attr.attr,
+	&sensor_dev_attr_energy1_min_alarm.dev_attr.attr,
+	&sensor_dev_attr_energy2_input.dev_attr.attr,
+	&sensor_dev_attr_energy2_max.dev_attr.attr,
+	&sensor_dev_attr_energy2_min.dev_attr.attr,
+	&sensor_dev_attr_energy2_max_alarm.dev_attr.attr,
+	&sensor_dev_attr_energy2_min_alarm.dev_attr.attr,
+	&sensor_dev_attr_energy1_overflow_alarm.dev_attr.attr,
+	&sensor_dev_attr_energy2_overflow_alarm.dev_attr.attr,

These overflow attributes are kind of an alarm for the energy ones.
It
tells that the energy registers are about to overflow. I guess that
some application can easily find out the maximum values supported
on
these registers and implement whatever logic they want in the app
itself. So, if you prefer I can just drop this ones?

I understand the overflow use case. However, the mere presence
of min/max attributes for energy attributes suggests that this
is not the min/max use case for hwmon attributes. There is no
"minimum"
or "maximum" energy for an accumulating value. Such attributes
only make sense in an application abler to measure available
energy (eg a battery controller). I'll have to read the chip
specification to understand the intended use case.

Should I just drop the overflow attributes? I think the part can be
used to check battery charging efficiency for example (though I guess
we would need to also support the Charge register's).

If chip is (or can be) used as charger, it should register as such.
Note my question was the energy limit attributes, not the overflow
attributes.

+	&sensor_dev_attr_energy1_fault.dev_attr.attr,
+	&sensor_dev_attr_energy2_fault.dev_attr.attr,

Some of those are non-standard attributes. You'll have
to explain each in detail, especially why it makes sense
to provide such attributes to the user and why you can't
use standard attributes instead. Also, for the _fault
attributes, I don't entirely see the point. If the fault bit
is set, ADC readings are not valid because supply voltage
is low. This means that ADC register reads will be invalid.
What is the point of having a non-standard attribute - which
likely will be ignored - instead of returning an error when
an attempt is made to read an ADC value ?

I was also not sure on this *_fault attributes. They are there to
tell
that the readings are invalid. Now that I think about it, I'm not
sure
if it even makes sense to return error if this bit is set. The part
is
in continuous mode so, it might happen that we have the fault bit
set
for a short time but afterwards things go normal and the bit will
still
be set until we read it. So my point is, we might be returning
error
for a conversion that happened way before our current reading. Any
suggestion here? Would you be fine if I just drop this attributes?

It sounds like "fault" means something like "one of the past readings
was wrong, but I don't know which one and I don't know if the wrong
value was ever read by user space". Sorry, I fail to see what value
those attributes are supposed to have for the user. At best it could
mean "please re-read the associated attrribute", but that could as
well
be accomplished without userspace action if it is really needed.
Also, per datasheet, it looks like the fault bit is set of the chip
voltage is too low. If that happens, the system has a severe problem
which can not be resolved with an attribute visible to userspace.

I will drop the fault attributes.

Others, like energy1_input, or most of the power attributes,
are standard attributes. Please explain the reasoning for
not using the standard API for those.

This ones were because we need 64bit variables. For energy, the
part
also supports the alarms, max and min attributes so I included
them.

I can to some degree the logic for energy attributes, but do you
really
have an application where the chip is used on a 32-bit system and
monitors power larger than 2kW ?

Hmm, I looked again at the chip specification and unless I'm missing
something obvious the part can only measure +/- 30A and 0-15V giving us
+/- 450W which definitely fits in a long variable. The only thing that
will be truncated is the min/max values. The part, by default, has this
value to 0x7fff and 0x8000 which times 200000uW (part scale) will be
truncated. Now, we can argue that this max/min values are not real and
the user is expected to write this attributes with some meaningful
values? How do you suggest to proceed? Should I just use standard
attributes for power?

How about detecting the overflow on read and just report the maximum
supported value ? Or, alternatively, initialize the register with the
maximum supported value.

Guenter