Re: [PATCH v4 2/9] Documentation: hwmon: Add OCC documentation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 07/25/2018 11:36 AM, Guenter Roeck wrote:
On Wed, Jul 11, 2018 at 04:01:31PM -0500, Eddie James wrote:
Document the hwmon interface for the OCC.

Signed-off-by: Eddie James <eajames@xxxxxxxxxxxxxxxxxx>
---
  Documentation/hwmon/occ | 73 +++++++++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 73 insertions(+)
  create mode 100644 Documentation/hwmon/occ

diff --git a/Documentation/hwmon/occ b/Documentation/hwmon/occ
new file mode 100644
index 0000000..465fa1a
--- /dev/null
+++ b/Documentation/hwmon/occ
@@ -0,0 +1,73 @@
+Kernel driver occ-hwmon
+=======================
+
+Supported chips:
+  * POWER8
+  * POWER9
+
+Author: Eddie James <eajames@xxxxxxxxxxxxxxxxxx>
+
+Description
+-----------
+
+This driver supports hardware monitoring for the On-Chip Controller (OCC)
+embedded on POWER processors. The OCC is a device that collects and aggregates
+sensor data from the processor and the system. The OCC can provide the raw
+sensor data as well as perform thermal and power management on the system.
+
+The P8 version of this driver is a client driver of I2C. It may be probed
+manually if an "ibm,p8-occ-hwmon" compatible device is found under the
+appropriate I2C bus node in the device-tree.
+
+The P9 version of this driver is a client driver of the FSI-based OCC driver.
+It will be probed automatically by the FSI-based OCC driver.
+
+Sysfs entries
+-------------
+
+The following attributes are supported. All attributes are read-only unless
+specified.
+
+temp[1-n]_label		OCC sensor id.
+temp[1-n]_input		Measured temperature in millidegrees C.
+[with temperature sensor version 2+]
+    temp[1-n]_fru_type		Given FRU (Field Replaceable Unit) type.
What is this ? An integer ? A string ?

+    temp[1-n]_fault		Temperature sensor fault.
+
+freq[1-n]_label		OCC sensor id.
+freq[1-n]_input		Measured frequency.
What does that have to do with hardware monitoring, and what exactly does it
measure ? AC voltage frequency ? Frequency of rainstorms in the surrounding
area ?

+
+power[1-n]_label	OCC sensor id.
+power[1-n]_input	Measured power in microwatts.
+power[1-n]_update_tag	Number of 250us samples represented in accumulator.
update_tag to represent number of samples ? Odd choice for
an attribute name. Why not "_samples" ? Also, if each sample
represents a specific amount of time, why not report a time ?

+power[1-n]_accumulator	Accumulation of 250us power readings.
There is no explanation of "accumulation". Is this the energy ?
If so, why not use energy attributes ? And what is the unit of
this measurement ?

+[with power sensor version 2+]
+    power[1-n]_function_id	Identifies what the power reading is for.
String ? Number ? Slot index ?  Bitmap ? And why isn't that reported
in the label ? After all, that is what the label is supposed to be
used for.

+    power[1-n]_apss_channel	Indicates APSS channel.
+
Does that provide any value to the user ?

+[power version 0xa0 only]
+power1_id			OCC sensor id.
This is inconsistent with the other attributes and even with itself.

+power[1-n]_label		Sensor type, "system", "proc", "vdd", or "vdn".
+power[1-n]_input		Most recent power reading in microwatts.
Overall I am left with no idea what
	_id
	_label
	_function_id
	_apps_channel
are and how they relate to each other, except that it all looks quite
inconsistent. You might want to consider merging all those attributes into
the label in some consistent way.

+power[1-n]_update_tag		Number of samples in the accumulator.
+power[1-n]_accumulator		Accumulation of power readings.
Same as above.

+[with sensor type "system" and "proc" only]
+    power[1-n]_update_time	Time in us that the power value is read.
+
+caps1_current		Current OCC power cap in watts.
+caps1_reading		Current system output power in watts.
+caps1_norm		Power cap without redundant power.
+caps1_max		Maximum power cap.
Why do those have to be non-standard attributes ? Please explain why you can not
use power[1-n]_cap attributes.

+[caps version 1 and 2 only]
+    caps1_min		Minimum power cap.
+[caps version 3+]
+    caps1_min_hard		Hard minimum cap that can be set and held.
+    caps1_min_soft		Soft minimum cap below hard, not guaranteed.
+caps1_user		The powercap specified by the user. Will be 0 if no
+			user powercap exists. This attribute is read-write.
+[caps version 1+]
+    caps1_user_source	Indicates how the user power limit was set.
+
+extn[1-n]_label		ASCII id or sensor id.
+extn[1-n]_flags		Indicates type of label attribute.
+extn[1-n]_input		Data.
Great non-explanation.

Not reviewing the series further. I am sure I asked that each non-standard
attribute is explained. There is neither an explanation why the attributes
are needed nor, in many cases, why non-standard attributes were chosen
instead of standard ones. On top of that, the non-standard attributes are
not even documented properly, leaving the reader wondering not only why
they are needed, but what they are used for in the first place.

Hi,

Thanks for the feedback Guenter. I am about to put up a new patch set with fixes for many of the issues you indicated and better descriptions. Now all the attributes except two should conform to standard hwmon attributes. The exceptions are for the user power cap and user power cap source. These are needed in order to make decisions about power management while controlling the system. Please look at their documentation to see if they're acceptable.

Let me know what you think!
Thanks,
Eddie


Guenter





[Index of Archives]     [Device Tree Compilter]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux PCI Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Yosemite Backpacking]


  Powered by Linux