Hi, On 5/6/20 10:11 AM, Naveen Krishna Ch wrote: > Hi Guenter > > On Wed, 6 May 2020 at 22:03, Guenter Roeck <linux@xxxxxxxxxxxx> wrote: >> >> On Fri, May 01, 2020 at 11:20:02PM +0530, Naveen Krishna Chatradhi wrote: >>> Document amd_energy driver with all chips supported by it. >>> >>> Cc: Guenter Roeck <linux@xxxxxxxxxxxx> >>> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@xxxxxxx> >>> --- >>> Changes in v5: None >>> >>> Documentation/hwmon/amd_energy.rst | 100 +++++++++++++++++++++++++++++ >>> Documentation/hwmon/index.rst | 1 + >>> 2 files changed, 101 insertions(+) >>> create mode 100644 Documentation/hwmon/amd_energy.rst >>> >>> diff --git a/Documentation/hwmon/amd_energy.rst b/Documentation/hwmon/amd_energy.rst >>> new file mode 100644 >>> index 000000000000..2216c8b13e58 >>> --- /dev/null >>> +++ b/Documentation/hwmon/amd_energy.rst >>> @@ -0,0 +1,100 @@ >>> +.. SPDX-License-Identifier: GPL-2.0 >>> + >>> +Kernel driver amd_energy >>> +========================== >>> + >>> +Supported chips: >>> + >>> +* AMD Family 17h Processors >>> + >>> + Prefix: 'amd_energy' >>> + >>> + Addresses used: RAPL MSRs >>> + >>> + Datasheets: >>> + >>> + - Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors >>> + >>> + https://developer.amd.com/wp-content/resources/55570-B1_PUB.zip >>> + >>> + - Preliminary Processor Programming Reference (PPR) for AMD Family 17h Model 31h, Revision B0 Processors >>> + >>> + https://developer.amd.com/wp-content/resources/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip >>> + >>> +Author: Naveen Krishna Chatradhi <nchatrad@xxxxxxx> >>> + >>> +Description >>> +----------- >>> + >>> +The Energy driver exposes the energy counters that are >>> +reported via the Running Average Power Limit (RAPL) >>> +Model-specific Registers (MSRs) via the hardware monitor >>> +(HWMON) sysfs interface. >>> + >>> +1. Power, Energy and Time Units >>> + MSR_RAPL_POWER_UNIT/ C001_0299: >>> + shared with all cores in the socket >>> + >>> +2. Energy consumed by each Core >>> + MSR_CORE_ENERGY_STATUS/ C001_029A: >>> + 32-bitRO, Accumulator, core-level power reporting >>> + >>> +3. Energy consumed by Socket >>> + MSR_PACKAGE_ENERGY_STATUS/ C001_029B: >>> + 32-bitRO, Accumulator, socket-level power reporting, >>> + shared with all cores in socket >>> + >>> +These registers are updated every 1ms and cleared on >>> +reset of the system. >>> + >>> +Energy Caluclation >>> +------------------ >>> + >>> +Energy information (in Joules) is based on the multiplier, >>> +1/2^ESU; where ESU is an unsigned integer read from >>> +MSR_RAPL_POWER_UNIT register. Default value is 10000b, >>> +indicating energy status unit is 15.3 micro-Joules increment. >>> + >>> +Reported values are scaled as per the formula >>> + >>> +scaled value = ((1/2^ESU) * (Raw value) * 1000000UL) in Joules >>> + >>> +Users calculate power for a given domain by calculating >>> + dEnergy/dTime for that domain. >>> + >>> +Socket energy accumulation >>> +-------------------------- >>> + >>> +Current Socket energy status register is 32bit, assuming a 240W >>> +system, the register would wrap around in >>> + >>> + 2^32*15.3 e-6/240 = 273.80416512 secs to wrap(~4.5 mins) >>> + >>> +To improve the wrap around time, a kernel thread is implemented >>> +to accumulate the socket energy counter to a 64-bit counter. The >>> +kernel thread starts running during probe, wakes up at 100secs >> >> wakes up every 100 seconds >> >>> +and stops running in remove. >> >> stops running when the driver is removed. > Will correct them >> >> All counters need to be be updated by the kernel thread, not just the socket >> counter. If the socket counter can wrap in 4.5 minutes, the matching per-core >> counters on a 64-core system can wrap every 4.5 * 64 = 288 minutes, which >> isn't much better. This might be even worse on a system with fewer cores and >> higher per-core power. > > Agreed, just need few clarifications though > 1. Is it OK to implement another thread for cores alone, as it need not run as > frequently as the socket thread. Your call, but personally I think it is not worth the overhead; see below. > 2. We have a scenario on servers, a thread accumulating energy for all 128 cores > might compromise the compute. So, i would like to provide a configuration > symbol or sysfs mechanism to enable/disable the core accumulation. > Another option would be to use a single thread but only update a single core per socket at a time. If the socket thread needs to run every N seconds, one would assume that the core thread only needs to run every N * (number of cores) seconds (assuming that it uses the same scale). If so, reading the data for one core (or maybe a couple of cores if the scale is different) plus the data for the socket should not be that expensive. If that is not acceptable, it might make more sense to blacklist the driver entirely in such situations; without accumulation the reported values are pretty much worthless. Thanks, Guenter >> >>> + >>> +A socket energy read would return the current register value >>> +added to the respective energy accumulator. >>> + >>> +Sysfs attributes >>> +---------------- >>> + >>> +=============== ======== ===================================== >>> +Attribute Label Description >>> +=============== ======== ===================================== >>> + >>> +* For index N between [1] and [nr_cpus] >>> + >>> +=============== ======== ====================================== >>> +energy[N]_input EcoreX Core Energy X = [0] to [nr_cpus - 1] >>> + Measured input core energy >>> +=============== ======== ====================================== >>> + >>> +* For N between [nr_cpus] and [nr_cpus + nr_socks] >>> + >>> +=============== ======== ====================================== >>> +energy[N]_input EsocketX Socket Energy X = [0] to [nr_socks -1] >>> + Measured input socket energy >>> +=============== ======== ====================================== >>> diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst >>> index 8ef62fd39787..fc4b89810e67 100644 >>> --- a/Documentation/hwmon/index.rst >>> +++ b/Documentation/hwmon/index.rst >>> @@ -39,6 +39,7 @@ Hardware Monitoring Kernel Drivers >>> adt7470 >>> adt7475 >>> amc6821 >>> + amd_energy >>> asb100 >>> asc7621 >>> aspeed-pwm-tacho > > >