Hi Guenter

On Thu, 7 May 2020 at 00:57, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 5/6/20 10:11 AM, Naveen Krishna Ch wrote:
> > Hi Guenter
> >
> > On Wed, 6 May 2020 at 22:03, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >>
> >> On Fri, May 01, 2020 at 11:20:02PM +0530, Naveen Krishna Chatradhi wrote:
> >>> Document amd_energy driver with all chips supported by it.
> >>>
> >>> Cc: Guenter Roeck <linux@xxxxxxxxxxxx>
> >>> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@xxxxxxx>
> >>> ---
> >>> Changes in v5: None
> >>>
> >>>  Documentation/hwmon/amd_energy.rst | 100 +++++++++++++++++++++++++++++
> >>>  Documentation/hwmon/index.rst      |   1 +
> >>>  2 files changed, 101 insertions(+)
> >>>  create mode 100644 Documentation/hwmon/amd_energy.rst
> >>>
> >>> diff --git a/Documentation/hwmon/amd_energy.rst b/Documentation/hwmon/amd_energy.rst
> >>> new file mode 100644
> >>> index 000000000000..2216c8b13e58
> >>> --- /dev/null
> >>> +++ b/Documentation/hwmon/amd_energy.rst
> >>> @@ -0,0 +1,100 @@
> >>> +.. SPDX-License-Identifier: GPL-2.0
> >>> +
> >>> +Kernel driver amd_energy
> >>> +==========================
> >>> +
> >>> +Supported chips:
> >>> +
> >>> +* AMD Family 17h Processors
> >>> +
> >>> +  Prefix: 'amd_energy'
> >>> +
> >>> +  Addresses used: RAPL MSRs
> >>> +
> >>> +  Datasheets:
> >>> +
> >>> +  - Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors
> >>> +
> >>> +    https://developer.amd.com/wp-content/resources/55570-B1_PUB.zip
> >>> +
> >>> +  - Preliminary Processor Programming Reference (PPR) for AMD Family 17h Model 31h, Revision B0 Processors
> >>> +
> >>> +    https://developer.amd.com/wp-content/resources/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip
> >>> +
> >>> +Author: Naveen Krishna Chatradhi <nchatrad@xxxxxxx>
> >>> +
> >>> +Description
> >>> +-----------
> >>> +
> >>> +The Energy driver exposes the energy counters that are
> >>> +reported via the Running Average Power Limit (RAPL)
> >>> +Model-specific Registers (MSRs) via the hardware monitor
> >>> +(HWMON) sysfs interface.
> >>> +
> >>> +1. Power, Energy and Time Units
> >>> +   MSR_RAPL_POWER_UNIT/ C001_0299:
> >>> +   shared with all cores in the socket
> >>> +
> >>> +2. Energy consumed by each Core
> >>> +   MSR_CORE_ENERGY_STATUS/ C001_029A:
> >>> +   32-bit RO, Accumulator, core-level power reporting
> >>> +
> >>> +3. Energy consumed by Socket
> >>> +   MSR_PACKAGE_ENERGY_STATUS/ C001_029B:
> >>> +   32-bit RO, Accumulator, socket-level power reporting,
> >>> +   shared with all cores in socket
> >>> +
> >>> +These registers are updated every 1ms and cleared on
> >>> +reset of the system.
> >>> +
> >>> +Energy Calculation
> >>> +------------------
> >>> +
> >>> +Energy information (in Joules) is based on the multiplier,
> >>> +1/2^ESU; where ESU is an unsigned integer read from
> >>> +MSR_RAPL_POWER_UNIT register. Default value is 10000b,
> >>> +indicating energy status unit is 15.3 micro-Joules increment.
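
For reference, the unit arithmetic in the paragraph above can be checked
with a few lines of Python. This is only a sketch of the 1/2^ESU
multiplier as described in the text; the function name
energy_unit_joules and the constant DEFAULT_ESU are made up for
illustration and are not part of the driver.

```python
# Hypothetical helper, not driver code: illustrates the 1/2^ESU multiplier.
DEFAULT_ESU = 0b10000  # default ESU value quoted in the documentation (16)


def energy_unit_joules(esu: int) -> float:
    """Return the energy represented by one counter increment, in joules."""
    if esu < 0:
        raise ValueError("ESU is an unsigned field")
    return 1.0 / (1 << esu)


unit = energy_unit_joules(DEFAULT_ESU)
print(f"{unit * 1e6:.1f} micro-Joules per increment")  # prints "15.3 ..."
```

With the default ESU of 16 one increment is 2^-16 J, which rounds to the
15.3 micro-Joules quoted in the documentation.
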
> >>> +
> >>> +Reported values are scaled as per the formula
> >>> +
> >>> +scaled value = ((1/2^ESU) * (Raw value) * 1000000UL) in Joules
> >>> +
> >>> +Users calculate power for a given domain by calculating
> >>> +	dEnergy/dTime for that domain.
> >>> +
> >>> +Socket energy accumulation
> >>> +--------------------------
> >>> +
> >>> +Current Socket energy status register is 32bit, assuming a 240W
> >>> +system, the register would wrap around in
> >>> +
> >>> +	2^32*15.3 e-6/240 = 273.80416512 secs to wrap(~4.5 mins)
> >>> +
> >>> +To improve the wrap around time, a kernel thread is implemented
> >>> +to accumulate the socket energy counter to a 64-bit counter. The
> >>> +kernel thread starts running during probe, wakes up at 100secs
> >>
> >> wakes up every 100 seconds
> >>
> >>> +and stops running in remove.
> >>
> >> stops running when the driver is removed.
> > Will correct them
> >>
> >> All counters need to be updated by the kernel thread, not just the socket
> >> counter. If the socket counter can wrap in 4.5 minutes, the matching per-core
> >> counters on a 64-core system can wrap every 4.5 * 64 = 288 minutes, which
> >> isn't much better. This might be even worse on a system with fewer cores and
> >> higher per-core power.
> >
> > Agreed, just need a few clarifications though
> > 1. Is it OK to implement another thread for cores alone, as it need not run as
> > frequently as the socket thread.
>
> Your call, but personally I think it is not worth the overhead; see below.
>
> > 2. We have a scenario on servers, a thread accumulating energy for all 128 cores
> > might compromise the compute. So, I would like to provide a configuration
> > symbol or sysfs mechanism to enable/disable the core accumulation.
> >
>
> Another option would be to use a single thread but only update a single core
> per socket at a time. If the socket thread needs to run every N seconds,
> one would assume that the core thread only needs to run every N * (number
> of cores) seconds (assuming that it uses the same scale). If so, reading
> the data for one core (or maybe a couple of cores if the scale is different)
> plus the data for the socket should not be that expensive.
This is good and possible. Thanks
>
> If that is not acceptable, it might make more sense to blacklist the driver
> entirely in such situations; without accumulation the reported values are
> pretty much worthless.
Sure, will implement core accumulation as well.
>
> Thanks,
> Guenter
>
> >>
> >>> +
> >>> +A socket energy read would return the current register value
> >>> +added to the respective energy accumulator.
> >>> +
> >>> +Sysfs attributes
> >>> +----------------
> >>> +
> >>> +=============== ======== =====================================
> >>> +Attribute       Label    Description
> >>> +=============== ======== =====================================
> >>> +
> >>> +* For index N between [1] and [nr_cpus]
> >>> +
> >>> +=============== ======== ======================================
> >>> +energy[N]_input EcoreX   Core Energy   X = [0] to [nr_cpus - 1]
> >>> +                         Measured input core energy
> >>> +=============== ======== ======================================
> >>> +
> >>> +* For N between [nr_cpus] and [nr_cpus + nr_socks]
> >>> +
> >>> +=============== ======== ======================================
> >>> +energy[N]_input EsocketX Socket Energy X = [0] to [nr_socks -1]
> >>> +                         Measured input socket energy
> >>> +=============== ======== ======================================
> >>> diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
> >>> index 8ef62fd39787..fc4b89810e67 100644
> >>> --- a/Documentation/hwmon/index.rst
> >>> +++ b/Documentation/hwmon/index.rst
> >>> @@ -39,6 +39,7 @@ Hardware Monitoring Kernel Drivers
> >>>     adt7470
> >>>     adt7475
> >>>     amc6821
> >>> +   amd_energy
> >>>     asb100
> >>>     asc7621
> >>>     aspeed-pwm-tacho
> >
> >
> >
>

-- 
Shine bright,
(: Nav :)
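
P.S. A rough Python model of the wrap-around arithmetic and of folding
a 32-bit counter into a wider accumulator, as discussed in this thread.
It is a sketch only and not the driver implementation: the names
wrap_seconds and EnergyAccumulator are hypothetical, and the
delta-with-mask approach shown here is one common way to tolerate a
wrap, which may differ from how the kernel thread ends up doing it. It
also assumes the counter is sampled at least once per wrap period.

```python
# Illustrative model only; names and structure are assumptions, not driver code.
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def wrap_seconds(unit_joules: float, watts: float) -> float:
    """Seconds until a 32-bit energy counter wraps at a constant power draw."""
    return (1 << COUNTER_BITS) * unit_joules / watts


class EnergyAccumulator:
    """Fold successive raw 32-bit readings into an unbounded running total."""

    def __init__(self, first_raw: int) -> None:
        self.prev = first_raw & COUNTER_MASK
        self.total = 0  # accumulated increments since the first reading

    def update(self, raw: int) -> int:
        raw &= COUNTER_MASK
        # Modular subtraction gives the right delta across a single wrap.
        self.total += (raw - self.prev) & COUNTER_MASK
        self.prev = raw
        return self.total


# 240 W socket with the 15.3 micro-Joule unit from the documentation.
print(f"{wrap_seconds(15.3e-6, 240.0):.2f} s")  # about 273.80 s (~4.5 mins)

acc = EnergyAccumulator(0xFFFFFFF0)
acc.update(0x00000010)  # counter wrapped between the two samples
```

In this model the second sample is read after the register has wrapped,
and the accumulator still advances by 0x20 increments because the
subtraction is done modulo 2^32.
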