Hi Guenter On Mon, 27 Apr 2020 at 20:34, Naveen Krishna Ch <naveenkrishna.ch@xxxxxxxxx> wrote: > > Hi Guenter, > > On Mon, 27 Apr 2020 at 19:04, Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > > > > On 4/27/20 5:39 AM, Naveen Krishna Ch wrote: > > > Hi Guenter, > > > > > > On Sat, 25 Apr 2020 at 21:47, Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > > >> > > >> On Fri, Apr 24, 2020 at 08:50:54AM +0530, Naveen Krishna Chatradhi wrote: > > >>> This patch adds hwmon based amd_energy driver support for > > >>> family 17h processors from AMD. > > >>> > > >>> The driver provides following interface to the userspace > > >>> 1. Reports the per core consumption > > >>> * file: "energy%d_input", label: "Ecore%03d" > > >>> 2. Reports per socket energy consumption > > >>> * file: "energy%d_input", label: "Esocket%d" > > >>> 2. Reports scaled energy value in Joules. > > >>> > > >>> Cc: Guenter Roeck <linux@xxxxxxxxxxxx> > > >>> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@xxxxxxx> > > >> > > >> Couple of additional comments below. > > >> > > >> On a higher level, I noticed that the data overflows quickly > > >> (ie within a day or so), which is the reason why the matching > > >> code for Intel CPUs never made it into the kernel. With that > > >> in mind, I do wonder if this is really valuable. I am quite > > >> concerned about people complaining that the data is useless, > > >> and I have to say that I find it quite useless myself. Any > > >> system running for more than a few hours will report more or > > >> less random data. Any thoughts on that ? > > > This driver will also address the future platforms with > > > support for 64-bit counters. > > > > > > We do have platforms that will come out very shortly, which will > > > support a different energy resolution to increase the wraparound > > > time with 32bit counters, > > > > > > On a platform with 2 sockets each with 64 cores, > > > Assuming 240W, new resolution of 15.6 milli Joules > > > > > > - Wrap around time for socket energy counter will be > > > (up to ~3 to 4 days with maximum load). > > > > > > 2^32*15.625e-3/240 = 279620.2667 secs to wrap (i.e 3.2 days) > > > > > > - Wrap around time for the per-core energy counters with the > > > new resolution will be > > > > > > 2^32*15.6e-3/ (240 * 128) = 2184533.33 secs to wrap (i.e 25 days) > > > > > > When a work load is to be run on certain predefined cores. > > > The Work load managers can gather the energy status before starting > > > and after completion of the job and measure power consumed by the > > > work load. > > > > > All that doesn't help much for existing platforms, nor for users who > > expect that counters don't wrap around at all (at least not until they > > reach the 64-bit limit). > > The 3 energy counter MSRs are added newly in family 17h, and this > driver supports family 17h and later. > > > > > I see two options. Either provide power reporting instead (which should > > be done in the k10temp driver), or implement a kernel thread which runs > > often enough to catch wraparounds. While not perfect (it will only report > > the energy since the driver was loaded), it will at least avoid the > > frequent wraparounds seen today, and that caveat can be documented. > Sure, i can updated the k10temp driver once a documented way of power > reporting is available. For now, i will implement a kernel thread to reduce > the wrap around and update the documentation. I've addressed your comments and submitted the next version. Could you review and let us your know comments. Thank you. > > > > > >> > > >> How about making the power reporting registers available and > > >> reporting per-CPU power consumption instead ? I think that would > > >> add much more value. > > > We will expose power reporting when platform exposes that information. > > > Until then, energy reporting becomes further critical. > > > > > > > Some Windows tools such as HwInfo report power readings today, > > on existing CPUs, so I don't really buy the claim that existing > > chips don't report it. > This driver is needed for servers based on Naples and Rome > which runs on Linux. The energy MSRs are accumulating registers > updated every 1ms. At present, this is a reliable and documented > way to measure power consumption over a period. Also, there > is no documented way to report power readings at this point. > > > > > Thanks, > > Guenter > -- > Shine bright, > (: Nav :) -- Shine bright, (: Nav :)