Hi Guenter, On Mon, 27 Apr 2020 at 19:04, Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > > On 4/27/20 5:39 AM, Naveen Krishna Ch wrote: > > Hi Guenter, > > > > On Sat, 25 Apr 2020 at 21:47, Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > >> > >> On Fri, Apr 24, 2020 at 08:50:54AM +0530, Naveen Krishna Chatradhi wrote: > >>> This patch adds hwmon based amd_energy driver support for > >>> family 17h processors from AMD. > >>> > >>> The driver provides following interface to the userspace > >>> 1. Reports the per core consumption > >>> * file: "energy%d_input", label: "Ecore%03d" > >>> 2. Reports per socket energy consumption > >>> * file: "energy%d_input", label: "Esocket%d" > >>> 2. Reports scaled energy value in Joules. > >>> > >>> Cc: Guenter Roeck <linux@xxxxxxxxxxxx> > >>> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@xxxxxxx> > >> > >> Couple of additional comments below. > >> > >> On a higher level, I noticed that the data overflows quickly > >> (ie within a day or so), which is the reason why the matching > >> code for Intel CPUs never made it into the kernel. With that > >> in mind, I do wonder if this is really valuable. I am quite > >> concerned about people complaining that the data is useless, > >> and I have to say that I find it quite useless myself. Any > >> system running for more than a few hours will report more or > >> less random data. Any thoughts on that ? > > This driver will also address the future platforms with > > support for 64-bit counters. > > > > We do have platforms that will come out very shortly, which will > > support a different energy resolution to increase the wraparound > > time with 32bit counters, > > > > On a platform with 2 sockets each with 64 cores, > > Assuming 240W, new resolution of 15.6 milli Joules > > > > - Wrap around time for socket energy counter will be > > (up to ~3 to 4 days with maximum load). > > > > 2^32*15.625e-3/240 = 279620.2667 secs to wrap (i.e 3.2 days) > > > > - Wrap around time for the per-core energy counters with the > > new resolution will be > > > > 2^32*15.6e-3/ (240 * 128) = 2184533.33 secs to wrap (i.e 25 days) > > > > When a work load is to be run on certain predefined cores. > > The Work load managers can gather the energy status before starting > > and after completion of the job and measure power consumed by the > > work load. > > > All that doesn't help much for existing platforms, nor for users who > expect that counters don't wrap around at all (at least not until they > reach the 64-bit limit). The 3 energy counter MSRs are added newly in family 17h, and this driver supports family 17h and later. > > I see two options. Either provide power reporting instead (which should > be done in the k10temp driver), or implement a kernel thread which runs > often enough to catch wraparounds. While not perfect (it will only report > the energy since the driver was loaded), it will at least avoid the > frequent wraparounds seen today, and that caveat can be documented. Sure, i can updated the k10temp driver once a documented way of power reporting is available. For now, i will implement a kernel thread to reduce the wrap around and update the documentation. > > >> > >> How about making the power reporting registers available and > >> reporting per-CPU power consumption instead ? I think that would > >> add much more value. > > We will expose power reporting when platform exposes that information. > > Until then, energy reporting becomes further critical. > > > > Some Windows tools such as HwInfo report power readings today, > on existing CPUs, so I don't really buy the claim that existing > chips don't report it. This driver is needed for servers based on Naples and Rome which runs on Linux. The energy MSRs are accumulating registers updated every 1ms. At present, this is a reliable and documented way to measure power consumption over a period. Also, there is no documented way to report power readings at this point. > > Thanks, > Guenter -- Shine bright, (: Nav :)