Re: [PATCH v4 1/3] hwmon: Add amd_energy driver to report energy counters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Guenter

On Mon, 27 Apr 2020 at 20:34, Naveen Krishna Ch
<naveenkrishna.ch@xxxxxxxxx> wrote:
>
> Hi Guenter,
>
> On Mon, 27 Apr 2020 at 19:04, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >
> > On 4/27/20 5:39 AM, Naveen Krishna Ch wrote:
> > > Hi Guenter,
> > >
> > > On Sat, 25 Apr 2020 at 21:47, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> > >>
> > >> On Fri, Apr 24, 2020 at 08:50:54AM +0530, Naveen Krishna Chatradhi wrote:
> > >>> This patch adds hwmon based amd_energy driver support for
> > >>> family 17h processors from AMD.
> > >>>
> > >>> The driver provides following interface to the userspace
> > >>> 1. Reports the per core consumption
> > >>>       * file: "energy%d_input", label: "Ecore%03d"
> > >>> 2. Reports per socket energy consumption
> > >>>       * file: "energy%d_input", label: "Esocket%d"
> > >>> 2. Reports scaled energy value in Joules.
> > >>>
> > >>> Cc: Guenter Roeck <linux@xxxxxxxxxxxx>
> > >>> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@xxxxxxx>
> > >>
> > >> Couple of additional comments below.
> > >>
> > >> On a higher level, I noticed that the data overflows quickly
> > >> (ie within a day or so), which is the reason why the matching
> > >> code for Intel CPUs never made it into the kernel. With that
> > >> in mind, I do wonder if this is really valuable. I am quite
> > >> concerned about people complaining that the data is useless,
> > >> and I have to say that I find it quite useless myself. Any
> > >> system running for more than a few hours will report more or
> > >> less random data. Any thoughts on that ?
> > > This driver will also address the future platforms with
> > > support for 64-bit counters.
> > >
> > > We do have platforms that will come out very shortly, which will
> > > support a different energy resolution to increase the wraparound
> > > time with 32bit counters,
> > >
> > > On a platform with 2 sockets each with 64 cores,
> > > Assuming 240W, new resolution of 15.6 milli Joules
> > >
> > > -  Wrap around time for socket energy counter will be
> > > (up to ~3 to 4 days with maximum load).
> > >
> > > 2^32*15.625e-3/240 = 279620.2667 secs to wrap (i.e 3.2 days)
> > >
> > > - Wrap around time for the per-core energy counters with the
> > > new resolution will be
> > >
> > > 2^32*15.6e-3/ (240 * 128) = 2184533.33 secs to wrap (i.e 25 days)
> > >
> > > When a work load is to be run on certain predefined cores.
> > > The Work load managers can gather the energy status before starting
> > > and after completion of the job and measure power consumed by the
> > > work load.
> > >
> > All that doesn't help much for existing platforms, nor for users who
> > expect that counters don't wrap around at all (at least not until they
> > reach the 64-bit limit).
>
> The 3 energy counter MSRs are added newly in family 17h, and this
> driver supports family 17h and later.
>
> >
> > I see two options. Either provide power reporting instead (which should
> > be done in the k10temp driver), or implement a kernel thread which runs
> > often enough to catch wraparounds. While not perfect (it will only report
> > the energy since the driver was loaded), it will at least avoid the
> > frequent wraparounds seen today, and that caveat can be documented.
> Sure, i can updated the k10temp driver once a documented way of power
> reporting is available. For now, i will implement a kernel thread to reduce
> the wrap around and update the documentation.
I've addressed your comments and submitted the next version. Could you
review and let us your know comments. Thank you.
>
> >
> > >>
> > >> How about making the power reporting registers available and
> > >> reporting per-CPU power consumption instead ? I think that would
> > >> add much more value.
> > > We will expose power reporting when platform exposes that information.
> > > Until then, energy reporting becomes further critical.
> > >
> >
> > Some Windows tools such as HwInfo report power readings today,
> > on existing CPUs, so I don't really buy the claim that existing
> > chips don't report it.
> This driver is needed for servers based on Naples and Rome
> which runs on Linux. The energy MSRs are accumulating registers
> updated every 1ms. At present, this is a reliable and documented
> way to measure power consumption over a period. Also, there
> is no documented way to report power readings at this point.
>
> >
> > Thanks,
> > Guenter
> --
> Shine bright,
> (: Nav :)



-- 
Shine bright,
(: Nav :)



[Index of Archives]     [LM Sensors]     [Linux Sound]     [ALSA Users]     [ALSA Devel]     [Linux Audio Users]     [Linux Media]     [Kernel]     [Gimp]     [Yosemite News]     [Linux Media]

  Powered by Linux