Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?

Lukasz Luba <lukasz.luba@xxxxxxx> · Thu, 24 Sep 2020 09:21:57 +0100

On 9/24/20 7:09 AM, Rajendra Nayak wrote:

On 9/16/2020 10:18 PM, Matthias Kaehlcke wrote:
On Wed, Sep 16, 2020 at 10:53:48AM +0100, Lukasz Luba wrote:

On 9/15/20 9:55 PM, Daniel Lezcano wrote:
On 15/09/2020 19:58, Matthias Kaehlcke wrote:
On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
On 15/09/2020 19:24, Matthias Kaehlcke wrote:
+Thermal folks

Hi Rajendra,

On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
Hi Rob,

There have been some discussions on another thread [1] around the DPC
(dynamic-power-coefficient) values for CPUs being relative vs absolute
(based on real power), and whether they should be used to derive 'real'
power at various OPPs in order to calculate things like
'sustainable-power' for thermal zones.
I believe relative values work perfectly fine for scheduling decisions,
but with others using this for calculating power values in mW, is there
a need to document the property as something that *has* to be based on
real power measurements?

Relative values may work for scheduling decisions, but not for thermal
management with the power allocator, at least not when CPU cooling
devices are combined with others that specify their power consumption
in absolute values. Such a configuration should be supported IMO.

The energy model is used in the cpufreq cooling device and if the
sustainable power is consistent with the relative values then there is
no reason it shouldn't work.

Agreed on thermal zones that exclusively use CPUs as cooling devices,
but what about mixed zones, with CPUs using their pseudo-unit and e.g.
a GPU that specifies its power in mW?

Well, if a SoC vendor decides to mix the units, then there is nothing
we can do.

When specifying the power numbers available for the SoC, they could
all be scaled against the highest power number.
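
As a rough illustration of scaling against the highest power number,
here is a minimal userspace sketch. The function name, the input array
and the ABSTRACT_SCALE_MAX ceiling are all hypothetical; none of them
come from the kernel, the DT bindings or the SCMI spec. The point is
only that a single linear scale is applied to every device in the SoC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical abstract-scale ceiling; not taken from any spec or binding. */
#define ABSTRACT_SCALE_MAX 1000u

/*
 * Rescale every entry so that the largest one maps to ABSTRACT_SCALE_MAX.
 * The mapping is linear, so ratios between entries are preserved (up to
 * integer rounding), while the absolute mW values are no longer visible.
 */
static void scale_to_highest(const uint32_t *power_mw, uint32_t *scaled,
			     size_t n)
{
	uint32_t max = 0;

	for (size_t i = 0; i < n; i++)
		if (power_mw[i] > max)
			max = power_mw[i];

	assert(max > 0); /* at least one non-zero power value is required */

	for (size_t i = 0; i < n; i++)
		scaled[i] = (uint32_t)(((uint64_t)power_mw[i] *
					ABSTRACT_SCALE_MAX) / max);
}
```

With made-up inputs of 250, 500 and 2000 mW this yields 125, 250 and
1000. The scale only stays consistent if the CPU, GPU and any other
device in the zone are all passed through the same call.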

There are so many factors on the hardware, the firmware, the kernel and
the userspace sides having an impact on the energy efficiency that I
don't understand why SoC vendors are so shy about sharing the power
numbers...

Unfortunately (because it might confuse engineers in some cases like
this one), even in the SCMI spec DEN0056B [1] we have this statement,
which allows firmware to expose 'abstract scale' values:
'4.5.1 Performance domain management protocol background
...The power can be expressed in mW or in an abstract scale. Vendors
are not obliged to reveal power costs if it is undesirable, but a linear
scale is required.'

This is the source of our Energy Model values when we use the SCMI
cpufreq driver [2].

So this might be an issue in the future, when some SoC vendor decides
not to expose the real mW, and a phone OEM then takes the SoC and tries
to add some other cooling device to the thermal zone. That new device
is not part of the SCMI perf domains but is a custom one that reports
real mW.

Daniel, do you think it should be documented somewhere in the kernel
thermal documentation that the firmware might silently populate the EM
with an 'abstract scale'? Then special care should be taken when
combining new cooling devices.

Regards,
Lukasz

[1] https://developer.arm.com/documentation/den0056/b/?lang=en
[2] https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/scmi-cpufreq.c#L121

If an 'abstract scale' is explicitly allowed I think it should be
documented to avoid confusion and make engineers aware of the peril of
combining cooling devices of different types in the same thermal zone.

Rob, to be consistent, we should perhaps also document in the DT
bindings that an abstract scale is allowed when specifying the DPC
values in DT.
If you agree, I can spin a quick patch to update the documentation.

The 'dynamic-power-coefficient' property, which is described in
Documentation/devicetree/bindings/arm/cpus.yaml, does not need any
update because it expects units of 'uW/MHz/V^2' to calculate dynamic
power.
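
To make the unit concrete, here is a minimal userspace sketch of the
usual dynamic power estimate P = C * f * V^2, assuming the coefficient
really is given in uW/MHz/V^2. The function name and the example
numbers are made up for illustration; this is not the kernel's
implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Dynamic power estimate: P = C * f * V^2
 *   dpc      - dynamic-power-coefficient, in uW/MHz/V^2
 *   freq_mhz - OPP frequency, in MHz
 *   volt_mv  - OPP voltage, in mV
 * Returns the power in mW, rounded down.
 */
static uint64_t dyn_power_mw(uint32_t dpc, uint32_t freq_mhz,
			     uint32_t volt_mv)
{
	uint64_t p;

	assert(volt_mv <= 10000); /* sanity limit chosen for this sketch */

	p = (uint64_t)dpc * freq_mhz * volt_mv * volt_mv;

	/* mV^2 -> V^2 divides by 1e6, uW -> mW divides by 1e3 */
	return p / 1000000000ULL;
}
```

For a hypothetical coefficient of 400 at 2000 MHz and 900 mV this
gives 400 * 2000 * 0.81 = 648000 uW, i.e. 648 mW. If the coefficient
was picked on a relative basis, the result is only meaningful relative
to other values derived the same way.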

You have two ways to register an Energy Model for a device:
1. em_dev_register_perf_domain(), where you provide a callback function
   that can feed the 'abstract scale' (like scmi-cpufreq.c does)
2. dev_pm_opp_of_register_em(), where the 'dynamic-power-coefficient'
   is going to be involved.

If a developer sees that the platform might face the issue of mixing
devices with two different scales in one thermal zone, they should not
use the 2nd registration method, but the 1st API, and provide a
callback with a scale that is consistent across all devices. It is also
very unlikely that a device like a GPU or DSP would not be part of the
SCMI perf domains and would not expose a consistent abstract scale.
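
To show why mixing two scales in one thermal zone matters, here is a
deliberately simplified model. It is NOT the kernel's power allocator:
it just splits a budget between two devices in proportion to the
maximum power each one reports. The function name and all numbers are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model, NOT the kernel's power allocator: split a budget
 * between two cooling devices in proportion to the maximum power each
 * one reports. Returns the share granted to device A; device B gets
 * the remainder.
 */
static uint32_t share_for_a(uint32_t budget, uint32_t max_a,
			    uint32_t max_b)
{
	assert(max_a > 0 || max_b > 0); /* avoid dividing by zero */

	return (uint32_t)(((uint64_t)budget * max_a) /
			  ((uint64_t)max_a + max_b));
}
```

With a budget of 3000 and both devices reporting 2000 mW, device A
gets 1500. If A instead reports 200 on an abstract scale while B still
reports 2000 mW, A's share drops to 272 although the hardware has not
changed. When both report on the same abstract scale (200 and 200),
the split is 1500 again.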

I have a patch spinning in our internal review to update the EAS, EM
and IPA documentation, so that will be updated soon.

Regards,
Lukasz