On 9/16/2020 10:18 PM, Matthias Kaehlcke wrote:
On Wed, Sep 16, 2020 at 10:53:48AM +0100, Lukasz Luba wrote:
On 9/15/20 9:55 PM, Daniel Lezcano wrote:
On 15/09/2020 19:58, Matthias Kaehlcke wrote:
On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
On 15/09/2020 19:24, Matthias Kaehlcke wrote:
+Thermal folks
Hi Rajendra,
On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
Hi Rob,
There have been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
for CPUs being relative vs absolute (based on real power), and whether they should be used to derive 'real' power
at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
I believe relative values work perfectly fine for scheduling decisions, but with others using this for
calculating power values in mW, is there a need to document the property as something that *has* to be
based on real power measurements?
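To make the relative-vs-absolute distinction concrete, here is a minimal sketch of how a per-OPP power figure follows from the DPC, assuming the DT binding's formula Pdyn = DPC * V^2 * f with the DPC in uW/MHz/V^2, the voltage in V and the frequency in MHz. The OPP table and the DPC value are made-up numbers for illustration, and `dynamic_power_mw` is a hypothetical helper, not a kernel function.

```python
def dynamic_power_mw(dpc_uw_per_mhz_v2, freq_mhz, volt_v):
    """Pdyn = DPC * V^2 * f gives uW; divide by 1000 to get mW."""
    return dpc_uw_per_mhz_v2 * volt_v ** 2 * freq_mhz / 1000.0


# Hypothetical OPP table: (frequency in MHz, voltage in V).
opps = [(300, 0.60), (1000, 0.75), (1800, 0.90)]

real_dpc = 100.0                # assumed to come from real measurements
relative_dpc = real_dpc * 0.25  # same shape, arbitrary (abstract) scale

real = [dynamic_power_mw(real_dpc, f, v) for f, v in opps]
relative = [dynamic_power_mw(relative_dpc, f, v) for f, v in opps]

# Ratios between OPPs are identical, so scheduling decisions that only
# compare OPPs of the same CPU are unaffected by the scale ...
real_ratios = [p / real[-1] for p in real]
relative_ratios = [p / relative[-1] for p in relative]

# ... but only the first table can be read as mW.
print(real)
print(relative)
```

With a relative DPC the values keep their proportions across OPPs, which is all a scheduler comparison needs, but they can no longer be compared against a budget expressed in mW.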
Relative values may work for scheduling decisions, but not for thermal
management with the power allocator, at least not when CPU cooling devices
are combined with others that specify their power consumption in absolute
values. Such a configuration should be supported IMO.
The energy model is used in the cpufreq cooling device and if the
sustainable power is consistent with the relative values then there is
no reason it shouldn't work.
Agreed on thermal zones that exclusively use CPUs as cooling devices, but
what about mixed zones, with CPUs using their pseudo-unit and e.g. a
GPU that specifies its power in mW?
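The concern with mixed zones can be illustrated with a deliberately simplified proportional split of a power budget. This is only a sketch of the idea, not the kernel's power allocator code; `split_budget` and all numbers are hypothetical.

```python
def split_budget(budget, requests):
    """Share `budget` among devices in proportion to their requested power."""
    total = sum(requests.values())
    return {name: budget * req / total for name, req in requests.items()}


budget_mw = 2000.0

# Hypothetical case: the CPU really draws 1600 mW, but its energy model
# reports 400 abstract units; the GPU reports 800 real mW.
mixed = split_budget(budget_mw, {"cpu": 400.0, "gpu": 800.0})

# Same devices, with both requests expressed in mW.
consistent = split_budget(budget_mw, {"cpu": 1600.0, "gpu": 800.0})

print(mixed)
print(consistent)
```

In the mixed case the CPU is granted about a third of the budget instead of about two thirds, purely because its numbers are on a different scale from the GPU's.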
Well, if a SoC vendor decides to mix the units, then there is nothing we
can do.
When specifying the power numbers available for the SoC, they could be
all scaled against the highest power number.
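As a rough sketch of that idea, assuming the vendor has real numbers internally and only wants to publish a common abstract scale: every device's table is divided by the single highest power value in the SoC. The helper name, the top-of-scale value of 1000 and the sample numbers are assumptions for illustration only.

```python
def scale_to_highest(power_mw, top=1000):
    """Rescale all tables so the highest power in the SoC maps to `top`."""
    peak = max(p for table in power_mw.values() for p in table)
    return {dev: [p * top / peak for p in table]
            for dev, table in power_mw.items()}


# Hypothetical measured power per OPP, in mW.
measured = {"cpu": [50.0, 400.0, 1600.0], "gpu": [200.0, 800.0]}
scaled = scale_to_highest(measured)

# One common factor is applied to every device, so the GPU-to-CPU ratio
# survives the scaling even though the mW values are hidden.
print(scaled)
```

Because the same factor applies to all devices, cooling devices within that SoC stay comparable with each other; a device added later with real mW values would still be on a different scale.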
There are so many factors on the hardware, the firmware, the kernel and
the userspace sides having an impact on the energy efficiency, I don't
understand why SoC vendors are so shy to share the power numbers...
Unfortunately (because it might confuse engineers in some cases like
this one), even in the SCMI spec DEN0056B [1] we have this statement,
which allows firmware to expose 'abstract scale' values:
'4.5.1 Performance domain management protocol background
...The power can be expressed in mW or in an abstract scale. Vendors
are not obliged to reveal power costs if it is undesirable, but a linear
scale is required.'
This is the source of our Energy Model values when we use the SCMI cpufreq
driver [2].
So this might be an issue in the future, when some SoC vendor decides
not to expose the real mW, but the phone OEM then takes the SoC and
tries to add some other cooling device to the thermal zone. That new
device is not part of SCMI perf but is some custom device that reports real mW.
Daniel, do you think it should be documented somewhere in the kernel
thermal documentation that the firmware might silently populate the EM
with 'abstract scale' values? Then special care should be taken when
combining new cooling devices.
Regards,
Lukasz
[1] https://developer.arm.com/documentation/den0056/b/?lang=en
[2] https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/scmi-cpufreq.c#L121
If an 'abstract scale' is explicitly allowed I think it should be documented
to avoid confusion and make engineers aware of the peril of combining cooling
devices of different types in the same thermal zone.
Rob, we should perhaps also document this as part of the DT bindings document
to be consistent, that an abstract scale is allowed when specifying the DPC
values in DT.
If you agree, I can spin a quick patch to update the documentation.
Thanks,
Rajendra
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation