On Wed, Sep 16, 2020 at 10:53:48AM +0100, Lukasz Luba wrote:
>
>
> On 9/15/20 9:55 PM, Daniel Lezcano wrote:
> > On 15/09/2020 19:58, Matthias Kaehlcke wrote:
> > > On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
> > > > On 15/09/2020 19:24, Matthias Kaehlcke wrote:
> > > > > +Thermal folks
> > > > >
> > > > > Hi Rajendra,
> > > > >
> > > > > On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
> > > > > > Hi Rob,
> > > > > >
> > > > > > There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
> > > > > > for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
> > > > > > at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
> > > > > > I believe relative values work perfectly fine for scheduling decisions, but with others using this for
> > > > > > calculating power values in mW, is there a need to document the property as something that *has* to be
> > > > > > based on real power measurements?
> > > > >
> > > > > Relative values may work for scheduling decisions, but not for thermal
> > > > > management with the power allocator, at least not when CPU cooling devices
> > > > > are combined with others that specify their power consumption in absolute
> > > > > values. Such a configuration should be supported IMO.
> > > >
> > > > The energy model is used in the cpufreq cooling device and if the
> > > > sustainable power is consistent with the relative values then there is
> > > > no reason it shouldn't work.
> > >
> > > Agreed on thermal zones that exclusively use CPUs as cooling devices, but
> > > what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
> > > GPU that specifies its power in mW?
> >
> > Well, if a SoC vendor decides to mix the units, then there is nothing we
> > can do.
> >
> > When specifying the power numbers available for the SoC, they could be
> > all scaled against the highest power number.
> >
> > There are so many factors on the hardware, the firmware, the kernel and
> > the userspace sides having an impact on the energy efficiency, I don't
> > understand why SoC vendors are so shy to share the power numbers...
> >
>
> Unfortunately (because it might confuse engineers in some cases like
> this one), even in the SCMI spec DEN0056B [1] we have this statement
> which allows to expose an 'abstract scale' values from firmware:
> '4.5.1 Performance domain management protocol background
> ...The power can be expressed in mW or in an abstract scale. Vendors
> are not obliged to reveal power costs if it is undesirable, but a linear
> scale is required.'
>
> This is the source of our Energy Model values when we use SCMI cpufreq
> driver [2].
>
> So this might be an issue in the future, when some SoC vendor decides to
> not expose the real mW, but the phone OEM would then take the SoC and
> try to add some other cooling device into the thermal zone. That new
> device is not part of the SCMI perf but some custom and has the real mW.
>
> Do you think Daniel it should be somewhere documented in the kernel
> thermal that the firmware might silently populate EM with 'abstract
> scale'? Then special care should be taken when combining new
> cooling devices.
>
> Regards,
> Lukasz
>
> [1] https://developer.arm.com/documentation/den0056/b/?lang=en
> [2] https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/scmi-cpufreq.c#L121

If an 'abstract scale' is explicitly allowed I think it should be
documented to avoid confusion and make engineers aware of the peril of
combining cooling devices of different types in the same thermal zone.
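
To make the peril concrete, here is a minimal sketch in Python (not kernel
code). It assumes a much-simplified allocator that splits a power budget
purely in proportion to what each cooling device requests; the real power
allocator governor does more than this. The function name, the device
names and all numbers are made up for illustration.

```python
def share_budget(budget_mw, requests):
    """Split budget_mw among devices in proportion to their requested power.

    Simplified stand-in for a proportional power split; hypothetical helper,
    not an existing kernel or library function.
    """
    total = sum(requests.values())
    return {name: budget_mw * req // total for name, req in requests.items()}


# Hypothetical sustainable power budget for the zone, in mW.
BUDGET_MW = 3000

# Case 1: both cooling devices report their requests in real mW.
real = share_budget(BUDGET_MW, {"cpu": 2000, "gpu": 2000})

# Case 2: same hardware and load, but the CPU numbers come from an
# abstract scale where its highest OPP is 100 units; the GPU still
# reports real mW.
mixed = share_budget(BUDGET_MW, {"cpu": 100, "gpu": 2000})

print(real)   # {'cpu': 1500, 'gpu': 1500}
print(mixed)  # {'cpu': 142, 'gpu': 2857}
```

In the mixed case the split no longer reflects the actual power draw of
either device: the CPU's share looks tiny next to the GPU's, and the number
handed back to the CPU (142) is then read in the CPU's own abstract units,
where it is above the 100-unit maximum. Nothing in the arithmetic flags that
the two inputs were in different units.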