On 07/24/2012 03:22 PM, Andreas Herrmann wrote:
CC-ing Andre
On Tue, Jul 24, 2012 at 03:15:14PM +0200, Thomas Renninger wrote:
Hi,
I was recently pointed to performance losses that people measured,
with and without cpufreq enabled, while working on scheduler
tunables/improvements.
Depending on whether processes are bound to cores, tunables
inside the cpufreq subsystem, etc. there can be rather big
differences.
While there have been improvements (for example, polling less
often when constantly running at the highest frequency),
dynamic frequency adjustment as currently implemented in the
ondemand/conservative governors will always cost some
performance.
Arjan mentioned quite some time ago that for modern x86
processors it does not make much sense to control the CPU
frequency from the OS, because idle states are much more
efficient and should be entered as soon as possible.
On bigger x86 systems with dozens or even hundreds of cores
in particular, cpufreq polling sounds like a bad idea, all the
more so if the CPUs achieve the same or even better
performance/power results by entering C-states quickly.
I would like to come up with an init_default_governor()
or similar function which chooses the performance governor
for such CPUs.
Hm, maybe it could be a driver callback. It could then be
implemented by acpi-cpufreq (and powernow-k8 if applicable),
so that those drivers choose the right governor for the
platform/CPU.
Actually, we are currently looking into this issue as well, so I'd
refrain, at least for now, from changing the default governor.
Let's see what Arjan comes up with. I'd also prefer to see a reworked
governor rather than a change of the default one.
To answer your questions nevertheless:
Ideally, identifying the CPUs where the performance governor should
be used would be a one-liner checking for a CPU flag.
But this might not be that easy. A CPU family/model check would need
maintenance if there is no CPU flag/feature to test for.
From the AMD side it looks like family >= 0x11 should do the trick. The
actual difference is the availability of per-core sleep states deeper
than C1. Family 10h and 11h have C1 and C1e, but no real deeper states.
Let me see if there is some register or bit we can reliably query to
check for this instead of relying on fragile family/model/stepping
values.
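Until such a register or bit is identified, the family-based check can
be tried out from user space. The following is only a rough sketch in
Python, not kernel code: the function names are made up for
illustration, and the threshold is a parameter because 0x11 is just the
tentative value named above. It assumes the usual /proc/cpuinfo layout,
where "cpu family" is printed in decimal.

```python
def parse_first_cpu(cpuinfo_text):
    """Collect the key/value pairs of the first processor entry.

    Expects text in /proc/cpuinfo layout: "key : value" lines, with a
    blank line between processor entries.
    """
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            if fields:
                break  # a blank line ends the first processor entry
            continue
        fields.setdefault(key.strip(), value.strip())
    return fields


def amd_prefers_performance(cpuinfo_text, min_family=0x11):
    """Hypothetical helper: True for AMD CPUs with family >= min_family.

    min_family defaults to the tentative threshold from this thread;
    it is an assumption, not a verified cut-off.
    """
    cpu = parse_first_cpu(cpuinfo_text)
    if cpu.get("vendor_id") != "AuthenticAMD":
        return False
    try:
        family = int(cpu["cpu family"])  # decimal in /proc/cpuinfo
    except (KeyError, ValueError):
        return False
    return family >= min_family
```

On a Linux box this could be fed the contents of /proc/cpuinfo, with
min_family adjusted once the exact cut-off is known.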
On Phenoms I could measure much lower power usage at idle with the
ondemand (or conservative) governor. This was no longer true for
Bulldozers, so there is some truth in your idea.
Regards,
Andre.
These are just some ideas. If it's doable with a few lines of code
and without the need to maintain/add new CPU families, I'd like to
have a better default behavior.
One main problem I am facing is measuring power consumption
under different workloads.
I can measure the power consumption at idle (deeper sleep
states entered) with the CPU frequency set to lowest and to
highest, and compare. If both are the same, the CPU is a good
candidate for not doing OS-controlled CPU frequency scaling.
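The idle comparison could be written down roughly like this. It is
only a sketch in Python: the function name and the 2% tolerance are
assumptions, and how the power samples are obtained (external meter,
platform sensor, ...) is left open.

```python
from statistics import mean


def idle_power_matches(low_freq_watts, high_freq_watts, rel_tolerance=0.02):
    """Hypothetical check: is idle power the same at both frequencies?

    Both arguments are sequences of power samples in watts, taken at
    idle with the frequency pinned to lowest and to highest.
    rel_tolerance is an assumed noise margin, not a measured value.
    """
    low = mean(low_freq_watts)
    high = mean(high_freq_watts)
    if low <= 0 or high <= 0:
        raise ValueError("power samples must be positive")
    # Treat the two means as equal if they differ by less than the
    # tolerance, relative to the larger of the two.
    return abs(high - low) <= rel_tolerance * max(low, high)
```

A True result would mark the CPU as a candidate in the sense described
above. It says nothing about behavior under load, which is exactly the
part that is hard to measure.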
What do you think?
Thanks,
Thomas
--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712