On Tue, 27 Apr 2010 13:39:34 +0200 Thomas Renninger <trenn@xxxxxxx> wrote: > On Friday 23 April 2010 06:08:19 pm Arjan van de Ven wrote: > > On Fri, 23 Apr 2010 10:50:10 +0200 > > Thomas Renninger <trenn@xxxxxxx> wrote: > > Especially on battery, users will appreciate some minutes > > > > > of more battery lifetime and do not care about some ms of IO > > > latencies. > > > > the assumption that power doesn't matter on AC is a huge fiction > > that any data center operator would love to get out of everyones > > head as quickly as possible. > > Have I said power doesn't matter on AC? > Do you agree that a datacenter has different performance vs power > tradeoff demands as a battery driven mobile device? > > Back to the topic: > As you did not answer on my (several) sysfs knob request(s), I expect > you agree with it and will add one. > yup it makes sense to have a sysfs knob with a sane default value From: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx> Subject: [PATCH] cpufreq: make the iowait-is-busy-time a sysfs tunable Pavel Machek pointed out that not all CPUs have an efficient idle at high frequency. Specifically, older Intel and various AMD cpus would get a higher power usage when copying files from USB. Mike Chan pointed out that the same is true for various ARM chips as well. Thomas Renninger suggested to make this a sysfs tunable with a reasonable default. This patch adds a sysfs tunable for the new behavior, and uses a very simple function to determine a reasonable default, depending on the CPU vendor/type. Signed-off-by: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx> --- drivers/cpufreq/cpufreq_ondemand.c | 46 +++++++++++++++++++++++++++++++++++- 1 files changed, 45 insertions(+), 1 deletions(-) diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index ed472f8..4877e8f 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -109,6 +109,7 @@ static struct dbs_tuners { unsigned int down_differential; unsigned int ignore_nice; unsigned int powersave_bias; + unsigned int io_is_busy; } dbs_tuners_ins = { .up_threshold = DEF_FREQUENCY_UP_THRESHOLD, .down_differential = DEF_FREQUENCY_DOWN_DIFFERENTIAL, @@ -260,6 +261,7 @@ static ssize_t show_##file_name \ return sprintf(buf, "%u\n", dbs_tuners_ins.object); \ } show_one(sampling_rate, sampling_rate); +show_one(io_is_busy, io_is_busy); show_one(up_threshold, up_threshold); show_one(ignore_nice_load, ignore_nice); show_one(powersave_bias, powersave_bias); @@ -310,6 +312,22 @@ static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b, return count; } +static ssize_t store_io_is_busy(struct kobject *a, struct attribute *b, + const char *buf, size_t count) +{ + unsigned int input; + int ret; + ret = sscanf(buf, "%u", &input); + if (ret != 1) + return -EINVAL; + + mutex_lock(&dbs_mutex); + dbs_tuners_ins.io_is_busy = !!input; + mutex_unlock(&dbs_mutex); + + return count; +} + static ssize_t store_up_threshold(struct kobject *a, struct attribute *b, const char *buf, size_t count) { @@ -392,6 +410,7 @@ static struct global_attr _name = \ __ATTR(_name, 0644, show_##_name, store_##_name) define_one_rw(sampling_rate); +define_one_rw(io_is_busy); define_one_rw(up_threshold); define_one_rw(ignore_nice_load); define_one_rw(powersave_bias); @@ -403,6 +422,7 @@ static struct attribute *dbs_attributes[] = { &up_threshold.attr, &ignore_nice_load.attr, &powersave_bias.attr, + &io_is_busy.attr, NULL }; @@ -527,7 +547,7 @@ static void dbs_check_cpu(struct cpu_dbs_info_s *this_dbs_info) * from the cpu idle time. */ - if (idle_time >= iowait_time) + if (dbs_tuners_ins.io_is_busy && idle_time >= iowait_time) idle_time -= iowait_time; if (unlikely(!wall_time || wall_time < idle_time)) @@ -643,6 +663,29 @@ static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info) cancel_delayed_work_sync(&dbs_info->work); } +/* + * Not all CPUs want IO time to be accounted as busy; this depends on how + * efficient idling at a higher frequency/voltage is. + * Pavel Machek says this is not so for various generations of AMD and old + * Intel systems. + * Mike Chan (android.com) says this is also not true for ARM. + * Because of this, whitelist specific known (series) of CPUs by default, and + * leave all others up to the user. + */ +static int should_io_be_busy(void) +{ +#if defined(CONFIG_X86) + /* + * For Intel, Core 2 (model 15) and later have an efficient idle. + */ + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && + boot_cpu_data.x86 == 6 && + boot_cpu_data.x86_model >= 15) + return 1; +#endif + return 0; +} + static int cpufreq_governor_dbs(struct cpufreq_policy *policy, unsigned int event) { @@ -705,6 +748,7 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy, dbs_tuners_ins.sampling_rate = max(min_sampling_rate, latency * LATENCY_MULTIPLIER); + dbs_tuners_ins.io_is_busy = should_io_be_busy(); } mutex_unlock(&dbs_mutex); -- 1.6.1.3 -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org -- To unsubscribe from this list: send the line "unsubscribe cpufreq" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html