Hi Chanwoo, On 5 June 2013 13:41, Chanwoo Choi <cw00.choi@xxxxxxxxxxx> wrote: > This patch add new sysfs file to show previous accumulated data of CPU load Please mention we are only accumulating latest 20 values. > as following path. This sysfs file is used to judge the correct system state > or determine suitable system resource on user-space. Please write it as: load_table will be used to judge how many cpus would be sufficient for managing current load. > - /sys/devices/system/cpu/cpu0/cpufreq/stats/load_table > > This sysfs file include following data: > - Measurement point of time > - CPU frequency > - Per-CPU load > > Signed-off-by: Chanwoo Choi <cw00.choi@xxxxxxxxxxx> > Signed-off-by: Myungjoo Ham <myungjoo.ham@xxxxxxxxxxx> > Signed-off-by: Kyungmin Park <kyungmin.park@xxxxxxxxxxx> > --- > > Additionally, I explain an example using 'load_table' sysfs entry. > > Exynos4412 series has Quad-core and all cores share the power-line. > I cann't set diffent voltage/frequency to each CPU. To reduce power- > consumption, I certainly have to turn on/off CPU online state > according to CPU load on runtime. As a result, I peridically need to > monitor current cpu state to determine a proper amount of system > resource(necessary number of online cpu) and to delete wasted power. > So, I need 'load_table' sysfs file to monitor current cpu state. > > I add a table which show power consumption of target based on > Exynos4412 SoC. This table indicate the difference power-consumption > according to a number of online core and with same number of running task. > > [Environment of power estimation] > - cpufreq governor used performance mode to estimate power-consumption on each frequency step. > - Use infinite-loop test program which execute while statement infinitely. > - Always measure the power consumption under same temperature during 1 minutes. > - Unit is mA. > ------------------------------------------------------------------------------------------------------------------------------------ > A number of Online core | Core 1 | Core 2 | Core 3 | Core 4 > A number of nr_running | 0 1 | 0 1 2 | 0 1 2 3 | 0 1 2 3 4 > ------------------------------------------------------------------------------------------------------------------------------------ > CPU Frequency | > 800 MHz | 293 330 | 295 338 379 | 300 339 386 433 | 303 341 390 412 482 > 1000 MHz | 312 371 | 316 377 435 | 318 383 454 522 | 322 391 462 526 596 > 1200 MHz | 323 404 | 328 418 504 | 336 423 521 615 | 343 433 499 639 748 > 1600 MHz | 380 525 | 388 556 771 | 399 575 771 1011 | 412 597 822 1172 1684 > ------------------------------------------------------------------------------------------------------------------------------------ > > For example, > The case A/B/C have the same condition except for a number of online core. > - case A: Online core is 2, 1000MHz and nr_running is 1 : 377mA > - case B: Online core is 3, 1000MHz and nr_running is 1 : 383mA > - case C: Online core is 4, 1000Mz and nr_running is 1 : 391mA > > If system has only one running task, cpu hotplug policy, by monitoring > cpu state through 'load_table' sysfs file on user-space, > will determine 'case A' state for reducing power-consumption. > > Show the result of reading 'load_table sysfs file as following: > - cpufreq governor is ondemand_org governor. > > $ cat /sys/devices/system/cpu/cpu0/cpufreq/stats/load_table > Time Frequency cpu0 cpu1 cpu2 cpu3 > 1300500043122 1600000 32 6 0 26 > 1300600079080 800000 63 10 2 45 > 1300700065288 800000 51 9 1 42 > 1300800228747 800000 51 9 1 43 > 1300900182997 800000 78 11 3 47 > 1301000106163 800000 96 26 6 48 > 1301100056247 1600000 45 7 1 27 > 1301200071373 1000000 55 9 1 37 > 1301300096082 800000 54 10 0 45 > 1301400132832 800000 70 11 2 46 > 1301500082290 800000 61 11 1 43 > 1301600236415 800000 61 9 2 43 > 1301700071498 800000 59 10 2 43 > 1301800159290 800000 55 9 1 42 > 1301900076332 800000 66 10 2 43 > 1302000102165 800000 47 9 0 43 > 1302100086623 800000 75 11 2 50 > 1302200101082 800000 66 10 4 46 > 1302300108873 800000 53 10 1 44 > 1302400373373 600000 63 12 1 54 How are you getting loads different for all your cpus? I believe you are just recording these values for policy->cpu and all cpus share same policy on your platform. > Please let me know some opinion about this patch. > > Best regards and Thanks, > Chanwoo Choi > > --- > drivers/cpufreq/cpufreq.c | 4 +++ > drivers/cpufreq/cpufreq_governor.c | 21 ++++++++++-- > drivers/cpufreq/cpufreq_stats.c | 70 ++++++++++++++++++++++++++++++++++++++ > include/linux/cpufreq.h | 6 ++++ > 4 files changed, 99 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 2d53f47..cbaaff0 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -292,6 +292,10 @@ void __cpufreq_notify_transition(struct cpufreq_policy *policy, > if (likely(policy) && likely(policy->cpu == freqs->cpu)) > policy->cur = freqs->new; > break; > + case CPUFREQ_LOADCHECK: > + srcu_notifier_call_chain(&cpufreq_transition_notifier_list, > + CPUFREQ_LOADCHECK, freqs); > + break; > } > } > /** > diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c > index 5af40ad..bc50624 100644 > --- a/drivers/cpufreq/cpufreq_governor.c > +++ b/drivers/cpufreq/cpufreq_governor.c > @@ -23,12 +23,17 @@ > #include <linux/kernel_stat.h> > #include <linux/mutex.h> > #include <linux/slab.h> > +#include <linux/sched.h> > #include <linux/tick.h> > #include <linux/types.h> > #include <linux/workqueue.h> > > #include "cpufreq_governor.h" > > +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS > + struct cpufreq_freqs freqs; > +#endif Why do you need this to be global? > static struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy) > { > if (have_governor_per_policy()) > @@ -143,11 +148,17 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) > idle_time += jiffies_to_usecs(cur_nice_jiffies); > } > > - if (unlikely(!wall_time || wall_time < idle_time)) > + if (unlikely(!wall_time || wall_time < idle_time)) { > +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS > + freqs.load[j] = 0; > +#endif > continue; > + } > > load = 100 * (wall_time - idle_time) / wall_time; > - > +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS > + freqs.load[j] = load; > +#endif > if (dbs_data->cdata->governor == GOV_ONDEMAND) { > int freq_avg = __cpufreq_driver_getavg(policy, j); > if (freq_avg <= 0) > @@ -160,6 +171,12 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) > max_load = load; > } > > +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS > + freqs.time = ktime_to_ns(ktime_get()); > + freqs.old = freqs.new = policy->cur; > + > + cpufreq_notify_transition(policy, &freqs, CPUFREQ_LOADCHECK); > +#endif > dbs_data->cdata->gov_check_cpu(cpu, max_load); > } > EXPORT_SYMBOL_GPL(dbs_check_cpu); > diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c > index fb65dec..2379b1d 100644 > --- a/drivers/cpufreq/cpufreq_stats.c > +++ b/drivers/cpufreq/cpufreq_stats.c > @@ -22,6 +22,8 @@ > #include <linux/notifier.h> > #include <asm/cputime.h> > > +#define CPUFREQ_LOAD_TABLE_MAX 20 > + > static spinlock_t cpufreq_stats_lock; > > struct cpufreq_stats { > @@ -35,6 +37,10 @@ struct cpufreq_stats { > unsigned int *freq_table; > #ifdef CONFIG_CPU_FREQ_STAT_DETAILS > unsigned int *trans_table; > + > + struct cpufreq_freqs *load_table; > + unsigned int load_last_index; > + unsigned int load_max_index; > #endif > }; > > @@ -131,6 +137,38 @@ static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf) > return len; > } > cpufreq_freq_attr_ro(trans_table); > + > +static ssize_t show_load_table(struct cpufreq_policy *policy, char *buf) > +{ > + struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, policy->cpu); > + struct cpufreq_freqs *load_table = stat->load_table; > + ssize_t len = 0; > + int i; > + int j; merge above two lines. > + > + len += sprintf(buf + len, "%11s %10s", "Time", "Frequency"); > + for (j = 0; j < NR_CPUS; j++) > + len += sprintf(buf + len, " %3s%d", "cpu", j); > + len += sprintf(buf + len, "\n"); > + > + i = stat->load_last_index; > + do { > + len += sprintf(buf + len, "%lld %9d", > + load_table[i].time, > + load_table[i].old); > + > + for (j = 0; j < NR_CPUS; j++) > + len += sprintf(buf + len, " %4d", > + load_table[i].load[j]); > + len += sprintf(buf + len, "\n"); > + > + if (++i == stat->load_max_index) > + i = 0; > + } while (i != stat->load_last_index); You want/need some locking to protect addition to this list while we are reading from it? > + return len; > +} > +cpufreq_freq_attr_ro(load_table); > #endif > > cpufreq_freq_attr_ro(total_trans); > @@ -141,6 +179,7 @@ static struct attribute *default_attrs[] = { > &time_in_state.attr, > #ifdef CONFIG_CPU_FREQ_STAT_DETAILS > &trans_table.attr, > + &load_table.attr, > #endif > NULL > }; > @@ -167,6 +206,9 @@ static void cpufreq_stats_free_table(unsigned int cpu) > > if (stat) { > pr_debug("%s: Free stat table\n", __func__); > +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS > + kfree(stat->load_table); > +#endif > kfree(stat->time_in_state); > kfree(stat); > per_cpu(cpufreq_stats_table, cpu) = NULL; > @@ -244,6 +286,16 @@ static int cpufreq_stats_create_table(struct cpufreq_policy *policy, > > #ifdef CONFIG_CPU_FREQ_STAT_DETAILS > stat->trans_table = stat->freq_table + count; > + > + stat->load_max_index = CPUFREQ_LOAD_TABLE_MAX; > + stat->load_last_index = 0; > + > + alloc_size = sizeof(struct cpufreq_freqs) * stat->load_max_index; We aren't using this variable multiple times so get rid of it and also you need to do: sizeof(*stat->load_table). > + stat->load_table = kzalloc(alloc_size, GFP_KERNEL); > + if (!stat->load_table) { > + ret = -ENOMEM; > + goto error_out; > + } > #endif > j = 0; > for (i = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) { > @@ -312,13 +364,31 @@ static int cpufreq_stat_notifier_trans(struct notifier_block *nb, > struct cpufreq_stats *stat; > int old_index, new_index; > > +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS > + if (val != CPUFREQ_POSTCHANGE && val != CPUFREQ_LOADCHECK) > +#else > if (val != CPUFREQ_POSTCHANGE) > +#endif > return 0; > > stat = per_cpu(cpufreq_stats_table, freq->cpu); > if (!stat) > return 0; > > +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS > + if (val == CPUFREQ_LOADCHECK) { > + struct cpufreq_freqs *dest_freq; > + > + dest_freq = &stat->load_table[stat->load_last_index]; > + memcpy(dest_freq, freq, sizeof(struct cpufreq_freqs)); again sizeof()... You don't need to copy full structure probably. > + > + if (++stat->load_last_index == stat->load_max_index) > + stat->load_last_index = 0; > + > + return 0; > + } > +#endif > + > old_index = stat->last_index; > new_index = freq_table_get_index(stat, freq->new); > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h > index 037d36a..f780645 100644 > --- a/include/linux/cpufreq.h > +++ b/include/linux/cpufreq.h > @@ -140,12 +140,18 @@ static inline bool policy_is_shared(struct cpufreq_policy *policy) > #define CPUFREQ_POSTCHANGE (1) > #define CPUFREQ_RESUMECHANGE (8) > #define CPUFREQ_SUSPENDCHANGE (9) > +#define CPUFREQ_LOADCHECK (10) > > struct cpufreq_freqs { > unsigned int cpu; /* cpu nr */ > unsigned int old; > unsigned int new; > u8 flags; /* flags of cpufreq_driver, see below. */ > + > +#ifdef CONFIG_CPU_FREQ_STAT_DETAILS > + int64_t time; > + unsigned int load[NR_CPUS]; > +#endif > }; Other wise it looks good mostly. PS: I have cc'd you for a patch of mine which will get rid of most of the CONFIG_*** you used in your code.. But wait for it to be applied to change your code accordingly.. -- To unsubscribe from this list: send the line "unsubscribe cpufreq" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html