Re: [PATCH v2] tools: create power/x86/x86_energy_perf_policy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



ä 11/25/2010 1:52 PM, Chen Gong åé:
> ä 11/24/2010 1:31 PM, Len Brown åé:
>> From: Len Brown<len.brown@xxxxxxxxx>
>>
>> MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
>> It is implemented in all Sandy Bridge processors -- mobile, desktop
>> and server.
>> It is expected to become increasingly important in subsequent
>> generations.
>>
>> x86_energy_perf_policy is a user-space utility to set this
>> hardware energy vs performance policy hint in the processor.
>> Most systems would benefit from "x86_energy_perf_policy normal"
>> at system startup, as the hardware default is maximum performance
>> at the expense of energy efficiency.
>>
>> Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
>> if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
>> though the kernel does not actually program the MSR.
>>
>> In March, Venkatesh Pallipadi proposed a small driver
>> that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
>> the cpufreq governor in use. It also offered
>> a boot-time cmdline option to override.
>> http://lkml.org/lkml/2010/3/4/457
>> But hiding the hardware policy behind the
>> governor choice was deemed "kinda icky".
>>
>> So in June, I proposed a generic user/kernel API to
>> consolidate the power/performance policy trade-off.
>> "RFC: /sys/power/policy_preference"
>> http://lkml.org/lkml/2010/6/16/399
>> That is my preference for implementing this capability,
>> but I received no support on the list.
>>
>> So in September, I sent x86_energy_perf_policy.c to LKML,
>> a user-space utility that scribbles directly to the MSR.
>> http://lkml.org/lkml/2010/9/28/246
>>
>> Here is the same utility re-sent, this time proposed
>> to reside in the kernel tools directory.
>>
>> Signed-off-by: Len Brown<len.brown@xxxxxxxxx>
>> ---
>> v2
>> create man page
>> minor tweaks in response to review comments
>>
>> tools/power/x86/x86_energy_perf_policy/Makefile | 8 +
>> .../x86_energy_perf_policy.8 | 104 +++++++
>> .../x86_energy_perf_policy.c | 325 ++++++++++++++++++++
>>
>> diff --git a/tools/power/x86/x86_energy_perf_policy/Makefile
>> b/tools/power/x86/x86_energy_perf_policy/Makefile
>> new file mode 100644
>> index 0000000..f458237
>> --- /dev/null
>> +++ b/tools/power/x86/x86_energy_perf_policy/Makefile
>> @@ -0,0 +1,8 @@
>> +x86_energy_perf_policy : x86_energy_perf_policy.c
>> +
>> +clean :
>> + rm -f x86_energy_perf_policy
>> +
>> +install :
>> + install x86_energy_perf_policy /usr/bin/
>> + install x86_energy_perf_policy.8 /usr/share/man/man8/
>> diff --git
>> a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
>> b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
>> new file mode 100644
>> index 0000000..8eaaad6
>> --- /dev/null
>> +++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
>> @@ -0,0 +1,104 @@
>> +.\" This page Copyright (C) 2010 Len Brown<len.brown@xxxxxxxxx>
>> +.\" Distributed under the GPL, Copyleft 1994.
>> +.TH X86_ENERGY_PERF_POLICY 8
>> +.SH NAME
>> +x86_energy_perf_policy \- read or write MSR_IA32_ENERGY_PERF_BIAS
>> +.SH SYNOPSIS
>> +.ft B
>> +.B x86_energy_perf_policy
>> +.RB [ "\-c cpu" ]
>> +.RB [ "\-v" ]
>> +.RB "\-r"
>> +.br
>> +.B x86_energy_perf_policy
>> +.RB [ "\-c cpu" ]
>> +.RB [ "\-v" ]
>> +.RB 'performance'
>> +.br
>> +.B x86_energy_perf_policy
>> +.RB [ "\-c cpu" ]
>> +.RB [ "\-v" ]
>> +.RB 'normal'
>> +.br
>> +.B x86_energy_perf_policy
>> +.RB [ "\-c cpu" ]
>> +.RB [ "\-v" ]
>> +.RB 'powersave'
>> +.br
>> +.B x86_energy_perf_policy
>> +.RB [ "\-c cpu" ]
>> +.RB [ "\-v" ]
>> +.RB n
>> +.br
>> +.SH DESCRIPTION
>> +\fBx86_energy_perf_policy\fP
>> +allows software to convey
>> +its policy for the relative importance of performance
>> +versus energy savings to the processor.
>> +
>> +The processor uses this information in model-specific ways
>> +when it must select trade-offs between performance and
>> +energy efficiency.
>> +
>> +This policy hint does not supersede Processor Performance states
>> +(P-states) or CPU Idle power states (C-states), but allows
>> +software to have influence where it would otherwise be unable
>> +to express a preference.
>> +
>> +For example, this setting may tell the hardware how
>> +aggressively or conservatively to control frequency
>> +in the "turbo range" above the explicitly OS-controlled
>> +P-state frequency range. It may also tell the hardware
>> +how aggressively is should enter the OS requested C-states.
>> +
>> +Support for this feature is indicated by CPUID.06H.ECX.bit3
>> +per the Intel Architectures Software Developer's Manual.
>> +
>> +.SS Options
>> +\fB-c\fP limits operation to a single CPU.
>> +The default is to operate on all CPUs.
>> +Note that MSR_IA32_ENERGY_PERF_BIAS is defined per
>> +logical processor, but that the initial implementations
>> +of the MSR were shared among all processors in each package.
>> +.PP
>> +\fB-v\fP increases verbosity. By default
>> +x86_energy_perf_policy is silent.
>> +.PP
>> +\fB-r\fP is for "read-only" mode - the unchanged state
>> +is read and displayed.
>> +.PP
>> +.I performance
>> +Set a policy where performance is paramount.
>> +The processor will be unwilling to sacrifice any performance
>> +for the sake of energy saving. This is the hardware default.
>> +.PP
>> +.I normal
>> +Set a policy with a normal balance between performance and energy
>> efficiency.
>> +The processor will tolerate minor performance compromise
>> +for potentially significant energy savings.
>> +This reasonable default for most desktops and servers.
>> +.PP
>> +.I powersave
>> +Set a policy where the processor can accept
>> +a measurable performance hit to maximize energy efficiency.
>> +.PP
>> +.I n
>> +Set MSR_IA32_ENERGY_PERF_BIAS to the specified number.
>> +The range of valid numbers is 0-15, where 0 is maximum
>> +performance and 15 is maximum energy efficiency.
>> +
>> +.SH NOTES
>> +.B "x86_energy_perf_policy "
>> +runs only as root.
>> +.SH FILES
>> +.ta
>> +.nf
>> +/dev/cpu/*/msr
>> +.fi
>> +
>> +.SH "SEE ALSO"
>> +msr(4)
>> +.PP
>> +.SH AUTHORS
>> +.nf
>> +Written by Len Brown<len.brown@xxxxxxxxx>
>> diff --git
>> a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
>> b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
>> new file mode 100644
>> index 0000000..b539923
>> --- /dev/null
>> +++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
>> @@ -0,0 +1,325 @@
>> +/*
>> + * x86_energy_perf_policy -- set the energy versus performance
>> + * policy preference bias on recent X86 processors.
>> + */
>> +/*
>> + * Copyright (c) 2010, Intel Corporation.
>> + * Len Brown<len.brown@xxxxxxxxx>
>> + *
>> + * This program is free software; you can redistribute it and/or
>> modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but
>> WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
>> License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> along with
>> + * this program; if not, write to the Free Software Foundation, Inc.,
>> + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
>> + */
>> +
>> +#include<stdio.h>
>> +#include<unistd.h>
>> +#include<sys/types.h>
>> +#include<sys/stat.h>
>> +#include<sys/resource.h>
>> +#include<fcntl.h>
>> +#include<signal.h>
>> +#include<sys/time.h>
>> +#include<stdlib.h>
>> +#include<string.h>
>> +
>> +unsigned int verbose; /* set with -v */
>> +unsigned int read_only; /* set with -r */
>> +char *progname;
>> +unsigned long long new_bias;
>> +int cpu = -1;
>> +
>> +/*
>> + * Usage:
>> + *
>> + * -c cpu: limit action to a single CPU (default is all CPUs)
>> + * -v: verbose output (can invoke more than once)
>> + * -r: read-only, don't change any settings
>> + *
>> + * performance
>> + * Performance is paramount.
>> + * Unwilling to sacrafice any performance
>> + * for the sake of energy saving. (hardware default)
>> + *
>> + * normal
>> + * Can tolerate minor performance compromise
>> + * for potentially significant energy savings.
>> + * (reasonable default for most desktops and servers)
>> + *
>> + * powersave
>> + * Can tolerate significant performance hit
>> + * to maximize energy savings.
>> + *
>> + * n
>> + * a numerical value to write to the underlying MSR.
>> + */
>> +void usage(void)
>> +{
>> + printf("%s: [-c cpu] [-v] "
>> + "(-r | 'performance' | 'normal' | 'powersave' | n)\n",
>> + progname);
>> + exit(1);
>> +}
>> +
>> +#define MSR_IA32_ENERGY_PERF_BIAS 0x000001b0
>> +
>> +#define BIAS_PERFORMANCE 0
>> +#define BIAS_BALANCE 6
>> +#define BIAS_POWERSAVE 15
>> +
>> +void cmdline(int argc, char **argv)
>> +{
>> + int opt;
>> +
>> + progname = argv[0];
>> +
>> + while ((opt = getopt(argc, argv, "+rvc:")) != -1) {
>> + switch (opt) {
>> + case 'c':
>> + cpu = atoi(optarg);
>> + break;
>> + case 'r':
>> + read_only = 1;
>> + break;
>> + case 'v':
>> + verbose++;
>> + break;
>> + default:
>> + usage();
>> + }
>> + }
>> + /* if -r, then should be no additional optind */
>> + if (read_only&& (argc> optind))
>> + usage();
>> +
>> + /*
>> + * if no -r , then must be one additional optind
>> + */
>> + if (!read_only) {
>> +
>> + if (argc != optind + 1) {
>> + printf("must supply -r or policy param\n");
>> + usage();
>> + }
>> +
>> + if (!strcmp("performance", argv[optind])) {
>> + new_bias = BIAS_PERFORMANCE;
>> + } else if (!strcmp("normal", argv[optind])) {
>> + new_bias = BIAS_BALANCE;
>> + } else if (!strcmp("powersave", argv[optind])) {
>> + new_bias = BIAS_POWERSAVE;
>> + } else {
>> + char *endptr;
>> +
>> + new_bias = strtoull(argv[optind],&endptr, 0);
>> + if (endptr == argv[optind] ||
>> + new_bias> BIAS_POWERSAVE) {
>> + fprintf(stderr, "invalid value: %s\n",
>> + argv[optind]);
>> + usage();
>> + }
>> + }
>> + }
>> +}
>> +
>> +/*
>> + * validate_cpuid()
>> + * returns on success, quietly exits on failure (make verbose with -v)
>> + */
>> +void validate_cpuid(void)
>> +{
>> + unsigned int eax, ebx, ecx, edx, max_level;
>> + char brand[16];
>> + unsigned int fms, family, model, stepping;
>> +
>> + eax = ebx = ecx = edx = 0;
>> +
>> + asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx),
>> + "=d" (edx) : "a" (0));
>> +
>> + if (ebx != 0x756e6547 || edx != 0x49656e69 || ecx != 0x6c65746e) {
>> + if (verbose)
>> + fprintf(stderr, "%.4s%.4s%.4s != GenuineIntel",
>> + (char *)&ebx, (char *)&edx, (char *)&ecx);
>> + exit(1);
>> + }
>> +
>> + asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
>> + family = (fms>> 8)& 0xf;
>> + model = (fms>> 4)& 0xf;
>> + stepping = fms& 0xf;
>> + if (family == 6 || family == 0xf)
>> + model += ((fms>> 16)& 0xf)<< 4;
>> +
>> + if (verbose> 1)
>> + printf("CPUID %s %d levels family:model:stepping "
>> + "0x%x:%x:%x (%d:%d:%d)\n", brand, max_level,
>> + family, model, stepping, family, model, stepping);
>> +
>> + if (!(edx& (1<< 5))) {
>> + if (verbose)
>> + printf("CPUID: no MSR\n");
>> + exit(1);
>> + }
>> +
>> + /*
>> + * Support for MSR_IA32_ENERGY_PERF_BIAS
>> + * is indicated by CPUID.06H.ECX.bit3
>> + */
>> + asm("cpuid" : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx) : "a"
>> (6));
>> + if (verbose)
>> + printf("CPUID.06H.ECX: 0x%x\n", ecx);
>> + if (!(ecx& (1<< 3))) {
>> + if (verbose)
>> + printf("CPUID: No MSR_IA32_ENERGY_PERF_BIAS\n");
>> + exit(1);
>> + }
>> + return; /* success */
>> +}
>> +
>> +unsigned long long get_msr(int cpu, int offset)
>> +{
>> + unsigned long long msr;
>> + char msr_path[32];
>> + int retval;
>> + int fd;
>> +
>> + sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
>> + fd = open(msr_path, O_RDONLY);
>> + if (fd< 0) {
>> + printf("Try \"# modprobe msr\"\n");
>> + perror(msr_path);
>> + exit(1);
>> + }
>> +
>> + retval = pread(fd,&msr, sizeof msr, offset);
>> +
>> + if (retval != sizeof msr) {
>> + printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
>> + exit(-2);
>> + }
>> + close(fd);
>> + return msr;
>> +}
>> +
>> +unsigned long long put_msr(int cpu, unsigned long long new_msr, int
>> offset)
>> +{
>> + unsigned long long old_msr;
>> + char msr_path[32];
>> + int retval;
>> + int fd;
>> +
>> + sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
>> + fd = open(msr_path, O_RDWR);
>> + if (fd< 0) {
>> + perror(msr_path);
>> + exit(1);
>> + }
>> +
>> + retval = pread(fd,&old_msr, sizeof old_msr, offset);
>> + if (retval != sizeof old_msr) {
>> + perror("pwrite");
>> + printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
>> + exit(-2);
>> + }
>> +
>> + retval = pwrite(fd,&new_msr, sizeof new_msr, offset);
>> + if (retval != sizeof new_msr) {
>> + perror("pwrite");
>> + printf("pwrite cpu%d 0x%x = %d\n", cpu, offset, retval);
>> + exit(-2);
>> + }
>> +
>> + close(fd);
>> +
>> + return old_msr;
>> +}
>> +
>> +void print_msr(int cpu)
>> +{
>> + printf("cpu%d: 0x%016llx\n",
>> + cpu, get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS));
>> +}
>> +
>> +void update_msr(int cpu)
>> +{
>> + unsigned long long previous_msr;
>> +
>> + previous_msr = put_msr(cpu, new_bias, MSR_IA32_ENERGY_PERF_BIAS);
>> +
>> + if (verbose)
>> + printf("cpu%d msr0x%x 0x%016llx -> 0x%016llx\n",
>> + cpu, MSR_IA32_ENERGY_PERF_BIAS, previous_msr, new_bias);
>> +
>> + return;
>> +}
>> +
>> +char *proc_stat = "/proc/stat";
>> +/*
>> + * run func() on every cpu in /dev/cpu
>> + */
>> +void for_every_cpu(void (func)(int))
>> +{
>> + FILE *fp;
>> + int retval;
>> +
>> + fp = fopen(proc_stat, "r");
>> + if (fp == NULL) {
>> + perror(proc_stat);
>> + exit(1);
>> + }
>> +
>> + retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n");
>> + if (retval != 0) {
>> + perror("/proc/stat format");
>> + exit(1);
>> + }
>> +
>> + while (1) {
>> + int cpu;
>> +
>> + retval = fscanf(fp,
>> + "cpu%u %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n",
>> + &cpu);
>> + if (retval != 1)
>> + return;
>> +
>> + func(cpu);
>> + }
>> + fclose(fp);
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> + cmdline(argc, argv);
>> +
>> + if (verbose> 1)
>> + printf("x86_energy_perf_policy Nov 24, 2010"
>> + " - Len Brown<lenb@xxxxxxxxxx>\n");
>> + if (verbose> 1&& !read_only)
>> + printf("new_bias %lld\n", new_bias);
>> +
>> + validate_cpuid();
>> +
>> + if (cpu != -1) {
>> + if (read_only)
>> + print_msr(cpu);
>> + else
>> + update_msr(cpu);
>> + } else {
>> + if (read_only)
>> + for_every_cpu(print_msr);
>> + else
>> + for_every_cpu(update_msr);
>> + }
>> +
>> + return 0;
>> +}
>>
> I have 2 questions.
>
> 1. the usage looks too simple. If I haven't read the comments
> in the source codes, I even can't know the exact meaning of these
> parameters. Such as -v, -vv etc. How about adding the comments
> as the part of the usage ?
>
> 2. the paramter "noraml | performance | powersave | n" looks weird.
> why it can't look like other paramter (-r, -v etc.). For example,
> I can't use it such as
> "./x86_energy_perf_policy -c 0 normal -v"
> --

One more question. From the spec, it should write 1 to the MSR 0x1FC[18]
to enable this function after setting the Energy Policy on all threads 
in one package.
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/linux-pm



[Index of Archives]     [Linux ACPI]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [CPU Freq]     [Kernel Newbies]     [Fedora Kernel]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux