The CPU cgroup is, so far, undocumented. Although information about how it
works exists in the Documentation directory, it is usually scattered and/or
presented in the context of something else. This file consolidates all
cgroup-related information about it.

Signed-off-by: Glauber Costa <glommer@xxxxxxxxxx>
---
 Documentation/cgroups/cpu.txt | 81 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)
 create mode 100644 Documentation/cgroups/cpu.txt

diff --git a/Documentation/cgroups/cpu.txt b/Documentation/cgroups/cpu.txt
new file mode 100644
index 0000000..072fd58
--- /dev/null
+++ b/Documentation/cgroups/cpu.txt
@@ -0,0 +1,81 @@
+CPU Controller
+--------------
+
+The CPU controller is responsible for grouping tasks together that will be
+viewed by the scheduler as a single unit. The CFS scheduler will first divide
+CPU time between all entities in the same level according to their shares
+(equally, by default), and then do the same in the next level. Basic use cases
+are described in the main cgroup documentation file, cgroups.txt.
+
+Users of this functionality should be aware that deep hierarchies will
+impose scheduler overhead, since the scheduler will have to take extra
+steps and look up additional data structures to make its final decision.
+
+Through the CPU controller, the scheduler is also able to cap the CPU
+utilization of a particular group. This is particularly useful in environments
+in which CPU is paid for by the hour, and one values predictability over
+performance.
+
+CPU Accounting
+--------------
+
+The CPU cgroup will also provide additional files under the prefix "cpuacct".
+Those files provide accounting statistics and were previously provided by the
+separate cpuacct controller. Although the cpuacct controller will still be kept
+around for compatibility reasons, its usage is discouraged. If both the CPU and
+cpuacct controllers are present in the system, distributors are encouraged to
+always mount them together.
+
+Files
+-----
+
+The CPU controller exposes the following files to the user:
+
+ - cpu.shares: The weight of each group living in the same hierarchy, which
+   translates into the amount of CPU it is expected to get. Upon cgroup creation,
+   each group gets assigned a default of 1024. The percentage of CPU assigned to
+   the cgroup is the value of shares divided by the sum of all shares in all
+   cgroups in the same level.
+
+ - cpu.cfs_period_us: The duration in microseconds of each scheduler period, for
+   bandwidth decisions. This defaults to 100000us or 100ms. Larger periods will
+   improve throughput at the expense of latency, since the scheduler will be able
+   to sustain a cpu-bound workload for longer. The opposite is true for smaller
+   periods. Note that this only affects non-RT tasks that are scheduled by the
+   CFS scheduler.
+
+ - cpu.cfs_quota_us: The maximum time in microseconds during each cfs_period_us
+   for which the current group will be allowed to run. For instance, if it is
+   set to half of cfs_period_us, the cgroup will be able to run for at most 50%
+   of the time. One should note that this represents aggregate time over all
+   CPUs in the system. Therefore, in order to allow full usage of two CPUs, for
+   instance, one should set this value to twice the value of cfs_period_us.
+
+ - cpu.stat: statistics about the bandwidth controls. No data will be
+   presented if cpu.cfs_quota_us is not set. The file presents three
+   numbers:
+        nr_periods: how many full periods have elapsed.
+        nr_throttled: number of times the full allowed bandwidth was exhausted.
+        throttled_time: total time the tasks were not run due to being over quota.
+
+ - cpu.rt_runtime_us and cpu.rt_period_us: These files are the RT-task
+   analogues of the CFS files cfs_quota_us and cfs_period_us. One important
+   difference, though, is that while the cfs quotas are upper bounds that
+   won't necessarily be met, the rt runtimes form a stricter guarantee.
+   Therefore, no overlap is allowed. As a consequence, given a
+   hierarchy with multiple children, the sum of all rt_runtime_us may not exceed
+   the runtime of the parent. Also, an rt_runtime_us of 0 means that no rt tasks
+   can ever be run in this cgroup. For more information about rt tasks runtime
+   assignments, see scheduler/sched-rt-group.txt
+
+ - cpuacct.usage: The aggregate CPU time, in nanoseconds, consumed by all tasks
+   in this group.
+
+ - cpuacct.usage_percpu: The CPU time, in nanoseconds, consumed by all tasks in
+   this group, separated by CPU. The format is a space-separated array of time
+   values, one for each present CPU.
+
+ - cpuacct.stat: aggregate user and system time consumed by tasks in this group.
+   The format is
+        user: x
+        system: y
-- 
1.8.1.4
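
The arithmetic behind cpu.shares, cpu.cfs_quota_us and cpuacct.usage_percpu in
the file list can be illustrated with a minimal Python sketch. It is not part of
the patch. The group names and helper names are hypothetical, and the share
calculation assumes that every sibling group is busy at the same time. It only
computes values; it does not touch any cgroup filesystem.

```python
def share_fractions(shares):
    """Expected CPU fraction of each sibling group at one hierarchy level.

    `shares` maps a (hypothetical) group name to its cpu.shares value.
    The fraction is shares / sum of all shares at the same level.
    """
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


def quota_for_cpus(cpus, period_us=100000):
    """cpu.cfs_quota_us that allows `cpus` CPUs' worth of time per period.

    The quota is aggregate time over all CPUs, so two full CPUs need
    twice the period. The default period mirrors the documented 100000us.
    """
    return int(cpus * period_us)


def parse_usage_percpu(text):
    """Split a cpuacct.usage_percpu line into per-CPU nanosecond counts."""
    return [int(field) for field in text.split()]


if __name__ == "__main__":
    # Three sibling groups: two at the default weight, one at double weight.
    print(share_fractions({"a": 1024, "b": 1024, "c": 2048}))
    # Quota for half a CPU and for two full CPUs at the default period.
    print(quota_for_cpus(0.5), quota_for_cpus(2))
    # Made-up sample content for a four-CPU machine.
    print(parse_usage_percpu("120 0 45 9"))
```

A group holding 2048 shares next to two groups at the default 1024 is expected
to receive half of the CPU time at that level, and a quota of 200000 with the
default period corresponds to the "two CPUs" case described for
cpu.cfs_quota_us.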