Hello,

This patchset extends cgroup v2 to support rgroup (resource group) for
in-process hierarchical resource control and implements PRIO_RGRP for
setpriority(2) on top to allow in-process hierarchical CPU cycle
control in a seamless way.

cgroup v1 allowed putting threads of a process in different cgroups
which enabled ad-hoc in-process resource control of some resources.
Unfortunately, this approach was fraught with problems such as
membership ambiguity with per-process resources and lack of isolation
between system management and in-process properties. For a more
detailed discussion on the subject, please refer to the following
message.

 [1] [RFD] cgroup: thread granularity support for cpu controller

This patchset implements the mechanism outlined in the above message.

The new mechanism is named rgroup (resource group). When explicitly
designating a non-rgroup cgroup, the term sgroup (system group) is
used. rgroup has the following properties.

* A rgroup is a cgroup which is invisible on and transparent to the
  system-level cgroupfs interface.

* A rgroup can be created by specifying CLONE_NEWRGRP flag, along with
  CLONE_THREAD, during clone(2). A new rgroup is created under the
  parent thread's cgroup and the new thread is created in it.

* A rgroup is automatically destroyed when empty.

* A top-level rgroup of a process is a rgroup whose parent cgroup is a
  sgroup. A process may have multiple top-level rgroups and thus
  multiple rgroup subtrees under the same parent sgroup.

* Unlike sgroups, rgroups are allowed to compete against peer threads.
  Each rgroup behaves equivalently to a sibling task.

* rgroup subtrees are local to the process. When the process forks or
  execs, its rgroup subtrees are collapsed.

* When a process is migrated to a different cgroup, its rgroup
  subtrees are preserved.

* A subset of the controllers available on the parent sgroup are
  available to rgroup subtrees.
  Controller management on rgroups is automatic and implicit and
  doesn't interfere with system-level cgroup controller management.
  If a controller is made unavailable on the parent sgroup, it's
  automatically disabled from child rgroup subtrees.

rgroup lays the foundation for other kernel mechanisms to make use of
resource controllers while providing proper isolation between system
management and in-process operations, removing the awkward and
layer-violating requirement for coordination between individual
applications and system management. On top of the rgroup mechanism,
PRIO_RGRP is implemented for {set|get}priority(2).

* PRIO_RGRP can only be used if the target task is already in a
  rgroup. If setpriority(2) is used and cpu controller is available,
  cpu controller is enabled until the target rgroup is covered and the
  specified nice value is set as the weight of the rgroup.

* The specified nice value has the same meaning as for tasks. For
  example, a rgroup and a task competing under the same parent would
  behave exactly the same as two tasks.

* For top-level rgroups, PRIO_RGRP follows the same rlimit
  restrictions as PRIO_PROCESS; however, as nested rgroups only
  distribute CPU cycles which are allocated to the process, no
  restriction is applied.

PRIO_RGRP allows in-process hierarchical control of CPU cycles in a
manner which is a straightforward and minimal extension of existing
task and priority management.

There are still some missing pieces.

* Documentation updates.

* A mechanism that applications can use to publish certain rgroups so
  that external entities can determine which IDs to use to change
  rgroup settings. I already have interface and implementation design
  mostly pinned down.

* Userland updates such as integrating CLONE_NEWRGRP handling into
  pthread or updating renice(1) to handle resource groups.

I'll attach a test program which demonstrates PRIO_RGRP usage in a
follow-up email.

This patchset contains the following 10 patches.

 0001-cgroup-introduce-cgroup_-un-lock.patch
 0002-cgroup-un-inline-cgroup_path-and-friends.patch
 0003-cgroup-introduce-CGRP_MIGRATE_-flags.patch
 0004-signal-make-put_signal_struct-public.patch
 0005-cgroup-fork-add-new_rgrp_cset-p-and-clone_flags-to-c.patch
 0006-cgroup-fork-add-child-and-clone_flags-to-threadgroup.patch
 0007-cgroup-introduce-resource-group.patch
 0008-cgroup-implement-rgroup-control-mask-handling.patch
 0009-cgroup-implement-rgroup-subtree-migration.patch
 0010-cgroup-sched-implement-PRIO_RGRP-for-set-get-priorit.patch

0001-0006 are preparatory patches.
0007-0009 implement rgroup support.
0010 implements PRIO_RGRP.

This patchset is on top of

  cgroup/for-4.6 f6d635ad341d ("cgroup: implement cgroup_subsys->implicit_on_dfl")
+ [2] [PATCH 2/2] cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy
+ [3] [PATCHSET REPOST] sched, cgroup: implement cgroup v2 interface for cpu controller

and available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup2-rgroup

diffstat follows.

 fs/exec.c                     |    8
 include/linux/cgroup-defs.h   |   72 ++-
 include/linux/cgroup.h        |   60 +--
 include/linux/sched.h         |   31 +
 include/uapi/linux/resource.h |    1
 include/uapi/linux/sched.h    |    1
 kernel/cgroup.c               |  828 ++++++++++++++++++++++++++++++++++++++----
 kernel/fork.c                 |   27 -
 kernel/sched/core.c           |   32 +
 kernel/signal.c               |    6
 kernel/sys.c                  |   11
 11 files changed, 917 insertions(+), 160 deletions(-)

Thanks.

--
tejun

[1] http://lkml.kernel.org/g/20160105154503.GC5995@xxxxxxxxxxxxxxx
[2] http://lkml.kernel.org/g/1456351975-1899-3-git-send-email-tj@xxxxxxxxxx
[3] http://lkml.kernel.org/g/20160105164758.GD5995@xxxxxxxxxxxxxxx