On Tue, Apr 2, 2019 at 3:43 AM Patrick Bellasi <patrick.bellasi@xxxxxxx> wrote:
>
> The cgroup CPU bandwidth controller allows to assign a specified
> (maximum) bandwidth to the tasks of a group. However this bandwidth is
> defined and enforced only on a temporal base, without considering the
> actual frequency a CPU is running on. Thus, the amount of computation
> completed by a task within an allocated bandwidth can be very different
> depending on the actual frequency the CPU is running that task.
> The amount of computation can be affected also by the specific CPU a
> task is running on, especially when running on asymmetric capacity
> systems like Arm's big.LITTLE.
>
> With the availability of schedutil, the scheduler is now able
> to drive frequency selections based on actual task utilization.
> Moreover, the utilization clamping support provides a mechanism to
> bias the frequency selection operated by schedutil depending on
> constraints assigned to the tasks currently RUNNABLE on a CPU.
>
> Giving the mechanisms described above, it is now possible to extend the
> cpu controller to specify the minimum (or maximum) utilization which
> should be considered for tasks RUNNABLE on a cpu.
> This makes it possible to better defined the actual computational
> power assigned to task groups, thus improving the cgroup CPU bandwidth
> controller which is currently based just on time constraints.
>
> Extend the CPU controller with a couple of new attributes util.{min,max}
> which allows to enforce utilization boosting and capping for all the
> tasks in a group. Specifically:
>
> - util.min: defines the minimum utilization which should be considered
>             i.e. the RUNNABLE tasks of this group will run at least at a
>             minimum frequency which corresponds to the util.min
>             utilization
>
> - util.max: defines the maximum utilization which should be considered
>             i.e. the RUNNABLE tasks of this group will run up to a
>             maximum frequency which corresponds to the util.max
>             utilization
>
> These attributes:
>
> a) are available only for non-root nodes, both on default and legacy
>    hierarchies, while system wide clamps are defined by a generic
>    interface which does not depends on cgroups. This system wide
>    interface enforces constraints on tasks in the root node.
>
> b) enforce effective constraints at each level of the hierarchy which
>    are a restriction of the group requests considering its parent's
>    effective constraints. Root group effective constraints are defined
>    by the system wide interface.
>    This mechanism allows each (non-root) level of the hierarchy to:
>    - request whatever clamp values it would like to get
>    - effectively get only up to the maximum amount allowed by its parent
>
> c) have higher priority than task-specific clamps, defined via
>    sched_setattr(), thus allowing to control and restrict task requests
>
> Add two new attributes to the cpu controller to collect "requested"
> clamp values. Allow that at each non-root level of the hierarchy.
> Validate local consistency by enforcing util.min < util.max.
> Keep it simple by do not caring now about "effective" values computation
> and propagation along the hierarchy.
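
To check my reading of point b), here is a minimal user-space sketch of how an "effective" value could be derived from the "requested" ones. This is only a model under my own assumptions: struct model_group and model_effective() are hypothetical names, and this patch explicitly leaves the effective-value computation out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; not the kernel's data structures. */
enum { MODEL_UCLAMP_MIN, MODEL_UCLAMP_MAX, MODEL_UCLAMP_CNT };

struct model_group {
	const struct model_group *parent;	/* NULL for the root group */
	uint32_t req[MODEL_UCLAMP_CNT];		/* values the group asked for */
};

/*
 * Effective clamp of a group: its own request, capped by the effective
 * value of its parent. The root's request stands in for the system-wide
 * setting.
 */
static uint32_t model_effective(const struct model_group *g, int clamp_id)
{
	uint32_t eff = g->req[clamp_id];

	if (g->parent) {
		uint32_t parent_eff = model_effective(g->parent, clamp_id);

		if (eff > parent_eff)
			eff = parent_eff;
	}
	return eff;
}
```

In this model a child can request any value, but what it gets never exceeds what its parent effectively has, for both the min and the max clamp.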
>
> Signed-off-by: Patrick Bellasi <patrick.bellasi@xxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Tejun Heo <tj@xxxxxxxxxx>
>
> --
> Changes in v8:
>  Message-ID: <20190214154817.GN50184@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
>  - update changelog description for points b), c) and following paragraph
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  27 +++++
>  init/Kconfig                            |  22 ++++
>  kernel/sched/core.c                     | 142 +++++++++++++++++++++++-
>  kernel/sched/sched.h                    |   6 +
>  4 files changed, 196 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 7bf3f129c68b..47710a77f4fa 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -909,6 +909,12 @@ controller implements weight and absolute bandwidth limit models for
>  normal scheduling policy and absolute bandwidth allocation model for
>  realtime scheduling policy.
>
> +Cycles distribution is based, by default, on a temporal base and it
> +does not account for the frequency at which tasks are executed.
> +The (optional) utilization clamping support allows to enforce a minimum
> +bandwidth, which should always be provided by a CPU, and a maximum bandwidth,
> +which should never be exceeded by a CPU.
> +
>  WARNING: cgroup2 doesn't yet support control of realtime processes and
>  the cpu controller can only be enabled when all RT processes are in
>  the root cgroup. Be aware that system management software may already
> @@ -974,6 +980,27 @@ All time durations are in microseconds.
>  	Shows pressure stall information for CPU. See
>  	Documentation/accounting/psi.txt for details.
>
> +  cpu.util.min
> +	A read-write single value file which exists on non-root cgroups.
> +	The default is "0", i.e. no utilization boosting.
> +
> +	The requested minimum utilization in the range [0, 1024].
> +
> +	This interface allows reading and setting minimum utilization clamp
> +	values similar to the sched_setattr(2). This minimum utilization
> +	value is used to clamp the task specific minimum utilization clamp.
> +
> +  cpu.util.max
> +	A read-write single value file which exists on non-root cgroups.
> +	The default is "1024". i.e. no utilization capping
> +
> +	The requested maximum utilization in the range [0, 1024].
> +
> +	This interface allows reading and setting maximum utilization clamp
> +	values similar to the sched_setattr(2). This maximum utilization
> +	value is used to clamp the task specific maximum utilization clamp.
> +
> +
>
>  Memory
>  ------
> diff --git a/init/Kconfig b/init/Kconfig
> index 7439cbf4d02e..33006e8de996 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -877,6 +877,28 @@ config RT_GROUP_SCHED
>
>  endif #CGROUP_SCHED
>
> +config UCLAMP_TASK_GROUP
> +	bool "Utilization clamping per group of tasks"
> +	depends on CGROUP_SCHED
> +	depends on UCLAMP_TASK
> +	default n
> +	help
> +	  This feature enables the scheduler to track the clamped utilization
> +	  of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
> +
> +	  When this option is enabled, the user can specify a min and max
> +	  CPU bandwidth which is allowed for each single task in a group.
> +	  The max bandwidth allows to clamp the maximum frequency a task
> +	  can use, while the min bandwidth allows to define a minimum
> +	  frequency a task will always use.
> +
> +	  When task group based utilization clamping is enabled, an eventually
> +	  specified task-specific clamp value is constrained by the cgroup
> +	  specified clamp value. Both minimum and maximum task clamping cannot
> +	  be bigger than the corresponding clamping defined at task group level.
> +
> +	  If in doubt, say N.
> +
>  config CGROUP_PIDS
>  	bool "PIDs controller"
>  	help
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 71c9dd6487b1..aeed2dd315cc 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1130,8 +1130,12 @@ static void __init init_uclamp(void)
>  	/* System defaults allow max clamp values for both indexes */
>  	uc_max.value = uclamp_none(UCLAMP_MAX);
>  	uc_max.bucket_id = uclamp_bucket_id(uc_max.value);
> -	for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
> +	for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) {
>  		uclamp_default[clamp_id] = uc_max;
> +#ifdef CONFIG_UCLAMP_TASK_GROUP
> +		root_task_group.uclamp_req[clamp_id] = uc_max;
> +#endif
> +	}
>  }
>
>  #else /* CONFIG_UCLAMP_TASK */
> @@ -6720,6 +6724,19 @@ void ia64_set_curr_task(int cpu, struct task_struct *p)
>  /* task_group_lock serializes the addition/removal of task groups */
>  static DEFINE_SPINLOCK(task_group_lock);
>
> +static inline int alloc_uclamp_sched_group(struct task_group *tg,
> +					   struct task_group *parent)
> +{
> +#ifdef CONFIG_UCLAMP_TASK_GROUP
> +	int clamp_id;
> +
> +	for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
> +		tg->uclamp_req[clamp_id] = parent->uclamp_req[clamp_id];
> +#endif
> +
> +	return 1;

Looks like you never return anything else, either here or in the
following patches, I think...
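
If that is indeed the case, one option might be to make the helper return void and drop the check in sched_create_group(). Below is a minimal stand-alone sketch of the idea, using mock types (struct mock_task_group, struct mock_uclamp_se and mock_init_uclamp_sched_group() are hypothetical stand-ins, not the kernel definitions). Keeping the int return for symmetry with the other alloc_*_sched_group() helpers would of course also be a valid choice.

```c
/* Mock stand-ins for the kernel types, for illustration only. */
#define MOCK_UCLAMP_CNT 2

struct mock_uclamp_se {
	unsigned int value;
	unsigned int bucket_id;
};

struct mock_task_group {
	struct mock_uclamp_se uclamp_req[MOCK_UCLAMP_CNT];
};

/* Nothing can fail here, so there is no status to report. */
static void mock_init_uclamp_sched_group(struct mock_task_group *tg,
					 const struct mock_task_group *parent)
{
	int clamp_id;

	/* A new group starts with the same requested clamps as its parent. */
	for (clamp_id = 0; clamp_id < MOCK_UCLAMP_CNT; ++clamp_id)
		tg->uclamp_req[clamp_id] = parent->uclamp_req[clamp_id];
}
```
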
> +}
> +
>  static void sched_free_group(struct task_group *tg)
>  {
>  	free_fair_sched_group(tg);
> @@ -6743,6 +6760,9 @@ struct task_group *sched_create_group(struct task_group *parent)
>  	if (!alloc_rt_sched_group(tg, parent))
>  		goto err;
>
> +	if (!alloc_uclamp_sched_group(tg, parent))
> +		goto err;
> +
>  	return tg;
>
>  err:
> @@ -6963,6 +6983,100 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
>  		sched_move_task(task);
>  }
>
> +#ifdef CONFIG_UCLAMP_TASK_GROUP
> +static int cpu_util_min_write_u64(struct cgroup_subsys_state *css,
> +				  struct cftype *cftype, u64 min_value)
> +{
> +	struct task_group *tg;
> +	int ret = 0;
> +
> +	if (min_value > SCHED_CAPACITY_SCALE)
> +		return -ERANGE;
> +
> +	rcu_read_lock();
> +
> +	tg = css_tg(css);
> +	if (tg == &root_task_group) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +	if (tg->uclamp_req[UCLAMP_MIN].value == min_value)
> +		goto out;
> +	if (tg->uclamp_req[UCLAMP_MAX].value < min_value) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	/* Update tg's "requested" clamp value */
> +	tg->uclamp_req[UCLAMP_MIN].value = min_value;
> +	tg->uclamp_req[UCLAMP_MIN].bucket_id = uclamp_bucket_id(min_value);
> +
> +out:
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +
> +static int cpu_util_max_write_u64(struct cgroup_subsys_state *css,
> +				  struct cftype *cftype, u64 max_value)
> +{
> +	struct task_group *tg;
> +	int ret = 0;
> +
> +	if (max_value > SCHED_CAPACITY_SCALE)
> +		return -ERANGE;
> +
> +	rcu_read_lock();
> +
> +	tg = css_tg(css);
> +	if (tg == &root_task_group) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +	if (tg->uclamp_req[UCLAMP_MAX].value == max_value)
> +		goto out;
> +	if (tg->uclamp_req[UCLAMP_MIN].value > max_value) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	/* Update tg's "requested" clamp value */
> +	tg->uclamp_req[UCLAMP_MAX].value = max_value;
> +	tg->uclamp_req[UCLAMP_MAX].bucket_id = uclamp_bucket_id(max_value);
> +
> +out:
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +
> +static inline u64 cpu_uclamp_read(struct cgroup_subsys_state *css,
> +				  enum uclamp_id clamp_id)
> +{
> +	struct task_group *tg;
> +	u64 util_clamp;
> +
> +	rcu_read_lock();
> +	tg = css_tg(css);
> +	util_clamp = tg->uclamp_req[clamp_id].value;
> +	rcu_read_unlock();
> +
> +	return util_clamp;
> +}
> +
> +static u64 cpu_util_min_read_u64(struct cgroup_subsys_state *css,
> +				 struct cftype *cft)
> +{
> +	return cpu_uclamp_read(css, UCLAMP_MIN);
> +}
> +
> +static u64 cpu_util_max_read_u64(struct cgroup_subsys_state *css,
> +				 struct cftype *cft)
> +{
> +	return cpu_uclamp_read(css, UCLAMP_MAX);
> +}
> +#endif /* CONFIG_UCLAMP_TASK_GROUP */
> +
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  static int cpu_shares_write_u64(struct cgroup_subsys_state *css,
>  				struct cftype *cftype, u64 shareval)
> @@ -7300,6 +7414,18 @@ static struct cftype cpu_legacy_files[] = {
>  		.read_u64 = cpu_rt_period_read_uint,
>  		.write_u64 = cpu_rt_period_write_uint,
>  	},
> +#endif
> +#ifdef CONFIG_UCLAMP_TASK_GROUP
> +	{
> +		.name = "util.min",
> +		.read_u64 = cpu_util_min_read_u64,
> +		.write_u64 = cpu_util_min_write_u64,
> +	},
> +	{
> +		.name = "util.max",
> +		.read_u64 = cpu_util_max_read_u64,
> +		.write_u64 = cpu_util_max_write_u64,
> +	},
>  #endif
>  	{ } /* Terminate */
>  };
> @@ -7467,6 +7593,20 @@ static struct cftype cpu_files[] = {
>  		.seq_show = cpu_max_show,
>  		.write = cpu_max_write,
>  	},
> +#endif
> +#ifdef CONFIG_UCLAMP_TASK_GROUP
> +	{
> +		.name = "util.min",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.read_u64 = cpu_util_min_read_u64,
> +		.write_u64 = cpu_util_min_write_u64,
> +	},
> +	{
> +		.name = "util.max",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.read_u64 = cpu_util_max_read_u64,
> +		.write_u64 = cpu_util_max_write_u64,
> +	},
>  #endif
>  	{ } /* terminate */
>  };
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 6ae3628248eb..b46b6912beba 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -399,6 +399,12 @@ struct task_group {
>  #endif
>
>  	struct cfs_bandwidth cfs_bandwidth;
> +
> +#ifdef CONFIG_UCLAMP_TASK_GROUP
> +	/* Clamp values requested for a task group */
> +	struct uclamp_se uclamp_req[UCLAMP_CNT];
> +#endif
> +
>  };
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> --
> 2.20.1
>
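
One more observation on the two write handlers. As far as I can tell they accept util.min == util.max, while the changelog talks about enforcing util.min < util.max. Here is a small user-space model of the checks as I read them. The names (struct model_clamps, model_set_min(), model_set_max()) are hypothetical, the root-group check and the RCU locking are left out, and MODEL_CAPACITY_SCALE assumes SCHED_CAPACITY_SCALE is 1024, as the documented [0, 1024] range suggests.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_CAPACITY_SCALE 1024	/* assumed value of SCHED_CAPACITY_SCALE */

/* Hypothetical stand-in for a group's requested clamp values. */
struct model_clamps {
	uint64_t min;
	uint64_t max;
};

/* Mirrors the range and consistency checks of the util.min handler. */
static int model_set_min(struct model_clamps *c, uint64_t min_value)
{
	if (min_value > MODEL_CAPACITY_SCALE)
		return -ERANGE;
	if (c->max < min_value)		/* equality is accepted */
		return -EINVAL;
	c->min = min_value;
	return 0;
}

/* Mirrors the range and consistency checks of the util.max handler. */
static int model_set_max(struct model_clamps *c, uint64_t max_value)
{
	if (max_value > MODEL_CAPACITY_SCALE)
		return -ERANGE;
	if (c->min > max_value)		/* equality is accepted */
		return -EINVAL;
	c->max = max_value;
	return 0;
}
```

With this model, starting from min=0 and max=1024, lowering max to 512 and then raising min to 512 both succeed, so either the changelog should say "util.min <= util.max" or the comparisons need tightening.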