A regression has been reported with commit 3d30544f0212 ("sched/fair: Apply
more PELT fixes") when several levels of task groups are involved and
cpu_possible_mask != cpu_present_mask.

The root cause is that a group entity's load (tg_child->se[i]->avg.load_avg)
is initialized to scale_load_down(se->load.weight). During the creation of a
child task group, its group entities on possible CPUs are attached to the
parent's cfs_rq (tg_parent) and their loads are added to the parent's load
(tg_parent->load_avg) with update_tg_load_avg().

But only the load on online CPUs will then be updated to reflect real load,
whereas the load on the other CPUs will stay at the initial value.

The result is a tg_parent->load_avg that is higher than the real load, the
weight of group entities (tg_parent->se[i]->load.weight) on online CPUs is
smaller than it should be, and the task group gets less running time than
it could expect.

This situation can be detected with /proc/sched_debug: the ".tg_load_avg"
of the task group will be much higher than the sum of ".tg_load_avg_contrib"
of the task group's online cfs_rqs.

The load of group entities doesn't have to be initialized to anything other
than 0, because their load will increase when an entity is attached.

Fixes: 3d30544f0212 ("sched/fair: Apply more PELT fixes")
Reported-by: Joseph Salisbury <joseph.salisbury@xxxxxxxxxxxxx>
Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
Tested-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> # 4.8.x
---
 kernel/sched/fair.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8b03fb5..89776ac 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -690,7 +690,14 @@ void init_entity_runnable_average(struct sched_entity *se)
 	 * will definitely be update (after enqueue).
 	 */
 	sa->period_contrib = 1023;
-	sa->load_avg = scale_load_down(se->load.weight);
+	/*
+	 * Tasks are initialized with full load to be seen as heavy tasks until
+	 * they get a chance to stabilize to their real load level.
+	 * Group entities are initialized with zero load to reflect the fact
+	 * that nothing has been attached to the task group yet.
+	 */
+	if (entity_is_task(se))
+		sa->load_avg = scale_load_down(se->load.weight);
 	sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
 	/*
 	 * At this point, util_avg won't be used in select_task_rq_fair anyway
--
2.7.4
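
[Editorial note, not part of the patch: the detection method described in the
changelog (comparing ".tg_load_avg" against the sum of ".tg_load_avg_contrib"
in /proc/sched_debug) can be scripted. The sketch below is a minimal userspace
illustration under the assumption that /proc/sched_debug prints per-CPU
"cfs_rq[N]:<cgroup path>" sections with those field names; the exact layout
varies between kernel versions and configs, and the program name
"tg_load_check" is hypothetical.]

/*
 * tg_load_check.c - illustrative sketch only.
 * Sum the ".tg_load_avg_contrib" values reported for one task group's
 * online cfs_rqs and compare them against ".tg_load_avg".
 *
 * Usage: ./tg_load_check <cgroup path as shown in "cfs_rq[N]:<path>">
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
	const char *tg = argc > 1 ? argv[1] : "/";
	char line[512];
	int in_tg = 0;
	long contrib_sum = 0, tg_load_avg = -1, val;
	FILE *f = fopen("/proc/sched_debug", "r");

	if (!f) {
		perror("/proc/sched_debug");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		char path[256];

		/* A new cfs_rq section starts with "cfs_rq[N]:<path>" */
		if (sscanf(line, "cfs_rq[%*d]:%255s", path) == 1) {
			in_tg = !strcmp(path, tg);
			continue;
		}
		if (!in_tg)
			continue;
		/* whitespace in the format strings matches any spacing */
		if (sscanf(line, " .tg_load_avg_contrib : %ld", &val) == 1)
			contrib_sum += val;
		else if (sscanf(line, " .tg_load_avg : %ld", &val) == 1)
			tg_load_avg = val;	/* same value repeated per CPU */
	}
	fclose(f);

	printf("sum of .tg_load_avg_contrib (online cfs_rqs): %ld\n", contrib_sum);
	printf(".tg_load_avg of the task group              : %ld\n", tg_load_avg);
	if (tg_load_avg >= 0)
		printf("unaccounted load                            : %ld\n",
		       tg_load_avg - contrib_sum);
	return 0;
}

[On an affected kernel (child groups created while cpu_possible_mask !=
cpu_present_mask), the "unaccounted load" stays large; with this patch it
should converge toward zero.]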