The following commit has been merged into the sched/urgent branch of tip: Commit-ID: 30ca4b4598a9d680917fe24df5451afecc028b5b Gitweb: https://git.kernel.org/tip/30ca4b4598a9d680917fe24df5451afecc028b5b Author: Rik van Riel <riel@xxxxxxxxxxx> AuthorDate: Mon, 21 Jun 2021 19:43:30 +02:00 Committer: Ingo Molnar <mingo@xxxxxxxxxx> CommitterDate: Tue, 22 Jun 2021 14:00:53 +02:00 sched/fair: Ensure that the CFS parent is added after unthrottling Ensure that a CFS parent will be in the list whenever one of its children is also in the list. A warning on rq->tmp_alone_branch != &rq->leaf_cfs_rq_list has been reported while running LTP test cfs_bandwidth01. Odin Ugedal found the root cause: $ tree /sys/fs/cgroup/ltp/ -d --charset=ascii /sys/fs/cgroup/ltp/ |-- drain `-- test-6851 `-- level2 |-- level3a | |-- worker1 | `-- worker2 `-- level3b `-- worker3 Timeline (ish): - worker3 gets throttled - level3b is decayed, since it has no more load - level2 get throttled - worker3 get unthrottled - level2 get unthrottled - worker3 is added to list - level3b is not added to list, since nr_running==0 and is decayed [ Vincent Guittot: Rebased and updated to fix for the reported warning. ] Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle") Reported-by: Sachin Sant <sachinp@xxxxxxxxxxxxxxxxxx> Suggested-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx> Tested-by: Sachin Sant <sachinp@xxxxxxxxxxxxxxxxxx> Acked-by: Odin Ugedal <odin@xxxxxxx> Link: https://lore.kernel.org/r/20210621174330.11258-1-vincent.guittot@xxxxxxxxxx --- kernel/sched/fair.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index bfaa6e1..a56f646 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3298,6 +3298,31 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq, int flags) #ifdef CONFIG_SMP #ifdef CONFIG_FAIR_GROUP_SCHED +/* + * Because list_add_leaf_cfs_rq always places a child cfs_rq on the list + * immediately before a parent cfs_rq, and cfs_rqs are removed from the list + * bottom-up, we only have to test whether the cfs_rq before us on the list + * is our child. + * If cfs_rq is not on the list, test whether a child needs its to be added to + * connect a branch to the tree * (see list_add_leaf_cfs_rq() for details). + */ +static inline bool child_cfs_rq_on_list(struct cfs_rq *cfs_rq) +{ + struct cfs_rq *prev_cfs_rq; + struct list_head *prev; + + if (cfs_rq->on_list) { + prev = cfs_rq->leaf_cfs_rq_list.prev; + } else { + struct rq *rq = rq_of(cfs_rq); + + prev = rq->tmp_alone_branch; + } + + prev_cfs_rq = container_of(prev, struct cfs_rq, leaf_cfs_rq_list); + + return (prev_cfs_rq->tg->parent == cfs_rq->tg); +} static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq) { @@ -3334,6 +3359,9 @@ static inline void update_tg_load_avg(struct cfs_rq *cfs_rq) { long delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib; + if (child_cfs_rq_on_list(cfs_rq)) + return false; + /* * No need to update load_avg for root_task_group as it is not used. */