This is a note to let you know that I've just added the patch titled

    sched/uclamp: Ignore max aggregation if rq is idle

to the 5.10-stable tree which can be found at:

    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     sched-uclamp-ignore-max-aggregation-if-rq-is-idle.patch
and it can be found in the queue-5.10 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.


commit d7fea1f86d62798d1f7e87bd44ebff53dc1be441
Author: Xuewen Yan <xuewen.yan@xxxxxxxxxx>
Date:   Wed Jun 30 22:12:04 2021 +0800

    sched/uclamp: Ignore max aggregation if rq is idle

    [ Upstream commit 3e1493f46390618ea78607cb30c58fc19e2a5035 ]

    When a task wakes up on an idle rq, uclamp_rq_util_with() would max
    aggregate with the rq value. But since no task is enqueued yet, the
    values are stale, based on the last task that was running. When the
    new task actually wakes up and is enqueued, the rq uclamp values
    should reflect the effective uclamp values of the newly woken task.

    This is a problem particularly for uclamp_max because it defaults to
    1024. If a task p with uclamp_max = 512 wakes up, then max
    aggregation would ignore the capping that should apply when this
    task is enqueued, which is wrong.

    Fix that by ignoring max aggregation if the rq is idle, since in
    that case the effective uclamp values of the rq will be those of the
    task that wakes up.

    Fixes: 9d20ad7dfc9a ("sched/uclamp: Add uclamp_util_with()")
    Signed-off-by: Xuewen Yan <xuewen.yan@xxxxxxxxxx>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
    Reviewed-by: Valentin Schneider <valentin.schneider@xxxxxxx>
    [qias: Changelog]
    Reviewed-by: Qais Yousef <qais.yousef@xxxxxxx>
    Link: https://lore.kernel.org/r/20210630141204.8197-1-xuewen.yan94@xxxxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fdebfcbdfca9..39112ac7ab34 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2422,20 +2422,27 @@ static __always_inline
 unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
 				  struct task_struct *p)
 {
-	unsigned long min_util;
-	unsigned long max_util;
+	unsigned long min_util = 0;
+	unsigned long max_util = 0;
 
 	if (!static_branch_likely(&sched_uclamp_used))
 		return util;
 
-	min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
-	max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
-
 	if (p) {
-		min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
-		max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
+		min_util = uclamp_eff_value(p, UCLAMP_MIN);
+		max_util = uclamp_eff_value(p, UCLAMP_MAX);
+
+		/*
+		 * Ignore last runnable task's max clamp, as this task will
+		 * reset it. Similarly, no need to read the rq's min clamp.
+		 */
+		if (rq->uclamp_flags & UCLAMP_FLAG_IDLE)
+			goto out;
 	}
 
+	min_util = max_t(unsigned long, min_util, READ_ONCE(rq->uclamp[UCLAMP_MIN].value));
+	max_util = max_t(unsigned long, max_util, READ_ONCE(rq->uclamp[UCLAMP_MAX].value));
+out:
 	/*
 	 * Since CPU's {min,max}_util clamps are MAX aggregated considering
 	 * RUNNABLE tasks with _different_ clamps, we can end up with an
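
For readers who want to see the effect of the change outside the kernel tree, here is a minimal
standalone C sketch (illustrative only, not part of the patch and not kernel code) that models the
aggregation decision described in the changelog. The struct, helper names and the 512/1024/800
values are assumptions chosen to mirror the uclamp_max = 512 example above.

/*
 * Minimal model of uclamp max aggregation on an idle rq (illustrative only).
 */
#include <stdio.h>

#define UCLAMP_FLAG_IDLE 0x01	/* stands in for the kernel flag of the same name */

struct fake_rq {
	unsigned long uclamp_min;	/* stale clamp left by the last running task */
	unsigned long uclamp_max;	/* stale clamp left by the last running task */
	unsigned int  uclamp_flags;
};

/* Effective clamps of the waking task; 512 mirrors the changelog example. */
static const unsigned long task_uclamp_min = 0;
static const unsigned long task_uclamp_max = 512;

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Models the patched logic: skip the rq's stale clamps when the rq is idle. */
static unsigned long clamp_util(const struct fake_rq *rq, unsigned long util)
{
	unsigned long min_util = task_uclamp_min;
	unsigned long max_util = task_uclamp_max;

	if (!(rq->uclamp_flags & UCLAMP_FLAG_IDLE)) {
		min_util = max_ul(min_util, rq->uclamp_min);
		max_util = max_ul(max_util, rq->uclamp_max);
	}

	if (util < min_util)
		return min_util;
	if (util > max_util)
		return max_util;
	return util;
}

int main(void)
{
	/* Idle rq still carrying the previous task's uclamp_max of 1024. */
	struct fake_rq rq = {
		.uclamp_min   = 0,
		.uclamp_max   = 1024,
		.uclamp_flags = UCLAMP_FLAG_IDLE,
	};

	/* With the fix, the waking task's cap of 512 applies: prints 512. */
	printf("idle rq, fix applied: %lu\n", clamp_util(&rq, 800));

	/*
	 * Taking the aggregation path anyway models the pre-fix behaviour on
	 * an idle rq: the stale 1024 masks the task's 512 cap, so 800 is
	 * returned uncapped.
	 */
	rq.uclamp_flags = 0;
	printf("aggregation path:     %lu\n", clamp_util(&rq, 800));

	return 0;
}

Built with any C compiler (e.g. gcc -o uclamp-demo uclamp-demo.c), the first line prints 512 and
the second prints 800, which is the difference the patch makes for a task waking on an idle rq.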