This is a note to let you know that I've just added the patch titled

    sched/fair: Fix value reported by hot tasks pulled in /proc/schedstat

to the 5.15-stable tree which can be found at:

    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
    sched-fair-fix-value-reported-by-hot-tasks-pulled-in.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.


commit eeae6ea6e6cb681262cacd62a7ac826e23801b49
Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date:   Fri Dec 20 06:32:19 2024 +0000

    sched/fair: Fix value reported by hot tasks pulled in /proc/schedstat

    [ Upstream commit a430d99e349026d53e2557b7b22bd2ebd61fe12a ]

    In /proc/schedstat, lb_hot_gained reports the number of hot tasks pulled
    during load balance. This value is incremented in can_migrate_task() if
    the task is migratable and hot. After incrementing the value, the load
    balancer can still decide not to migrate this task, leading to wrong
    accounting. Fix this by incrementing the stats when hot tasks are
    detached. This issue only exists in detach_tasks(), where we can decide
    not to migrate a hot task even if it is migratable. In detach_one_task(),
    however, we migrate it unconditionally.

    [Swapnil: Handled the case where nr_failed_migrations_hot was not
     accounted properly and wrote commit log]

    Fixes: d31980846f96 ("sched: Move up affinity check to mitigate useless redoing overhead")
    Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
    Reported-by: "Gautham R. Shenoy" <gautham.shenoy@xxxxxxx>
    Not-yet-signed-off-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
    Signed-off-by: Swapnil Sapkal <swapnil.sapkal@xxxxxxx>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
    Link: https://lore.kernel.org/r/20241220063224.17767-2-swapnil.sapkal@xxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 875f3d317b9c8..5d0a44e4db4b5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -874,6 +874,7 @@ struct task_struct {
 	unsigned			sched_reset_on_fork:1;
 	unsigned			sched_contributes_to_load:1;
 	unsigned			sched_migrated:1;
+	unsigned			sched_task_hot:1;

 	/* Force alignment to the next boundary: */
 	unsigned			:0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4056330d38887..20044e7506ae1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8022,6 +8022,8 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	int tsk_cache_hot;

 	lockdep_assert_rq_held(env->src_rq);

+	if (p->sched_task_hot)
+		p->sched_task_hot = 0;
 	/*
 	 * We do not migrate tasks that are:
@@ -8094,10 +8096,8 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)

 	if (tsk_cache_hot <= 0 ||
 	    env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
-		if (tsk_cache_hot == 1) {
-			schedstat_inc(env->sd->lb_hot_gained[env->idle]);
-			schedstat_inc(p->stats.nr_forced_migrations);
-		}
+		if (tsk_cache_hot == 1)
+			p->sched_task_hot = 1;
 		return 1;
 	}
@@ -8112,6 +8112,12 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
 {
 	lockdep_assert_rq_held(env->src_rq);

+	if (p->sched_task_hot) {
+		p->sched_task_hot = 0;
+		schedstat_inc(env->sd->lb_hot_gained[env->idle]);
+		schedstat_inc(p->stats.nr_forced_migrations);
+	}
+
 	deactivate_task(env->src_rq, p, DEQUEUE_NOCLOCK);
 	set_task_cpu(p, env->dst_cpu);
 }
@@ -8274,6 +8280,9 @@ static int detach_tasks(struct lb_env *env)
 		continue;
 next:
+		if (p->sched_task_hot)
+			schedstat_inc(p->stats.nr_failed_migrations_hot);
+
 		list_move(&p->se.group_node, tasks);
 	}
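
As a side note (not part of the patch): the counters this fix corrects, such as
lb_hot_gained, are exported through the per-domain lines of /proc/schedstat, so
they can be compared before and after a load-balancing run. The sketch below is
a minimal, hypothetical userspace helper that only dumps the version header and
the "domain" lines; the exact column that corresponds to lb_hot_gained depends
on the schedstat version reported in the first line, as documented in
Documentation/scheduler/sched-stats.rst (sched-stats.txt on older trees).

/*
 * Hypothetical helper, not part of the patch: print the schedstat version
 * header and the per-domain statistics lines from /proc/schedstat so that
 * load-balance counters (e.g. lb_hot_gained) can be diffed between runs.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/schedstat", "r");
	char line[4096];

	if (!f) {
		perror("fopen /proc/schedstat");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		/* keep only the "version N" header and "domainN ..." lines */
		if (!strncmp(line, "version", 7) || !strncmp(line, "domain", 6))
			fputs(line, stdout);
	}

	fclose(f);
	return 0;
}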