Patch "psi: Fix race when task wakes up before psi_sched_switch() adjusts flags" has been added to the 6.12-stable tree

This is a note to let you know that I've just added the patch titled

    psi: Fix race when task wakes up before psi_sched_switch() adjusts flags

to the 6.12-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     psi-fix-race-when-task-wakes-up-before-psi_sched_swi.patch
and it can be found in the queue-6.12 subdirectory.

If you, or anyone else, feel it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 4cff6c07ae2da9c9c14d4e8b314efab0b993a292
Author: Chengming Zhou <chengming.zhou@xxxxxxxxx>
Date:   Fri Dec 27 06:19:41 2024 +0000

    psi: Fix race when task wakes up before psi_sched_switch() adjusts flags
    
    [ Upstream commit 7d9da040575b343085287686fa902a5b2d43c7ca ]
    
    When running hackbench in a cgroup with bandwidth throttling enabled,
    the following PSI splat was observed:
    
        psi: inconsistent task state! task=1831:hackbench cpu=8 psi_flags=14 clear=0 set=4
    
    When investigating the series of events leading up to the splat, the
    following sequence was observed:
    
        [008] d..2.: sched_switch: ... ==> next_comm=hackbench next_pid=1831 next_prio=120
            ...
        [008] dN.2.: dequeue_entity(task delayed): task=hackbench pid=1831 cfs_rq->throttled=0
        [008] dN.2.: pick_task_fair: check_cfs_rq_runtime() throttled cfs_rq on CPU8
        # CPU8 goes into newidle balance and releases the rq lock
            ...
        # CPU15 on same LLC Domain is trying to wakeup hackbench(pid=1831)
        [015] d..4.: psi_flags_change: psi: task state: task=1831:hackbench cpu=8 psi_flags=14 clear=0 set=4 final=14 # Splat (cfs_rq->throttled=1)
        [015] d..4.: sched_wakeup: comm=hackbench pid=1831 prio=120 target_cpu=008 # Task has woken on a throttled hierarchy
        [008] d..2.: sched_switch: prev_comm=hackbench prev_pid=1831 prev_prio=120 prev_state=S ==> ...
    
    psi_dequeue() relies on psi_sched_switch() to set the correct PSI flags
    for the blocked entity; however, with the introduction of DELAY_DEQUEUE,
    the blocked task can wake up when newidle balance drops the runqueue
    lock during __schedule().
    
    If a task wakes before psi_sched_switch() adjusts the PSI flags, skip
    any modifications in psi_enqueue(), which would otherwise still see the
    flags of a running task rather than a blocked one. Instead, rely on
    psi_sched_switch() to do the right thing.
    
    Since the status returned by try_to_block_task() may no longer be true
    by the time __schedule() reaches psi_sched_switch(), check whether the
    task is blocked using a combination of task_on_rq_queued() and
    p->se.sched_delayed checks.
    
    [ prateek: Commit message, testing, early bailout in psi_enqueue() ]
    
    Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue") # 1a6151017ee5
    Signed-off-by: Chengming Zhou <chengming.zhou@xxxxxxxxx>
    Signed-off-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
    Reviewed-by: Chengming Zhou <chengming.zhou@xxxxxxxxx>
    Link: https://lore.kernel.org/r/20241227061941.2315-1-kprateek.nayak@xxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c1d2d46feec50..aba41c69f09c4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6593,7 +6593,6 @@ static void __sched notrace __schedule(int sched_mode)
 	 * as a preemption by schedule_debug() and RCU.
 	 */
 	bool preempt = sched_mode > SM_NONE;
-	bool block = false;
 	unsigned long *switch_count;
 	unsigned long prev_state;
 	struct rq_flags rf;
@@ -6654,7 +6653,7 @@ static void __sched notrace __schedule(int sched_mode)
 			goto picked;
 		}
 	} else if (!preempt && prev_state) {
-		block = try_to_block_task(rq, prev, prev_state);
+		try_to_block_task(rq, prev, prev_state);
 		switch_count = &prev->nvcsw;
 	}
 
@@ -6699,7 +6698,8 @@ static void __sched notrace __schedule(int sched_mode)
 
 		migrate_disable_switch(rq, prev);
 		psi_account_irqtime(rq, prev, next);
-		psi_sched_switch(prev, next, block);
+		psi_sched_switch(prev, next, !task_on_rq_queued(prev) ||
+					     prev->se.sched_delayed);
 
 		trace_sched_switch(preempt, prev, next, prev_state);
 
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 8ee0add5a48a8..6ade91bce63ee 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -138,6 +138,10 @@ static inline void psi_enqueue(struct task_struct *p, int flags)
 	if (flags & ENQUEUE_RESTORE)
 		return;
 
+	/* psi_sched_switch() will handle the flags */
+	if (task_on_cpu(task_rq(p), p))
+		return;
+
 	if (p->se.sched_delayed) {
 		/* CPU migration of "sleeping" task */
 		SCHED_WARN_ON(!(flags & ENQUEUE_MIGRATED));



