Patch "sched: Fix data-race in wakeup" has been added to the 5.9-stable tree

This is a note to let you know that I've just added the patch titled

    sched: Fix data-race in wakeup

to the 5.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     sched-fix-data-race-in-wakeup.patch
and it can be found in the queue-5.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 2788b51d1a5854d26f961fdc117054cdc2c25149
Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date:   Tue Nov 17 09:08:41 2020 +0100

    sched: Fix data-race in wakeup
    
    [ Upstream commit f97bb5272d9e95d400d6c8643ebb146b3e3e7842 ]
    
    Mel reported that on some ARM64 platforms loadavg goes bananas and
    Will tracked it down to the following race:
    
      CPU0                                  CPU1
    
      schedule()
        prev->sched_contributes_to_load = X;
        deactivate_task(prev);
    
                                            try_to_wake_up()
                                              if (p->on_rq && ...) // false
                                              if (smp_load_acquire(&p->on_cpu) && // true
                                                  ttwu_queue_wakelist())
                                                    p->sched_remote_wakeup = Y;
    
        smp_store_release(&prev->on_cpu, 0);
    
    where both p->sched_contributes_to_load and p->sched_remote_wakeup are
    in the same word, and thus the stores X and Y race (and can clobber
    one another's data).
    
    Prior to commit c6e7bd7afaeb ("sched/core: Optimize ttwu()
    spinning on p->on_cpu") the p->on_cpu handoff serialized access to
    p->sched_remote_wakeup (just as it still does with
    p->sched_contributes_to_load); that commit broke this by calling
    ttwu_queue_wakelist() with p->on_cpu != 0.
    
    However, due to
    
      p->XXX = X                    ttwu()
      schedule()                      if (p->on_rq && ...) // false
        smp_mb__after_spinlock()      if (smp_load_acquire(&p->on_cpu) &&
        deactivate_task()                 ttwu_queue_wakelist())
          p->on_rq = 0;                     p->sched_remote_wakeup = Y;
    
    we can be sure any 'current' store is complete and 'current' is
    guaranteed asleep. Therefore we can move p->sched_remote_wakeup into
    the current flags word.
    
    Note: while the observed failure was loadavg accounting gone wrong due
    to ttwu() clobbering p->sched_contributes_to_load, the reverse problem
    is also possible, where schedule() clobbers p->sched_remote_wakeup;
    this could result in enqueue_entity() wrecking ->vruntime and causing
    scheduling artifacts.
    
    Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
    Reported-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
    Debugged-by: Will Deacon <will@xxxxxxxxxx>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
    Link: https://lkml.kernel.org/r/20201117083016.GK3121392@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8bf2295ebee48..12aa57de8eea0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -770,7 +770,6 @@ struct task_struct {
 	unsigned			sched_reset_on_fork:1;
 	unsigned			sched_contributes_to_load:1;
 	unsigned			sched_migrated:1;
-	unsigned			sched_remote_wakeup:1;
 #ifdef CONFIG_PSI
 	unsigned			sched_psi_wake_requeue:1;
 #endif
@@ -780,6 +779,21 @@ struct task_struct {
 
 	/* Unserialized, strictly 'current' */
 
+	/*
+	 * This field must not be in the scheduler word above due to wakelist
+	 * queueing no longer being serialized by p->on_cpu. However:
+	 *
+	 * p->XXX = X;			ttwu()
+	 * schedule()			  if (p->on_rq && ..) // false
+	 *   smp_mb__after_spinlock();	  if (smp_load_acquire(&p->on_cpu) && //true
+	 *   deactivate_task()		      ttwu_queue_wakelist())
+	 *     p->on_rq = 0;			p->sched_remote_wakeup = Y;
+	 *
+	 * guarantees all stores of 'current' are visible before
+	 * ->sched_remote_wakeup gets used, so it can be in this word.
+	 */
+	unsigned			sched_remote_wakeup:1;
+
 	/* Bit to tell LSMs we're in execve(): */
 	unsigned			in_execve:1;
 	unsigned			in_iowait:1;
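
As an illustration only, the clobbering described in the commit message can be replayed in single-threaded user-space C. This is a minimal sketch and not kernel code: the mask values, the struct two_words type and the function names are hypothetical, and a bitfield store is modelled as an explicit load, modify and write-back of the whole containing word, which is how compilers typically implement it. The first function keeps both flags in one word, as before the patch; the second keeps them in separate words, as after it.

```c
#include <assert.h>

/* Hypothetical bit positions; the real layout is chosen by the compiler. */
#define CONTRIBUTES_TO_LOAD	0x1u	/* models p->sched_contributes_to_load */
#define REMOTE_WAKEUP		0x2u	/* models p->sched_remote_wakeup */

/* Both flags share one word: replay the interleaving from the commit message. */
unsigned same_word_interleaving(void)
{
	unsigned word = 0;
	unsigned cpu0 = word;		/* CPU0 (schedule()) loads the word */
	unsigned cpu1 = word;		/* CPU1 (try_to_wake_up()) loads the word */

	cpu0 |= CONTRIBUTES_TO_LOAD;	/* store X, applied to CPU0's private copy */
	cpu1 |= REMOTE_WAKEUP;		/* store Y, applied to CPU1's private copy */

	word = cpu1;			/* CPU1 writes its copy back */
	word = cpu0;			/* CPU0 writes back a stale copy: Y is lost */
	return word;
}

/* Flags live in separate words, as after the patch. */
struct two_words {
	unsigned sched_word;		/* still holds sched_contributes_to_load */
	unsigned current_word;		/* now holds sched_remote_wakeup */
};

/* Same interleaving, but each CPU's read-modify-write touches its own word. */
struct two_words split_word_interleaving(void)
{
	struct two_words w = { 0, 0 };
	unsigned cpu0 = w.sched_word;	/* CPU0 loads only the scheduler word */
	unsigned cpu1 = w.current_word;	/* CPU1 loads only the 'current' word */

	cpu0 |= CONTRIBUTES_TO_LOAD;
	cpu1 |= REMOTE_WAKEUP;

	w.current_word = cpu1;		/* write-backs no longer overlap */
	w.sched_word = cpu0;
	return w;
}

/* Check both outcomes: one flag lost in the shared word, none lost when split. */
void check_interleavings(void)
{
	struct two_words w = split_word_interleaving();

	assert(same_word_interleaving() == CONTRIBUTES_TO_LOAD);
	assert(w.sched_word == CONTRIBUTES_TO_LOAD);
	assert(w.current_word == REMOTE_WAKEUP);
}
```

The sketch only shows why sharing a word is unsafe once the two stores are no longer serialized. The patch does not make the 'current' flags word safe for arbitrary concurrent access; the comment it adds explains why this particular remote store is ordered after all stores by 'current'.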


