Patch "sched/core: Disable page allocation in task_tick_mm_cid()" has been added to the 6.6-stable tree

This is a note to let you know that I've just added the patch titled

    sched/core: Disable page allocation in task_tick_mm_cid()

to the 6.6-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     sched-core-disable-page-allocation-in-task_tick_mm_c.patch
and it can be found in the queue-6.6 subdirectory.

If you, or anyone else, feel it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 3c2218fd9d59b14e7a365fa96c3b22b6f1525d7b
Author: Waiman Long <longman@xxxxxxxxxx>
Date:   Wed Oct 9 21:44:32 2024 -0400

    sched/core: Disable page allocation in task_tick_mm_cid()
    
    [ Upstream commit 73ab05aa46b02d96509cb029a8d04fca7bbde8c7 ]
    
    With KASAN and PREEMPT_RT enabled, calling task_work_add() in
    task_tick_mm_cid() may cause the following splat.
    
    [   63.696416] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
    [   63.696416] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 610, name: modprobe
    [   63.696416] preempt_count: 10001, expected: 0
    [   63.696416] RCU nest depth: 1, expected: 1
    
    This problem is caused by the following call chain:
    
      sched_tick() [ acquire rq->__lock ]
       -> task_tick_mm_cid()
        -> task_work_add()
         -> __kasan_record_aux_stack()
          -> kasan_save_stack()
           -> stack_depot_save_flags()
            -> alloc_pages_mpol_noprof()
             -> __alloc_pages_noprof()
              -> get_page_from_freelist()
               -> rmqueue()
                -> rmqueue_pcplist()
                 -> __rmqueue_pcplist()
                  -> rmqueue_bulk()
                   -> rt_spin_lock()
    
    The rq lock is a raw_spinlock_t, so we cannot sleep while holding
    it. On PREEMPT_RT, ordinary spinlock_t locks, such as the one taken
    deep inside the page allocator, become sleeping rt_mutexes. In other
    words, stack_depot_save_flags() must not call alloc_pages() in this
    context.
    
    The task_tick_mm_cid() function with its task_work_add() call was
    introduced by commit 223baf9d17f2 ("sched: Fix performance regression
    introduced by mm_cid") in the v6.4 kernel.
    
    Fortunately, there is a kasan_record_aux_stack_noalloc() variant that
    calls stack_depot_save_flags() without allowing it to allocate new
    pages. To let task_tick_mm_cid() queue task_work without page
    allocation, a new TWAF_NO_ALLOC flag is added to the notify argument;
    when it is set, task_work_add() calls kasan_record_aux_stack_noalloc()
    instead of kasan_record_aux_stack(). The task_tick_mm_cid() function
    is modified to pass this new flag. (A standalone sketch of the flag
    encoding appears after the patch below.)
    
    The possible downside is a missing stack trace in a KASAN report when
    the TWAF_NO_ALLOC path is taken and a new page allocation would have
    been required, which should be rare.
    
    Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
    Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
    Link: https://lkml.kernel.org/r/20241010014432.194742-1-longman@xxxxxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
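
As background for the splat above: on PREEMPT_RT, a raw_spinlock_t
remains a true spinning lock that puts the CPU in atomic context, while
an ordinary spinlock_t becomes a sleeping rt_mutex, and the kernel's
might_sleep() debug check fires when a sleeping lock is taken in atomic
context. Below is a minimal, runnable userspace toy model of that rule;
it is illustrative only, and the preempt_count variable and the lock
stubs are stand-ins, not kernel code.

#include <stdio.h>

/* Toy model: names mirror the kernel's, bodies are stand-ins. */
static int preempt_count;	/* models the per-CPU preempt counter */

static void raw_spin_lock(void)   { preempt_count++; }	/* atomic context */
static void raw_spin_unlock(void) { preempt_count--; }

static void rt_spin_lock(void)
{
	/* models the might_sleep() check in kernel/locking/spinlock_rt.c */
	if (preempt_count)
		printf("BUG: sleeping function called from invalid context\n");
	/* a real rt_spin_lock() would block on an rt_mutex here */
}

int main(void)
{
	raw_spin_lock();	/* like sched_tick() taking rq->__lock     */
	rt_spin_lock();		/* like the page allocator's lock -> splat */
	raw_spin_unlock();
	return 0;
}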

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index cf5e7e891a776..2964171856e00 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -14,11 +14,14 @@ init_task_work(struct callback_head *twork, task_work_func_t func)
 }
 
 enum task_work_notify_mode {
-	TWA_NONE,
+	TWA_NONE = 0,
 	TWA_RESUME,
 	TWA_SIGNAL,
 	TWA_SIGNAL_NO_IPI,
 	TWA_NMI_CURRENT,
+
+	TWA_FLAGS = 0xff00,
+	TWAF_NO_ALLOC = 0x0100,
 };
 
 static inline bool task_work_pending(struct task_struct *task)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9b406d9886541..b6f922a20f83a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -12050,7 +12050,9 @@ void task_tick_mm_cid(struct rq *rq, struct task_struct *curr)
 		return;
 	if (time_before(now, READ_ONCE(curr->mm->mm_cid_next_scan)))
 		return;
-	task_work_add(curr, work, TWA_RESUME);
+
+	/* No page allocation under rq lock */
+	task_work_add(curr, work, TWA_RESUME | TWAF_NO_ALLOC);
 }
 
 void sched_mm_cid_exit_signals(struct task_struct *t)
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 5c2daa7ad3f90..8aa43204cb7dd 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -53,13 +53,24 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 		  enum task_work_notify_mode notify)
 {
 	struct callback_head *head;
+	int flags = notify & TWA_FLAGS;
 
+	notify &= ~TWA_FLAGS;
 	if (notify == TWA_NMI_CURRENT) {
 		if (WARN_ON_ONCE(task != current))
 			return -EINVAL;
 	} else {
-		/* record the work call stack in order to print it in KASAN reports */
-		kasan_record_aux_stack(work);
+		/*
+		 * Record the work call stack in order to print it in KASAN
+		 * reports.
+		 *
+		 * Note that stack allocation can fail if the TWAF_NO_ALLOC flag
+		 * is set and a new page is needed to expand the stack buffer.
+		 */
+		if (flags & TWAF_NO_ALLOC)
+			kasan_record_aux_stack_noalloc(work);
+		else
+			kasan_record_aux_stack(work);
 	}
 
 	head = READ_ONCE(task->task_works);
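
For reference, here is a small self-contained userspace mirror of the
flag-encoding scheme the patch introduces: notify modes occupy the low
byte and flag bits the high byte (TWA_FLAGS = 0xff00), so a single
enum-typed argument carries both. The enum values below match the
patch; the record_*() stubs and the demo function are illustrative
stand-ins only.

#include <stdio.h>

enum task_work_notify_mode {
	TWA_NONE = 0,
	TWA_RESUME,
	TWA_SIGNAL,
	TWA_SIGNAL_NO_IPI,
	TWA_NMI_CURRENT,

	TWA_FLAGS = 0xff00,		/* bits reserved for flags */
	TWAF_NO_ALLOC = 0x0100,
};

/* stand-ins for the two KASAN recording paths */
static void record_aux_stack(void)         { printf("record (may allocate)\n"); }
static void record_aux_stack_noalloc(void) { printf("record (no allocation)\n"); }

static int demo_task_work_add(enum task_work_notify_mode notify)
{
	int flags = notify & TWA_FLAGS;	/* split flag bits from the mode */

	notify &= ~TWA_FLAGS;
	if (flags & TWAF_NO_ALLOC)
		record_aux_stack_noalloc();
	else
		record_aux_stack();
	return notify;			/* the plain mode, e.g. TWA_RESUME */
}

int main(void)
{
	/* mirrors the task_tick_mm_cid() call site in the patch */
	int mode = demo_task_work_add(TWA_RESUME | TWAF_NO_ALLOC);

	printf("mode after masking: %d (TWA_RESUME == %d)\n", mode, TWA_RESUME);
	return 0;
}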



