Patch "sched/core: Remove the unnecessary need_resched() check in nohz_csd_func()" has been added to the 5.10-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    sched/core: Remove the unnecessary need_resched() check in nohz_csd_func()

to the 5.10-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     sched-core-remove-the-unnecessary-need_resched-check.patch
and it can be found in the queue-5.10 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit c6729d6eac04e2570b1043dd8322be448456c93a
Author: K Prateek Nayak <kprateek.nayak@xxxxxxx>
Date:   Tue Nov 19 05:44:30 2024 +0000

    sched/core: Remove the unnecessary need_resched() check in nohz_csd_func()
    
    [ Upstream commit ea9cffc0a154124821531991d5afdd7e8b20d7aa ]
    
    The need_resched() check currently in nohz_csd_func() can be tracked
    to have been added in scheduler_ipi() back in 2011 via commit
    ca38062e57e9 ("sched: Use resched IPI to kick off the nohz idle balance")
    
    Since then, it has travelled quite a bit but it seems like an idle_cpu()
    check currently is sufficient to detect the need to bail out from an
    idle load balancing. To justify this removal, consider all the following
    case where an idle load balancing could race with a task wakeup:
    
    o Since commit f3dd3f674555b ("sched: Remove the limitation of WF_ON_CPU
      on wakelist if wakee cpu is idle") a target perceived to be idle
      (target_rq->nr_running == 0) will return true for
      ttwu_queue_cond(target) which will offload the task wakeup to the idle
      target via an IPI.
    
      In all such cases target_rq->ttwu_pending will be set to 1 before
      queuing the wake function.
    
      If an idle load balance races here, following scenarios are possible:
    
      - The CPU is not in TIF_POLLING_NRFLAG mode in which case an actual
        IPI is sent to the CPU to wake it out of idle. If the
        nohz_csd_func() queues before sched_ttwu_pending(), the idle load
        balance will bail out since idle_cpu(target) returns 0 since
        target_rq->ttwu_pending is 1. If the nohz_csd_func() is queued after
        sched_ttwu_pending() it should see rq->nr_running to be non-zero and
        bail out of idle load balancing.
    
      - The CPU is in TIF_POLLING_NRFLAG mode and instead of an actual IPI,
        the sender will simply set TIF_NEED_RESCHED for the target to put it
        out of idle and flush_smp_call_function_queue() in do_idle() will
        execute the call function. Depending on the ordering of the queuing
        of nohz_csd_func() and sched_ttwu_pending(), the idle_cpu() check in
        nohz_csd_func() should either see target_rq->ttwu_pending = 1 or
        target_rq->nr_running to be non-zero if there is a genuine task
        wakeup racing with the idle load balance kick.
    
    o The waker CPU perceives the target CPU to be busy
      (targer_rq->nr_running != 0) but the CPU is in fact going idle and due
      to a series of unfortunate events, the system reaches a case where the
      waker CPU decides to perform the wakeup by itself in ttwu_queue() on
      the target CPU but target is concurrently selected for idle load
      balance (XXX: Can this happen? I'm not sure, but we'll consider the
      mother of all coincidences to estimate the worst case scenario).
    
      ttwu_do_activate() calls enqueue_task() which would increment
      "rq->nr_running" post which it calls wakeup_preempt() which is
      responsible for setting TIF_NEED_RESCHED (via a resched IPI or by
      setting TIF_NEED_RESCHED on a TIF_POLLING_NRFLAG idle CPU) The key
      thing to note in this case is that rq->nr_running is already non-zero
      in case of a wakeup before TIF_NEED_RESCHED is set which would
      lead to idle_cpu() check returning false.
    
    In all cases, it seems that need_resched() check is unnecessary when
    checking for idle_cpu() first since an impending wakeup racing with idle
    load balancer will either set the "rq->ttwu_pending" or indicate a newly
    woken task via "rq->nr_running".
    
    Chasing the reason why this check might have existed in the first place,
    I came across  Peter's suggestion on the fist iteration of Suresh's
    patch from 2011 [1] where the condition to raise the SCHED_SOFTIRQ was:
    
            sched_ttwu_do_pending(list);
    
            if (unlikely((rq->idle == current) &&
                rq->nohz_balance_kick &&
                !need_resched()))
                    raise_softirq_irqoff(SCHED_SOFTIRQ);
    
    Since the condition to raise the SCHED_SOFIRQ was preceded by
    sched_ttwu_do_pending() (which is equivalent of sched_ttwu_pending()) in
    the current upstream kernel, the need_resched() check was necessary to
    catch a newly queued task. Peter suggested modifying it to:
    
            if (idle_cpu() && rq->nohz_balance_kick && !need_resched())
                    raise_softirq_irqoff(SCHED_SOFTIRQ);
    
    where idle_cpu() seems to have replaced "rq->idle == current" check.
    
    Even back then, the idle_cpu() check would have been sufficient to catch
    a new task being enqueued. Since commit b2a02fc43a1f ("smp: Optimize
    send_call_function_single_ipi()") overloads the interpretation of
    TIF_NEED_RESCHED for TIF_POLLING_NRFLAG idling, remove the
    need_resched() check in nohz_csd_func() to raise SCHED_SOFTIRQ based
    on Peter's suggestion.
    
    Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
    Suggested-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
    Signed-off-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
    Link: https://lore.kernel.org/r/20241119054432.6405-3-kprateek.nayak@xxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 29d8fc3a7bbd2..8e30041cecf94 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -750,7 +750,7 @@ static void nohz_csd_func(void *info)
 	WARN_ON(!(flags & NOHZ_KICK_MASK));
 
 	rq->idle_balance = idle_cpu(cpu);
-	if (rq->idle_balance && !need_resched()) {
+	if (rq->idle_balance) {
 		rq->nohz_idle_balance = flags;
 		raise_softirq_irqoff(SCHED_SOFTIRQ);
 	}




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux