Patch "workqueue: Improve scalability of workqueue watchdog touch" has been added to the 5.15-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    workqueue: Improve scalability of workqueue watchdog touch

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     workqueue-improve-scalability-of-workqueue-watchdog-.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 43adf426f34b9518c40084a3bdafa0adca09de97
Author: Nicholas Piggin <npiggin@xxxxxxxxx>
Date:   Tue Jun 25 21:42:45 2024 +1000

    workqueue: Improve scalability of workqueue watchdog touch
    
    [ Upstream commit 98f887f820c993e05a12e8aa816c80b8661d4c87 ]
    
    On a ~2000 CPU powerpc system, hard lockups have been observed in the
    workqueue code when stop_machine runs (in this case due to CPU hotplug).
    This is due to lots of CPUs spinning in multi_cpu_stop, calling
    touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
    wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
    and that can find itself in the same cacheline as other important
    workqueue data, which slows down operations to the point of lockups.
    
    In the case of the following abridged trace, worker_pool_idr was in
    the hot line, causing the lockups to always appear at idr_find.
    
      watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
      Call Trace:
      get_work_pool
      __queue_work
      call_timer_fn
      run_timer_softirq
      __do_softirq
      do_softirq_own_stack
      irq_exit
      timer_interrupt
      decrementer_common_virt
      * interrupt: 900 (timer) at multi_cpu_stop
      multi_cpu_stop
      cpu_stopper_thread
      smpboot_thread_fn
      kthread
    
    Fix this by having wq_watchdog_touch() only write to the line if the
    last time a touch was recorded exceeds 1/4 of the watchdog threshold.
    
    Reported-by: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
    Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx>
    Reviewed-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
    Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f7975a8f665f..73002260674f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5935,12 +5935,18 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 
 notrace void wq_watchdog_touch(int cpu)
 {
+	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
+	unsigned long touch_ts = READ_ONCE(wq_watchdog_touched);
+	unsigned long now = jiffies;
+
 	if (cpu >= 0)
-		per_cpu(wq_watchdog_touched_cpu, cpu) = jiffies;
+		per_cpu(wq_watchdog_touched_cpu, cpu) = now;
 	else
 		WARN_ONCE(1, "%s should be called with valid CPU", __func__);
 
-	wq_watchdog_touched = jiffies;
+	/* Don't unnecessarily store to global cacheline */
+	if (time_after(now, touch_ts + thresh / 4))
+		WRITE_ONCE(wq_watchdog_touched, jiffies);
 }
 
 static void wq_watchdog_set_thresh(unsigned long thresh)




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux