Patch "wq: handle VM suspension in stall detection" has been added to the 4.9-stable tree

This is a note to let you know that I've just added the patch titled

    wq: handle VM suspension in stall detection

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     wq-handle-vm-suspension-in-stall-detection.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 7b09bfa6de231691584cedefc29d531497eafc0d
Author: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx>
Date:   Thu May 20 19:14:22 2021 +0900

    wq: handle VM suspension in stall detection
    
    [ Upstream commit 940d71c6462e8151c78f28e4919aa8882ff2054e ]
    
    If VCPU is suspended (VM suspend) in wq_watchdog_timer_fn() then
    once this VCPU resumes it will see the new jiffies value, while it
    may take a while before IRQ detects PVCLOCK_GUEST_STOPPED on this
    VCPU and updates all the watchdogs via pvclock_touch_watchdogs().
    There is a small chance of misreported WQ stalls in the meantime,
    because new jiffies is time_after() old 'ts + thresh'.
    
    wq_watchdog_timer_fn()
    {
            for_each_pool(pool, pi) {
                    if (time_after(jiffies, ts + thresh)) {
                            pr_emerg("BUG: workqueue lockup - pool");
                    }
            }
    }
    
    Save jiffies at the beginning of this function and use that value
    for stall detection. If VM gets suspended then we continue using
    "old" jiffies value and old WQ touch timestamps. If IRQ at some
    point restarts the stall detection cycle (pvclock_touch_watchdogs())
    then old jiffies will always be before new 'ts + thresh'.
    
    Signed-off-by: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx>
    Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
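
For illustration only, the reasoning in the commit message can be sketched as a small userspace C model. This is not kernel code: `fake_jiffies`, `simulate_vm_suspend()`, `stalled_live()` and `stalled_snapshot()` are hypothetical names invented for this sketch, and the `time_after()` macro here is a simplified wraparound-safe comparison in the spirit of the kernel helper, without its type checking.

```c
#include <stdbool.h>

/* Simplified wraparound-safe "a is after b" comparison (assumption:
 * modelled on the kernel's time_after(), minus its type checks). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical stand-in for the kernel's jiffies counter. */
static volatile unsigned long fake_jiffies;

/* Hypothetical: model a VM suspension as a sudden jump of the counter. */
static void simulate_vm_suspend(unsigned long ticks)
{
	fake_jiffies += ticks;
}

/* Old behaviour: the counter is re-read at the moment of the check,
 * so a suspension between entry and check inflates the elapsed time. */
static bool stalled_live(unsigned long ts, unsigned long thresh)
{
	return time_after(fake_jiffies, ts + thresh);
}

/* Patched behaviour: the check uses the value saved on function entry,
 * so a later jump of the counter cannot affect the comparison. */
static bool stalled_snapshot(unsigned long now, unsigned long ts,
			     unsigned long thresh)
{
	return time_after(now, ts + thresh);
}
```

With a saved value of 1000, a touch timestamp of 990 and a threshold of 30, a simulated suspension of 5000 ticks makes `stalled_live()` report a stall, while `stalled_snapshot()` does not. If the timestamp is later refreshed to the post-resume counter value, the saved value is still before the new `ts + thresh`, which matches the last sentence of the commit message.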

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 3231088afd73..a410d5827a73 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -49,6 +49,7 @@
 #include <linux/moduleparam.h>
 #include <linux/uaccess.h>
 #include <linux/nmi.h>
+#include <linux/kvm_para.h>
 
 #include "workqueue_internal.h"
 
@@ -5387,6 +5388,7 @@ static void wq_watchdog_timer_fn(unsigned long data)
 {
 	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
 	bool lockup_detected = false;
+	unsigned long now = jiffies;
 	struct worker_pool *pool;
 	int pi;
 
@@ -5401,6 +5403,12 @@ static void wq_watchdog_timer_fn(unsigned long data)
 		if (list_empty(&pool->worklist))
 			continue;
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like a stall.
+		 */
+		kvm_check_and_clear_guest_paused();
+
 		/* get the latest of pool and touched timestamps */
 		pool_ts = READ_ONCE(pool->watchdog_ts);
 		touched = READ_ONCE(wq_watchdog_touched);
@@ -5419,12 +5427,12 @@ static void wq_watchdog_timer_fn(unsigned long data)
 		}
 
 		/* did we stall? */
-		if (time_after(jiffies, ts + thresh)) {
+		if (time_after(now, ts + thresh)) {
 			lockup_detected = true;
 			pr_emerg("BUG: workqueue lockup - pool");
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n",
-				jiffies_to_msecs(jiffies - pool_ts) / 1000);
+				jiffies_to_msecs(now - pool_ts) / 1000);
 		}
 	}
 
