+ kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout.patch added to -mm tree

The patch titled
     Subject: kernel/hung_task.c: allow to set checking interval separately from timeout
has been added to the -mm tree.  Its filename is
     kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included in linux-next and is updated
there every 3-4 working days.

------------------------------------------------------
From: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Subject: kernel/hung_task.c: allow to set checking interval separately from timeout

Currently the hung task checking interval is equal to the timeout; as a
result, a hang is detected anywhere between timeout and 2*timeout.  This is
fine for most interactive environments, but it hurts automated testing
setups (syzbot).  In an automated setup we need to strictly order CPU
lockup < RCU stall < workqueue lockup < task hung < silent loss, so that an
RCU stall is not detected as a hung task and a hung task is not detected as
silent machine loss.  The large variance in hung task detection time
requires setting the silent machine loss timeout to a very large value
(e.g. if the hung task timeout is 3 mins, then silent loss needs to be set
to ~7 mins).  The additional 3 minutes significantly reduce testing
efficiency because usually we crash the kernel within a minute, and this
can add hours to the bug localization process, as it needs to run dozens of
tests.

Allow setting the checking interval separately from the timeout.  This
allows setting the timeout to, say, 3 minutes, but the checking interval to
10 secs.
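
To illustrate why a short interval tightens the detection window, here is
a minimal userspace sketch (not kernel code) of the latency bounds.  It
assumes that checks fire at exact multiples of the interval and that a
task's reference time is the check that last saw it switch, as the
last_switch_time logic in this patch does.  The names hung_latency and
hung_latency_bounds are hypothetical.

```c
#include <assert.h>

/* Userspace illustration only; these names do not exist in the kernel. */
struct hung_latency {
	unsigned long min_secs;	/* inclusive lower bound */
	unsigned long max_secs;	/* exclusive upper bound */
};

/*
 * Bounds on the time between a task's last context switch and the
 * check that reports it as hung, assuming idealized periodic checks.
 */
static struct hung_latency hung_latency_bounds(unsigned long timeout_secs,
					       unsigned long interval_secs)
{
	struct hung_latency b;
	unsigned long checks;

	assert(timeout_secs > 0);
	assert(interval_secs > 0 && interval_secs <= timeout_secs);

	/* Number of check periods needed to cover the timeout (rounded up). */
	checks = (timeout_secs + interval_secs - 1) / interval_secs;

	/* Best case: the task stopped right at the check that saw it switch. */
	b.min_secs = checks * interval_secs;
	/* Worst case: it stopped just after the previous check. */
	b.max_secs = b.min_secs + interval_secs;
	return b;
}
```

Under these assumptions, a timeout of 180 with an interval of 10 gives a
window of [180, 190) seconds, while an interval equal to the timeout gives
[180, 360), which matches the timeout..2*timeout range described above.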

The interval is controlled via a new hung_task_check_interval_secs sysctl,
similar to the existing hung_task_timeout_secs sysctl.  The default value
of 0 results in the current behavior: checking interval is equal to
timeout.
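
The interval selection can be sketched in userspace C as follows.  It
mirrors the watchdog() hunk in this patch, which also clamps the interval
so that it never exceeds the timeout.  The function name
effective_check_interval is hypothetical.

```c
/* Userspace illustration only; this function does not exist in the kernel. */
static unsigned long effective_check_interval(unsigned long interval_secs,
					      unsigned long timeout_secs)
{
	/* Zero (the default) means: fall back to the timeout. */
	if (interval_secs == 0)
		interval_secs = timeout_secs;

	/* Never check less often than the timeout itself. */
	return interval_secs < timeout_secs ? interval_secs : timeout_secs;
}
```

With a timeout of 180, an interval of 0 yields 180 (the current behavior),
10 yields 10, and any value above 180 is clamped to 180.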

Link: http://lkml.kernel.org/r/20180611111004.203513-1-dvyukov@xxxxxxxxxx
Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 Documentation/sysctl/kernel.txt |   15 ++++++++++++++-
 include/linux/sched.h           |    1 +
 include/linux/sched/sysctl.h    |    1 +
 kernel/fork.c                   |    1 +
 kernel/hung_task.c              |   15 ++++++++++++++-
 kernel/sysctl.c                 |    8 ++++++++
 6 files changed, 39 insertions(+), 2 deletions(-)

diff -puN Documentation/sysctl/kernel.txt~kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout Documentation/sysctl/kernel.txt
--- a/Documentation/sysctl/kernel.txt~kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout
+++ a/Documentation/sysctl/kernel.txt
@@ -38,6 +38,7 @@ show up in /proc/sys/kernel:
 - hung_task_panic
 - hung_task_check_count
 - hung_task_timeout_secs
+- hung_task_check_interval_secs
 - hung_task_warnings
 - kexec_load_disabled
 - kptr_restrict
@@ -354,7 +355,7 @@ This file shows up if CONFIG_DETECT_HUNG
 
 hung_task_timeout_secs:
 
-Check interval. When a task in D state did not get scheduled
+When a task in D state did not get scheduled
 for more than this value report a warning.
 This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
 
@@ -362,6 +363,18 @@ This file shows up if CONFIG_DETECT_HUNG
 Possible values to set are in range {0..LONG_MAX/HZ}.
 
 ==============================================================
+
+hung_task_check_interval_secs:
+
+Hung task check interval. If hung task checking is enabled
+(see hung_task_timeout_secs), the check is done every
+hung_task_check_interval_secs seconds.
+This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+
+0 (default): means use hung_task_timeout_secs as checking interval.
+Possible values to set are in range {0..LONG_MAX/HZ}.
+
+==============================================================
 
 hung_task_warnings:
 
diff -puN include/linux/sched.h~kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout include/linux/sched.h
--- a/include/linux/sched.h~kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout
+++ a/include/linux/sched.h
@@ -849,6 +849,7 @@ struct task_struct {
 #endif
 #ifdef CONFIG_DETECT_HUNG_TASK
 	unsigned long			last_switch_count;
+	unsigned long			last_switch_time;
 #endif
 	/* Filesystem information: */
 	struct fs_struct		*fs;
diff -puN include/linux/sched/sysctl.h~kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout include/linux/sched/sysctl.h
--- a/include/linux/sched/sysctl.h~kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout
+++ a/include/linux/sched/sysctl.h
@@ -10,6 +10,7 @@ struct ctl_table;
 extern int	     sysctl_hung_task_check_count;
 extern unsigned int  sysctl_hung_task_panic;
 extern unsigned long sysctl_hung_task_timeout_secs;
+extern unsigned long sysctl_hung_task_check_interval_secs;
 extern int sysctl_hung_task_warnings;
 extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
 					 void __user *buffer,
diff -puN kernel/fork.c~kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout kernel/fork.c
--- a/kernel/fork.c~kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout
+++ a/kernel/fork.c
@@ -1270,6 +1270,7 @@ static int copy_mm(unsigned long clone_f
 	tsk->nvcsw = tsk->nivcsw = 0;
 #ifdef CONFIG_DETECT_HUNG_TASK
 	tsk->last_switch_count = tsk->nvcsw + tsk->nivcsw;
+	tsk->last_switch_time = 0;
 #endif
 
 	tsk->mm = NULL;
diff -puN kernel/hung_task.c~kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout kernel/hung_task.c
--- a/kernel/hung_task.c~kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout
+++ a/kernel/hung_task.c
@@ -40,6 +40,11 @@ int __read_mostly sysctl_hung_task_check
  */
 unsigned long __read_mostly sysctl_hung_task_timeout_secs = CONFIG_DEFAULT_HUNG_TASK_TIMEOUT;
 
+/*
+ * Zero (default value) means use sysctl_hung_task_timeout_secs:
+ */
+unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
+
 int __read_mostly sysctl_hung_task_warnings = 10;
 
 static int __read_mostly did_panic;
@@ -98,8 +103,11 @@ static void check_hung_task(struct task_
 
 	if (switch_count != t->last_switch_count) {
 		t->last_switch_count = switch_count;
+		t->last_switch_time = jiffies;
 		return;
 	}
+	if (time_is_after_jiffies(t->last_switch_time + timeout * HZ))
+		return;
 
 	trace_sched_process_hang(t);
 
@@ -245,8 +253,13 @@ static int watchdog(void *dummy)
 
 	for ( ; ; ) {
 		unsigned long timeout = sysctl_hung_task_timeout_secs;
-		long t = hung_timeout_jiffies(hung_last_checked, timeout);
+		unsigned long interval = sysctl_hung_task_check_interval_secs;
+		long t;
 
+		if (interval == 0)
+			interval = timeout;
+		interval = min_t(unsigned long, interval, timeout);
+		t = hung_timeout_jiffies(hung_last_checked, interval);
 		if (t <= 0) {
 			if (!atomic_xchg(&reset_hung_task, 0))
 				check_hung_uninterruptible_tasks(timeout);
diff -puN kernel/sysctl.c~kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout kernel/sysctl.c
--- a/kernel/sysctl.c~kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout
+++ a/kernel/sysctl.c
@@ -1099,6 +1099,14 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &hung_task_timeout_max,
 	},
 	{
+		.procname	= "hung_task_check_interval_secs",
+		.data		= &sysctl_hung_task_check_interval_secs,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= proc_dohung_task_timeout_secs,
+		.extra2		= &hung_task_timeout_max,
+	},
+	{
 		.procname	= "hung_task_warnings",
 		.data		= &sysctl_hung_task_warnings,
 		.maxlen		= sizeof(int),
_

Patches currently in -mm which might be from dvyukov@xxxxxxxxxx are

kernel-hung_taskc-allow-to-set-checking-interval-separately-from-timeout.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


