From: Chris Metcalf <cmetcalf@xxxxxxxxxx>

Change the default behavior of watchdog so it only runs on the
housekeeping cores when nohz_full is enabled at build and boot time.
Allow modifying the set of cores the watchdog is currently running
on with a new kernel.watchdog_exclude sysctl.

Signed-off-by: Chris Metcalf <cmetcalf@xxxxxxxxxx>
---
Don, I think this will merge pretty well with the restructuring
changes you passed to Andrew. In particular it benefits from that
code moving the mutex out to file scope already, and I don't think
it conflicts with any of the proposed sysctl renaming or file
refactoring.

I changed your suggested kernel.watchdog_cpumask to
kernel.watchdog_exclude (i.e. the inverse set) since I thought that
was clearer in the context of smp_hotplug_thread, where cores might
potentially go online or offline and the important invariant was
that the nohz_full cpuset be respected.

What do you think of using my proposed new smp_hotplug_thread
exclude_mask to simply prevent unwanted watchdog threads from
existing at all? It's cleaner than the "do_exit(0)" strategy, and
I think also better than leaving the watchdog threads hanging
around. The most common case for nohz_full is likely that "n - 1"
cpus would otherwise have kthreads created and never used, which
would just clutter ps and potentially confuse people trying to
understand possible sources of interference to the nohz_full
userspace tasks.

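As a rough illustration of the intended semantics, here is a minimal
user-space sketch (not kernel code). It assumes the sysctl takes a
comma-separated range list such as "1-3,5", which is the format that
proc_do_large_bitmap is understood to read and print, and that the
watchdog ends up on every online cpu not in the exclude mask. The
names parse_cpu_list() and watchdog_cpus() are hypothetical, and the
64-cpu limit is a simplification for the example.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical user-space model of kernel.watchdog_exclude.
 * Parse a range list such as "1-3,5" into a bitmask of cpus.
 * Returns 0 on success, -1 on malformed input or a cpu number >= 64.
 * An empty string yields an empty mask.
 */
static int parse_cpu_list(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	while (*s) {
		char *end;
		unsigned long lo, hi, cpu;

		if (*s < '0' || *s > '9')
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;

		/* Optional "-hi" part of a range. */
		if (*end == '-') {
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
		}
		if (hi < lo || hi >= 64)
			return -1;

		for (cpu = lo; cpu <= hi; cpu++)
			m |= (uint64_t)1 << cpu;

		/* A comma must be followed by another entry. */
		if (*end == ',') {
			end++;
			if (*end == '\0')
				return -1;
		} else if (*end != '\0') {
			return -1;
		}
		s = end;
	}

	*mask = m;
	return 0;
}

/* Cpus that would run a watchdog thread: online and not excluded. */
static uint64_t watchdog_cpus(uint64_t online, uint64_t exclude)
{
	return online & ~exclude;
}
```

With eight cpus online (mask 0xff) and an exclude list of "1-3,5",
watchdog_cpus() leaves cpus 0, 4, 6 and 7, which matches a nohz_full
setup where those four are the housekeeping cores.
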
 Documentation/lockup-watchdogs.txt |  6 ++++++
 Documentation/sysctl/kernel.txt    |  9 +++++++++
 include/linux/nmi.h                |  3 +++
 kernel/sysctl.c                    |  7 +++++++
 kernel/watchdog.c                  | 36 +++++++++++++++++++++++++++++++++++-
 5 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/Documentation/lockup-watchdogs.txt b/Documentation/lockup-watchdogs.txt
index ab0baa692c13..4f86aec1d69d 100644
--- a/Documentation/lockup-watchdogs.txt
+++ b/Documentation/lockup-watchdogs.txt
@@ -61,3 +61,9 @@ As explained above, a kernel knob is provided that allows
 administrators to configure the period of the hrtimer and the perf
 event. The right value for a particular environment is a trade-off
 between fast response to lockups and detection overhead.
+
+By default, the watchdog runs on all online cores. However, on a
+kernel configured with NO_HZ_FULL, by default the watchdog runs only
+on the housekeeping cores, not the cores specified in the "nohz_full"
+boot argument. In either case, the set of cores excluded from running
+the watchdog may be adjusted via the kernel.watchdog_exclude sysctl.
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 83ab25660fc9..aad9f9ba347c 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -858,6 +858,15 @@ example. If a system hangs up, try pressing the NMI switch.
 
 ==============================================================
 
+watchdog_exclude:
+
+This value can be used to control on which cpus the watchdog is
+prohibited from running. The default exclude mask is empty, but if
+NO_HZ_FULL is enabled in the kernel config, and cores are specified
+with the nohz_full= boot argument, those cores are excluded by default.
+
+==============================================================
+
 watchdog_thresh:
 
 This value can be used to control the frequency of hrtimer and NMI
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 9b2022ab4d85..1703829c5812 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -70,10 +70,13 @@ int hw_nmi_is_cpu_stuck(struct pt_regs *);
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
 extern int watchdog_user_enabled;
 extern int watchdog_thresh;
+extern unsigned long *watchdog_exclude_mask_bits;
 extern int sysctl_softlockup_all_cpu_backtrace;
 struct ctl_table;
 extern int proc_dowatchdog(struct ctl_table *, int ,
 			   void __user *, size_t *, loff_t *);
+extern int proc_dowatchdog_exclude(struct ctl_table *, int,
+				   void __user *, size_t *, loff_t *);
 #endif
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 88ea2d6e0031..f2c544181f4f 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -860,6 +860,13 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &sixty,
 	},
 	{
+		.procname	= "watchdog_exclude",
+		.data		= &watchdog_exclude_mask_bits,
+		.maxlen		= NR_CPUS,
+		.mode		= 0644,
+		.proc_handler	= proc_dowatchdog_exclude,
+	},
+	{
 		.procname	= "softlockup_panic",
 		.data		= &softlockup_panic,
 		.maxlen		= sizeof(int),
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 3174bf8e3538..66bfc80854d1 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -19,6 +19,7 @@
 #include <linux/sysctl.h>
 #include <linux/smpboot.h>
 #include <linux/sched/rt.h>
+#include <linux/tick.h>
 
 #include <asm/irq_regs.h>
 #include <linux/kvm_para.h>
@@ -31,6 +32,8 @@ int __read_mostly sysctl_softlockup_all_cpu_backtrace;
 #else
 #define sysctl_softlockup_all_cpu_backtrace 0
 #endif
+static cpumask_var_t watchdog_exclude_mask;
+unsigned long *watchdog_exclude_mask_bits;
 
 static int __read_mostly watchdog_running;
 static u64 __read_mostly sample_period;
@@ -581,6 +584,7 @@ static struct smp_hotplug_thread watchdog_threads = {
 	.cleanup		= watchdog_cleanup,
 	.park			= watchdog_disable,
 	.unpark			= watchdog_enable,
+	.exclude_mask		= watchdog_exclude_mask,
 };
 
 static void restart_watchdog_hrtimer(void *info)
@@ -653,6 +657,8 @@ static void watchdog_disable_all_cpus(void)
 	}
 }
 
+static DEFINE_MUTEX(watchdog_proc_mutex);
+
 /*
  * proc handler for /proc/sys/kernel/nmi_watchdog,watchdog_thresh
  */
@@ -662,7 +668,6 @@ int proc_dowatchdog(struct ctl_table *table, int write,
 {
 	int err, old_thresh, old_enabled;
 	bool old_hardlockup;
-	static DEFINE_MUTEX(watchdog_proc_mutex);
 
 	mutex_lock(&watchdog_proc_mutex);
 	old_thresh = ACCESS_ONCE(watchdog_thresh);
@@ -700,12 +705,41 @@ out:
 	mutex_unlock(&watchdog_proc_mutex);
 	return err;
 }
+
+int proc_dowatchdog_exclude(struct ctl_table *table, int write,
+			    void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int err;
+
+	mutex_lock(&watchdog_proc_mutex);
+	err = proc_do_large_bitmap(table, write, buffer, lenp, ppos);
+	if (!err && write && watchdog_user_enabled) {
+		watchdog_disable_all_cpus();
+		watchdog_enable_all_cpus(false);
+	}
+	mutex_unlock(&watchdog_proc_mutex);
+	return err;
+}
+
 #endif /* CONFIG_SYSCTL */
 
 void __init lockup_detector_init(void)
 {
 	set_sample_period();
 
+	alloc_bootmem_cpumask_var(&watchdog_exclude_mask);
+
+#ifdef CONFIG_NO_HZ_FULL
+	if (!cpumask_empty(tick_nohz_full_mask))
+		pr_info("Disabling watchdog on nohz_full cores by default\n");
+	cpumask_copy(watchdog_exclude_mask, tick_nohz_full_mask);
+#else
+	cpumask_clear(watchdog_exclude_mask);
+#endif
+
+	/* The sysctl API requires a variable holding a pointer to the mask. */
+	watchdog_exclude_mask_bits = cpumask_bits(watchdog_exclude_mask);
+
 	if (watchdog_user_enabled)
 		watchdog_enable_all_cpus(false);
 }
-- 
2.1.2