>----- Original Message -----
>From: "Ingo Molnar" <mingo@xxxxxxxxxx>
>To: "Don Zickus" <dzickus@xxxxxxxxxx>
>Cc: akpm@xxxxxxxxxxxxxxxxxxxx, kvm@xxxxxxxxxxxxxxx, pbonzini@xxxxxxxxxx, mingo@xxxxxxxxxx, "LKML" <linux-kernel@xxxxxxxxxxxxxxx>, "Ulrich Obergfell" <uobergfe@xxxxxxxxxx>, "Andrew Jones" <drjones@xxxxxxxxxx>
>Sent: Monday, August 18, 2014 11:16:44 AM
>Subject: Re: [PATCH 4/5] watchdog: control hard lockup detection default
>
>
> * Don Zickus <dzickus@xxxxxxxxxx> wrote:
>
>> The running kernel still has the ability to enable/disable at any
>> time with /proc/sys/kernel/nmi_watchdog as usual. However even
>> when the default has been overridden /proc/sys/kernel/nmi_watchdog
>> will initially show '1'. To truly turn it on one must disable/enable
>> it, i.e.
>>   echo 0 > /proc/sys/kernel/nmi_watchdog
>>   echo 1 > /proc/sys/kernel/nmi_watchdog
>
> This looks like a bug, why is this so?
>
> Thanks,
>
> Ingo

This is because the hard lockup detector and the soft lockup detector are
enabled and disabled at the same time - there isn't a separate 'knob' for
each of them. Both are controlled via the 'watchdog_user_enabled' variable,
which is 1 by default.

  lockup_detector_init
    if (watchdog_user_enabled)
      watchdog_enable_all_cpus
        smpboot_register_percpu_thread(&watchdog_threads)

At boot time, the above code path launches a 'watchdog/N' thread for each
online CPU. The watchdog_enable() function is executed in the context of
these threads, and it attempts to enable both the hard lockup detector and
the soft lockup detector. [Note: Soft lockup detection is implemented in
watchdog_timer_fn().]

  watchdog_enable
    hrtimer_init(hrtimer, ...)
    hrtimer->function = watchdog_timer_fn
    watchdog_nmi_enable
      perf_event_create_kernel_counter(..., watchdog_overflow_callback)
    hrtimer_start(hrtimer, ...)

On bare metal systems or in virtual environments where the hypervisor does
not emulate a PMU, watchdog_nmi_enable() can fail to allocate and enable a
PMU counter.
This is reported by a console message:

  NMI watchdog: disabled (cpu0): hardware events not enabled

Hence, we can end up with a situation where the soft lockup detector is
enabled and the hard lockup detector is not enabled. However, the output
of 'cat /proc/sys/kernel/nmi_watchdog' is 1 because it merely shows the
state of the 'watchdog_user_enabled' variable.

The above is the behaviour even without the proposed patch. The patch merely
adds the following hunk in watchdog_nmi_enable() to 'fake' a -ENOENT error
return from perf_event_create_kernel_counter().

+       if (!watchdog_hardlockup_detector_is_enabled()) {
+               event = ERR_PTR(-ENOENT);
+               goto handle_err;
+       }

The patch does not break the output of 'cat /proc/sys/kernel/nmi_watchdog'
since the discrepancy between the output and the actual state of the hard
lockup detector is nothing new.

Regards,

Uli
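
P.S. To make the discrepancy concrete, here is a rough user-space sketch of
the state involved. It is not kernel code: the struct and the model_*()
functions are hypothetical names invented for illustration, and the logic is
a simplification that assumes the failure of the NMI part is simply not
propagated, so the soft lockup detector still starts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the state discussed above; not kernel code. */
struct watchdog_model {
	int  user_enabled;       /* stands in for 'watchdog_user_enabled' */
	bool hardlockup_enabled; /* stands in for watchdog_hardlockup_detector_is_enabled() */
	bool pmu_available;      /* false when no PMU counter can be allocated */
	bool soft_active;        /* soft lockup detector actually running */
	bool hard_active;        /* hard lockup detector actually running */
};

/* Models watchdog_nmi_enable(): fails with -ENOENT if the default is
 * overridden (the 'faked' error) or if no PMU counter is available. */
static int model_nmi_enable(struct watchdog_model *w)
{
	if (!w->hardlockup_enabled || !w->pmu_available) {
		w->hard_active = false;
		return -ENOENT;
	}
	w->hard_active = true;
	return 0;
}

/* Models watchdog_enable(): the NMI failure is ignored here (an assumption
 * of this sketch), so the soft lockup detector is started regardless. */
static void model_watchdog_enable(struct watchdog_model *w)
{
	assert(w != NULL);
	w->soft_active = false;
	w->hard_active = false;
	if (!w->user_enabled)
		return;
	(void)model_nmi_enable(w);
	w->soft_active = true;
}

/* Models 'cat /proc/sys/kernel/nmi_watchdog': it only reports the knob. */
static int model_proc_read(const struct watchdog_model *w)
{
	return w->user_enabled;
}
```

With user_enabled = 1 and either hardlockup_enabled or pmu_available set to
false, model_proc_read() still returns 1 while hard_active stays false, which
is the mismatch described in this mail.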