On Tue, 20 Jun 2017 14:33:09 -0700 kan.liang@xxxxxxxxx wrote:

> From: Kan Liang <Kan.liang@xxxxxxxxx>
>
> Some users reported spurious NMI watchdog timeouts.
>
> We now have more and more systems where the Turbo range is wide enough
> that the NMI watchdog expires faster than the soft watchdog timer that
> updates the interrupt tick the NMI watchdog relies on.
>
> This problem was originally added by commit 58687acba592
> ("lockup_detector: Combine nmi_watchdog and softlockup detector").
> Previously the NMI watchdog would always check jiffies, which were
> ticking fast enough. But now the backing is quite slow so the expire
> time becomes more sensitive.
>
> For mainline the right fix is to switch the NMI watchdog to reference
> cycles, which tick always at the same rate independent of turbo mode.
> But this is requires some complicated changes in perf, which are too
> difficult to backport. Since we need a stable fix too just increase the
> NMI watchdog rate here to avoid the spurious timeouts. This is not an
> ideal fix because a 3x as large Turbo range could still fail, but for
> now that's not likely.
>
> ...
>
> The right fix for mainline can be found here.
> perf/x86/intel: enable CPU ref_cycles for GP counter
> perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
> https://patchwork.kernel.org/patch/9779087/
> https://patchwork.kernel.org/patch/9779089/

Presumably the "right fix" will later be altered to revert this one-line
workaround?
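
For anyone following along, the timing argument in the quoted changelog
can be put into rough numbers. This is only a sketch, not kernel code.
It assumes the NMI period is programmed as a cycle count sized for the
base frequency, so a core running at turbo_ratio times base burns
through it that much faster in wall-clock time, while the soft watchdog
timer runs on a fixed wall-clock period. The function names and the
values 10 s and 4 s are hypothetical, chosen for illustration.

```python
def nmi_interval_seconds(watchdog_thresh_s: float, turbo_ratio: float) -> float:
    """Wall-clock time between NMIs when the period is counted in CPU cycles.

    Assumption: the cycle budget equals base_frequency * watchdog_thresh_s,
    and the core actually runs at turbo_ratio * base_frequency.
    """
    if turbo_ratio <= 0:
        raise ValueError("turbo_ratio must be positive")
    return watchdog_thresh_s / turbo_ratio


def can_fire_spuriously(watchdog_thresh_s: float,
                        timer_period_s: float,
                        turbo_ratio: float) -> bool:
    """True if two NMIs can arrive with no soft-timer tick in between.

    The NMI handler treats "no tick since the last NMI" as a lockup, so an
    NMI interval shorter than the timer period allows a false positive.
    """
    return nmi_interval_seconds(watchdog_thresh_s, turbo_ratio) < timer_period_s


if __name__ == "__main__":
    # Hypothetical values: 10 s threshold, 4 s soft-timer period.
    for ratio in (1.0, 2.0, 3.0):
        interval = nmi_interval_seconds(10.0, ratio)
        spurious = can_fire_spuriously(10.0, 4.0, ratio)
        print(f"turbo x{ratio}: NMI every {interval:.2f}s, spurious possible: {spurious}")
```

Under these assumptions, shortening the soft-timer period raises the
turbo ratio needed to trigger a false positive but does not remove the
dependence on it. That matches the changelog's own caveat that a wide
enough Turbo range could still fail, and is why counting reference
cycles is described as the right fix.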