Re-sending from a proper text-only email client. Sorry for the extra email. On Mon, 4 May 2015, Chris Metcalf wrote: > On 5/2/2015 4:10 AM, john.hubbard@xxxxxxxxx wrote: > > From: John Hubbard <jhubbard@xxxxxxxxxx> > > Commit 8fcf2cc768acd845c1fed837bf9cfe2d7106336d in linux-next > introduced a regression in some configurations. Specifically, > with CONFIG_NO_HZ_FULL set, and CONFIG_NO_HZ_FULL_ALL *not* set, > the kernel will crash in lockup_detector_init(), due to a > NULL tick_nohz_full_init pointer. > > This is because the above commit uses tick_nohz_full_init > (in lockup_detector_init), if CONFIG_NO_HZ_FULL is set, but > tick_nohz_full_init only gets allocated if either: > > a) CONFIG_NO_HZ_FULL_ALL is set, or > > b) Someone passes in nohz_full=<any_value> on the boot > args line. > > To correct this, change the allocation site (case a, above): > allocate tick_nohz_full_init whenever CONFIG_NO_HZ_FULL is set. > Also, change the enclosing function name to more accurately > reflect its current role. > > Signed-off-by: John Hubbard <jhubbard@xxxxxxxxxx> > > > (Note that you say "tick_nohz_full_init" in your text a couple of times > where you mean "tick_nohz_full_mask".) Yes, you're correct. I'll fix that. > > This looks plausible to me, but I know that Frederic was thinking of > doing something deeper by making tick_nohz_full_mask a mask > that was available in all configurations, like cpu_possible_mask etc: > > https://lkml.org/lkml/2015/4/14/347 > > Another way to fix this would be to fix lockup_detector_init() to > test dynamically like this: > > if (tick_nohz_full_enabled()) { > if (!cpumask_empty(tick_nohz_full_mask)) > pr_info("Disabling watchdog on nohz_full cores by default\n"); > cpumask_andnot(&watchdog_cpumask, cpu_possible_mask, > tick_nohz_full_mask); > } else { > cpumask_copy(&watchdog_cpumask, cpu_possible_mask); > } > > This would avoid looking at the NULL cpumask pointer and is more > consistent with how other nohz_full code works, for better or worse. > OK, right, I see now that the NO_HZ code usually does this check: #ifdef CONFIG_NO_HZ_FULL if (tick_nohz_full_enabled()) ...and the new code in lockup_detector_init() simply lacked the second line, so your fix is probably the better way to go. I just retested and all is well with that approach. I'll send an updated fix separately (so that whitespace is preserved) in a moment. > I didn't see this since by default I don't have CPUMASK_OFFLINE. > -- > Chris Metcalf, EZChip Semiconductor > http://www.ezchip.com > > thanks, John H.