Re: [PATCH v9 2/3] watchdog: add watchdog_cpumask sysctl to assist nohz

Chris Metcalf <cmetcalf@xxxxxxxxxx> · Mon, 27 Apr 2015 16:27:16 -0400

I've been out on vacation the last ten days, but picking this up
again now.

I'll wait a bit before putting out a v10, and also address Uli's additional
emails.  Meanwhile, who is the right person to eventually pick up this patchset
and push it up to Linus?  Frederic, Don, Thomas, akpm?  v9 is here:

https://lkml.org/lkml/2015/4/17/697

And I haven't heard any feedback on my fix to /proc/self/stat etc. to
avoid showing the PARKED threads in "R" state (patch 3/3 from that series).

Thanks for any guidance.


On 04/22/2015 11:21 AM, Don Zickus wrote:
On Wed, Apr 22, 2015 at 07:02:31AM -0400, Ulrich Obergfell wrote:
Chris,

in https://lkml.org/lkml/2015/4/17/616 you stated:

  ">> +	alloc_cpumask_var(&watchdog_cpumask_for_smpboot, GFP_KERNEL);
   >
   > alloc_cpumask_var could fail?

   Good catch; if I get a failure I'll just return early without trying to
   start the watchdog, since clearly things are too memory-constrained
   to enable that functionality anyway."

Let's assume that (in spite of the memory constraints) the kernel would still
be able to make progress and get to a point where the system will be usable.
In this corner case, the following code would leave a NULL pointer behind in
watchdog_cpumask and in watchdog_cpumask_bits which could subsequently lead
to a crash.

  void __init lockup_detector_init(void)
  {
          set_sample_period();
  
+        if (!alloc_cpumask_var(&watchdog_cpumask, GFP_KERNEL)) {
+                pr_err("Failed to allocate cpumask for watchdog");
+                return;
+        }
+        watchdog_cpumask_bits = cpumask_bits(watchdog_cpumask);

For example, proc_watchdog_cpumask() and the change that your patch introduces
in watchdog_enable_all_cpus() are not protected against a possible NULL pointer.
I think the code needs to be made safer.
Or we could just statically allocate it

static DECLARE_BITMAP(watchdog_cpumask, NR_CPUS) __read_mostly;

Cheers,
Don

I think Don's suggestion is best here.  It's too intrusive to try to check
for the out-of-memory condition everywhere in the code, just to guard
against the possibility that a system that is already out of memory while
starting the watchdog still has users trying to fiddle with the
/proc/sys/kernel/watchdog* knobs.

The diff against v9 is just this (plus changing watchdog_cpumask to
&watchdog_cpumask in a bunch of places):

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 8875717b6616..ec742f38c90d 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -57,8 +57,8 @@ int __read_mostly sysctl_softlockup_all_cpu_backtrace;
 #else
 #define sysctl_softlockup_all_cpu_backtrace 0
 #endif
-static cpumask_var_t watchdog_cpumask;
-unsigned long *watchdog_cpumask_bits;
+static struct cpumask __read_mostly;
+unsigned long *watchdog_cpumask_bits = cpumask_bits(watchdog_cpumask);
 
 /* Helper for online, unparked cpus. */
 #define for_each_watchdog_cpu(cpu) \
@@ -913,12 +913,6 @@ void __init lockup_detector_init(void)
 {
        set_sample_period();
 
-       if (!alloc_cpumask_var(&watchdog_cpumask, GFP_KERNEL)) {
-               pr_err("Failed to allocate cpumask for watchdog");
-               return;
-       }
-       watchdog_cpumask_bits = cpumask_bits(watchdog_cpumask);
-
 #ifdef CONFIG_NO_HZ_FULL
        if (!cpumask_empty(tick_nohz_full_mask))
                pr_info("Disabling watchdog on nohz_full cores by default\n");

That said, presumably we need to schedule a cage match between Frederic and Don
to decide on whether it's best to statically allocate cpumasks or not :-)

https://lkml.org/lkml/2015/4/16/416

My sense is that in this case it's appropriate, since it's much harder to
manage the failure case, whereas in the earlier discussion for
smpboot_update_cpumask_percpu_thread() it made sense to just give up and
return a quick ENOMEM.  Also, in this case we have no locking issues.
--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html