Thanks Paul, sounds good! Much appreciated your guidance. Chao eero inc. 660 3rd St, 4th Floor San Francisco, CA 94107 On Fri, Sep 4, 2020 at 1:37 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > On Fri, Sep 04, 2020 at 12:40:29PM -0700, Chao Zhou wrote: > > Thanks Paul. Appreciated it. > > > > Initial intent was to give users a way to make their system more > > tolerable, but prudent enough to recover if suspicious behavior > > reaches a watermark. If a system experiences multiple stalls in one > > lifetime, no matter how healthy it looks or whether the stalls are > > from different sources, we still want it to dramatically recover. > > Please share your guidance? > > I have no guidance in this case. I was just wanting to verify that the > patch was in fact doing what you want it to. And it sounds like it does, > so good! > > I have queued this for testing and further review. If all goes well, > I would submit it upstream for the v5.11 merge window, that is, not the > upcoming merge window, but the one after that. > > Thanx, Paul > > > eero inc. > > > > 660 3rd St, 4th Floor > > > > San Francisco, CA 94107 > > > > > > > > On Fri, Sep 4, 2020 at 11:05 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > > > On Sun, Aug 30, 2020 at 11:41:17PM -0700, chao wrote: > > > > Some stalls are transient and system can fully recover. > > > > Allow users to configure the number of stalls experienced > > > > to trigger kernel Panic. > > > > > > > > Signed-off-by: chao <chao@xxxxxxxx> > > > > > > Hearing no objections, I have queued this with wordsmithing as shown > > > below. Please let me know if I messed something up. > > > > > > One question, though. It looks like setting this to (say) 5 would panic > > > after the fifth RCU CPU stall warning message, regardless whether all > > > five were reporting the same RCU CPU stall event or whether they instead > > > were five widely separated transient RCU CPU stall events, where the > > > system fully recovered from each event. Is this the intent? > > > > > > Thanx, Paul > > > > > > ------------------------------------------------------------------------ > > > > > > commit e710c928fb52d8e56bc6173515805301da6aa22b > > > Author: chao <chao@xxxxxxxx> > > > Date: Sun Aug 30 23:41:17 2020 -0700 > > > > > > rcu: Panic after fixed number of stalls > > > > > > Some stalls are transient, so that system fully recovers. This commit > > > therefore allows users to configure the number of stalls that must happen > > > in order to trigger kernel panic. > > > > > > Signed-off-by: chao <chao@xxxxxxxx> > > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx> > > > > > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h > > > index 500def6..fc2dd3f 100644 > > > --- a/include/linux/kernel.h > > > +++ b/include/linux/kernel.h > > > @@ -536,6 +536,7 @@ extern int panic_on_warn; > > > extern unsigned long panic_on_taint; > > > extern bool panic_on_taint_nousertaint; > > > extern int sysctl_panic_on_rcu_stall; > > > +extern int sysctl_max_rcu_stall_to_panic; > > > extern int sysctl_panic_on_stackoverflow; > > > > > > extern bool crash_kexec_post_notifiers; > > > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h > > > index 0fde39b..228c55f 100644 > > > --- a/kernel/rcu/tree_stall.h > > > +++ b/kernel/rcu/tree_stall.h > > > @@ -13,6 +13,7 @@ > > > > > > /* panic() on RCU Stall sysctl. */ > > > int sysctl_panic_on_rcu_stall __read_mostly; > > > +int sysctl_max_rcu_stall_to_panic __read_mostly; > > > > > > #ifdef CONFIG_PROVE_RCU > > > #define RCU_STALL_DELAY_DELTA (5 * HZ) > > > @@ -106,6 +107,11 @@ early_initcall(check_cpu_stall_init); > > > /* If so specified via sysctl, panic, yielding cleaner stall-warning output. */ > > > static void panic_on_rcu_stall(void) > > > { > > > + static int cpu_stall; > > > + > > > + if (++cpu_stall < sysctl_max_rcu_stall_to_panic) > > > + return; > > > + > > > if (sysctl_panic_on_rcu_stall) > > > panic("RCU Stall\n"); > > > } > > > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > > > index 287862f..1bca490 100644 > > > --- a/kernel/sysctl.c > > > +++ b/kernel/sysctl.c > > > @@ -2651,6 +2651,17 @@ static struct ctl_table kern_table[] = { > > > .extra2 = SYSCTL_ONE, > > > }, > > > #endif > > > +#if defined(CONFIG_TREE_RCU) > > > + { > > > + .procname = "max_rcu_stall_to_panic", > > > + .data = &sysctl_max_rcu_stall_to_panic, > > > + .maxlen = sizeof(sysctl_max_rcu_stall_to_panic), > > > + .mode = 0644, > > > + .proc_handler = proc_dointvec_minmax, > > > + .extra1 = SYSCTL_ONE, > > > + .extra2 = SYSCTL_INT_MAX, > > > + }, > > > +#endif > > > #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE > > > { > > > .procname = "stack_erasing",