Re: [PATCH v4 4/4] rcu: Add RCU stall diagnosis information

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Sat, 5 Nov 2022 13:32:20 -0700

On Sat, Nov 05, 2022 at 03:03:14PM +0800, Leizhen (ThunderTown) wrote:
> On 2022/11/5 9:58, Elliott, Robert (Servers) wrote:

[ . . . ]

> >> +int rcu_cpu_stall_cputime __read_mostly = IS_ENABLED(CONFIG_RCU_CPU_STALL_CPUTIME);
> > 
> > As a config option and module parameter, adding some more
> > instrumentation overhead might be worthwhile for other
> > likely causes of rcu stalls.
> > 
> > For example, if enabled, have these functions (if available
> > on the architecture) maintain a per-CPU running count of
> > their invocations, which also cause the CPU to be unavailable
> > for rcu: 
> > - kernel_fpu_begin() calls - FPU/SIMD context preservation,
> >   which also calls preempt_disable()
> > - preempt_disable() calls - scheduler context switches disabled
> > - local_irq_save() calls - interrupts disabled
> > - cond_resched() calls - lack of these is a problem
> > 
> > For kernel_fpu_begin and preempt_disable, knowing if it is
> > currently blocked for those reasons is probably the most
> > helpful.
> 
> These instructions are already in Documentation/RCU/stallwarn.rst

Excellent point -- this document also needs to be updated with this
new information.  I have pulled in your four patches as noted in my
previous email.  They are on the -rcu tree's "dev" branch.

Could you please send a patch containing an initial update to
stallwarn.rst?  The main thing I need is your perspective on how each
field is used.

							Thanx, Paul