Re: [PATCH v4 4/4] rcu: Add RCU stall diagnosis information

"Leizhen (ThunderTown)" <thunder.leizhen@xxxxxxxxxx> · Mon, 7 Nov 2022 11:20:46 +0800

On 2022/11/6 4:32, Paul E. McKenney wrote:
> On Sat, Nov 05, 2022 at 03:03:14PM +0800, Leizhen (ThunderTown) wrote:
>> On 2022/11/5 9:58, Elliott, Robert (Servers) wrote:
> 
> [ . . . ]
> 
>>>> +int rcu_cpu_stall_cputime __read_mostly =
>>>> IS_ENABLED(CONFIG_RCU_CPU_STALL_CPUTIME);
>>>
>>> As a config option and module parameter, adding some more
>>> instrumentation overhead might be worthwhile for other
>>> likely causes of rcu stalls.
>>>
>>> For example, if enabled, have these functions (if available
>>> on the architecture) maintain a per-CPU running count of
>>> their invocations, which also cause the CPU to be unavailable
>>> for rcu: 
>>> - kernel_fpu_begin() calls - FPU/SIMD context preservation,
>>>   which also calls preempt_disable()
>>> - preempt_disable() calls - scheduler context switches disabled
>>> - local_irq_save() calls - interrupts disabled
>>> - cond_resched() calls - lack of these is a problem
>>>
>>> For kernel_fpu_begin and preempt_disable, knowing if it is
>>> currently blocked for those reasons is probably the most
>>> helpful.
>>
>> These instructions are already in Documentation/RCU/stallwarn.rst
> 
> Excellent point -- this document also needs to be updated with this
> new information.  I have pulled in your four patches as noted in my
> previous email.  They are on the -rcu tree's "dev" branch.

OK, thanks.

> 
> Could you please send a patch containing an initial update to
> stallwarn.rst?  The main thing I need is your perspective on how each
> field is used.

Okay, I'll add some descriptions to illustrate how to use this feature
to identify each RCU stall case.
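
For illustration only, the per-CPU running counts suggested above could
be sketched in userspace C roughly as follows. This is not kernel code
and not part of this patch series: every sketch_* name and
SKETCH_NR_CPUS is hypothetical, and a plain array stands in for real
per-CPU variables. The idea is to snapshot the counters at some earlier
point and, when a stall is reported, print the deltas together with
whether preemption is currently disabled.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_NR_CPUS 4 /* hypothetical; stands in for real per-CPU data */

/* One counter per function named in the suggestion. */
enum sketch_event {
	SKETCH_FPU_BEGIN,       /* kernel_fpu_begin() */
	SKETCH_PREEMPT_DISABLE, /* preempt_disable() */
	SKETCH_IRQ_SAVE,        /* local_irq_save() */
	SKETCH_COND_RESCHED,    /* cond_resched() */
	SKETCH_NR_EVENTS
};

struct sketch_cpu_stats {
	unsigned long count[SKETCH_NR_EVENTS]; /* running invocation counts */
	unsigned int preempt_depth;            /* nonzero: currently blocked */
};

static struct sketch_cpu_stats sketch_stats[SKETCH_NR_CPUS];

/* Bump the running count of one event on one CPU. */
static void sketch_note(unsigned int cpu, enum sketch_event ev)
{
	assert(cpu < SKETCH_NR_CPUS && ev < SKETCH_NR_EVENTS);
	sketch_stats[cpu].count[ev]++;
}

/* Count the call and remember that preemption is now disabled. */
static void sketch_preempt_disable(unsigned int cpu)
{
	sketch_note(cpu, SKETCH_PREEMPT_DISABLE);
	sketch_stats[cpu].preempt_depth++;
}

static void sketch_preempt_enable(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS && sketch_stats[cpu].preempt_depth > 0);
	sketch_stats[cpu].preempt_depth--;
}

/* Copy the current counters so a later report can print deltas. */
static void sketch_snapshot(unsigned int cpu, struct sketch_cpu_stats *snap)
{
	assert(cpu < SKETCH_NR_CPUS);
	*snap = sketch_stats[cpu];
}

/* Number of times ev happened on cpu since snap was taken. */
static unsigned long sketch_delta(unsigned int cpu,
				  const struct sketch_cpu_stats *snap,
				  enum sketch_event ev)
{
	assert(cpu < SKETCH_NR_CPUS && ev < SKETCH_NR_EVENTS);
	return sketch_stats[cpu].count[ev] - snap->count[ev];
}

/* What a stall report might print for one CPU. */
static void sketch_report(unsigned int cpu, const struct sketch_cpu_stats *snap)
{
	printf("cpu %u: fpu_begin=%lu preempt_disable=%lu irq_save=%lu cond_resched=%lu preempt_depth=%u\n",
	       cpu,
	       sketch_delta(cpu, snap, SKETCH_FPU_BEGIN),
	       sketch_delta(cpu, snap, SKETCH_PREEMPT_DISABLE),
	       sketch_delta(cpu, snap, SKETCH_IRQ_SAVE),
	       sketch_delta(cpu, snap, SKETCH_COND_RESCHED),
	       sketch_stats[cpu].preempt_depth);
}
```

In such a report, a cond_resched delta of zero combined with a nonzero
preempt_depth would point at a long preempt-disabled region rather than
at an interrupt storm.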

> 
> 							Thanx, Paul
> 

-- 
Regards,
  Zhen Lei