Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings

> -----Original Message-----
> From: neeraju=codeaurora.org@xxxxxxxxxxxxxxxxx <neeraju=codeaurora.org@xxxxxxxxxxxxxxxxx>
> Sent: Thursday, February 18, 2021 3:18 AM
> 
> Hi Sangmoon,
> 
> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
> >> -----Original Message-----
> >> From: Paul E. McKenney <paulmck@xxxxxxxxxx>
> >> Sent: Wednesday, February 17, 2021 2:50 AM
> >>
> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
> >>> The event allows us to trace the RCU stall when
> >>> sysctl_panic_on_rcu_stall is disabled.
> >>>
> >>> The first parameter is the name of the RCU flavour, as in the other RCU
> >>> trace events. The second identifies which function detected the stall.
> >>>
> >>> RCU stalls are mainly caused by external factors such as interrupt
> >>> handling or task scheduling. Therefore, this event uses the TRACE_EVENT
> >>> macro rather than the dedicated TRACE_EVENT_RCU one, so that anyone
> >>> interested in RCU stalls can use it without enabling CONFIG_RCU_TRACE.
> >>>
> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@xxxxxxxxxxx>
> >>
> >> The patch looks plausible, but I have to ask...  Why not instead just
> >> get the existing information out of the console log?
> >>
> >> 							Thanx, Paul
> >
> > This provides a trigger point for the RCU stall warning.
> > If a kernel module wants to trace stalls for debugging purposes,
> > it would otherwise have to keep parsing the console log. That cost
> > is especially hard to pay on mobile devices, which is why this
> > tracepoint is useful.
> >
> > Thanks,
> > Sangmoon
> >
> 
> So, the idea here is to register for these trace events from a kernel
> module and use them for debugging? Just curious what debugging action
> the module takes on these traces, as they carry limited information
> about the stall compared to the console stall warnings, which give
> much more detailed information.
> 
> 
> Thanks
> Neeraj

Hi Neeraj,

Yes, a module can log each stall occurrence using the tracepoint, even
though the event carries no detailed information. If a kernel panic later
occurs for some other reason, the debugging report generated by the module
can record that an RCU stall warning occurred beforehand.
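To make this concrete, below is a userspace model of the bookkeeping such a module might do. Everything in it (struct stall_record, probe_rcu_stall_warning(), format_stall_report(), stall_record_reset()) is hypothetical and not part of the patch; the probe only mirrors the tracepoint convention that a probe receives its registration data pointer followed by the TP_PROTO arguments. In a real module the probe would be attached through the tracepoint API (for example tracepoint_probe_register()), and since the patch as posted does not export the tracepoint with EXPORT_TRACEPOINT_SYMBOL_GPL(), a module would probably have to look it up by name first.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-boot record a debugging module might keep. */
struct stall_record {
	unsigned int count;     /* rcu_stall_warning events seen so far */
	char last_rcuname[32];  /* RCU flavor of the most recent event */
	char last_msg[32];      /* "StallDetected", "SelfDetected" or "ExpeditedStall" */
};

static struct stall_record stall_rec;

/*
 * Model of a probe for rcu_stall_warning: registration data first,
 * then the TP_PROTO(rcuname, msg) arguments. A real probe can fire on
 * several CPUs at once and would need atomic_t or a lock here.
 */
static void probe_rcu_stall_warning(void *data, const char *rcuname,
				    const char *msg)
{
	struct stall_record *rec = data;

	rec->count++;
	snprintf(rec->last_rcuname, sizeof(rec->last_rcuname), "%s", rcuname);
	snprintf(rec->last_msg, sizeof(rec->last_msg), "%s", msg);
}

/* Model of the line a panic-time debugging report could include. */
static int format_stall_report(const struct stall_record *rec,
			       char *buf, size_t len)
{
	if (rec->count == 0)
		return snprintf(buf, len, "no RCU stall warnings before panic");
	return snprintf(buf, len,
			"%u RCU stall warning(s) before panic, last: %s %s",
			rec->count, rec->last_rcuname, rec->last_msg);
}

/* Clear the record, e.g. once a report has been persisted. */
static void stall_record_reset(struct stall_record *rec)
{
	memset(rec, 0, sizeof(*rec));
}
```

Copying the strings is a conservative choice for the model; in the kernel both arguments appear to point to static strings (rcu_state.name and a TPS() string), so a probe could also just keep the pointers.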

In addition, and this is only an idea for now, when the trace event fires
the module could save the console log containing the detailed stall
information, or extract the CPU/task information by parsing that log.
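The idea of saving the console output when the event fires could look roughly like the ring buffer below: recent console lines are mirrored into a fixed-size buffer, and the stall handler copies them out oldest first. This is again a hypothetical userspace model (struct line_ring, ring_push(), ring_snapshot() and both size constants are invented for illustration); a kernel module would more likely read the log through the kernel's kmsg_dump interfaces, whose exact API differs between kernel versions.

```c
#include <stdio.h>
#include <string.h>

#define LOG_LINES 4   /* hypothetical capacity, kept tiny for the example */
#define LINE_LEN  80  /* hypothetical maximum stored line length */

/* Hypothetical ring buffer mirroring the most recent console lines. */
struct line_ring {
	char lines[LOG_LINES][LINE_LEN];
	unsigned int next;  /* slot the next line will be written to */
	unsigned int used;  /* number of valid slots, at most LOG_LINES */
};

/* Append one line, overwriting the oldest one when the ring is full. */
static void ring_push(struct line_ring *r, const char *line)
{
	snprintf(r->lines[r->next], LINE_LEN, "%s", line);
	r->next = (r->next + 1) % LOG_LINES;
	if (r->used < LOG_LINES)
		r->used++;
}

/*
 * Copy the buffered lines into out[], oldest first. This is what the
 * stall-event handler would call. Returns the number of lines copied.
 */
static unsigned int ring_snapshot(const struct line_ring *r,
				  char out[][LINE_LEN])
{
	unsigned int start = (r->next + LOG_LINES - r->used) % LOG_LINES;
	unsigned int i;

	for (i = 0; i < r->used; i++)
		memcpy(out[i], r->lines[(start + i) % LOG_LINES], LINE_LEN);
	return r->used;
}
```

Taking the snapshot at the moment the event fires keeps the detailed stall report that was just printed, even if later console output would otherwise push it out of a small buffer.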

Thanks,
Sangmoon

> 
> >>
> >>> ---
> >>>   include/trace/events/rcu.h | 28 ++++++++++++++++++++++++++++
> >>>   kernel/rcu/tree_exp.h      |  2 ++
> >>>   kernel/rcu/tree_stall.h    |  2 ++
> >>>   3 files changed, 32 insertions(+)
> >>>
> >>> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> >>> index 155b5cb43cfd..8476f3161bd0 100644
> >>> --- a/include/trace/events/rcu.h
> >>> +++ b/include/trace/events/rcu.h
> >>> @@ -432,6 +432,34 @@ TRACE_EVENT_RCU(rcu_fqs,
> >>>   		  __entry->cpu, __entry->qsevent)
> >>>   );
> >>>
> >>> +/*
> >>> + * Tracepoint for RCU stall events. Takes a string identifying the RCU flavor
> >>> + * and a string identifying which function detected the RCU stall as follows:
> >>> + *
> >>> + *	"StallDetected": Scheduler-tick detects other CPUs' stalls.
> >>> + *	"SelfDetected": Scheduler-tick detects the current CPU's stall.
> >>> + *	"ExpeditedStall": Expedited grace period detects stalls.
> >>> + */
> >>> +TRACE_EVENT(rcu_stall_warning,
> >>> +
> >>> +	TP_PROTO(const char *rcuname, const char *msg),
> >>> +
> >>> +	TP_ARGS(rcuname, msg),
> >>> +
> >>> +	TP_STRUCT__entry(
> >>> +		__field(const char *, rcuname)
> >>> +		__field(const char *, msg)
> >>> +	),
> >>> +
> >>> +	TP_fast_assign(
> >>> +		__entry->rcuname = rcuname;
> >>> +		__entry->msg = msg;
> >>> +	),
> >>> +
> >>> +	TP_printk("%s %s",
> >>> +		  __entry->rcuname, __entry->msg)
> >>> +);
> >>> +
> >>>   #endif /* #if defined(CONFIG_TREE_RCU) */
> >>>
> >>>   /*
> >>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> >>> index 8760b6ead770..c16618284cb2 100644
> >>> --- a/kernel/rcu/tree_exp.h
> >>> +++ b/kernel/rcu/tree_exp.h
> >>> @@ -566,6 +566,8 @@ static void synchronize_rcu_expedited_wait(void)
> >>>   				dump_cpu_task(cpu);
> >>>   			}
> >>>   		}
> >>> +		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
> >>> +
> >>>   		jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
> >>>   	}
> >>>   }
> >>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> >>> index 70d48c52fabc..e93df4fac5b1 100644
> >>> --- a/kernel/rcu/tree_stall.h
> >>> +++ b/kernel/rcu/tree_stall.h
> >>> @@ -531,6 +531,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> >>>
> >>>   	rcu_check_gp_kthread_starvation();
> >>>
> >>> +	trace_rcu_stall_warning(rcu_state.name, TPS("StallDetected"));
> >>>   	panic_on_rcu_stall();
> >>>
> >>>   	rcu_force_quiescent_state();  /* Kick them all. */
> >>> @@ -575,6 +576,7 @@ static void print_cpu_stall(unsigned long gps)
> >>>   			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
> >>>   	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >>>
> >>> +	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
> >>>   	panic_on_rcu_stall();
> >>>
> >>>   	/*
> >>> --
> >>> 2.17.1
> >>>
> >
> 
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member of the Code Aurora Forum, hosted by The Linux Foundation