On Tue, Mar 02, 2021 at 10:34:26PM +0530, Neeraj Upadhyay wrote:
> On 3/2/2021 5:25 PM, Sangmoon Kim wrote:
> > The event allows us to trace the RCU stall when
> > sysctl_panic_on_rcu_stall is disabled.
> >
> > The first parameter is the name of RCU flavour like other trace
> > events. The second one shows us which function detected stalls.
> >
> > The RCU stall is mainly caused by external factors such as interrupt
> > handling or task scheduling or something else. Therefore, this event
> > uses TRACE_EVENT macro, not dedicated one, so that someone interested
> > in the RCU stall can use it without CONFIG_RCU_TRACE.
> >
> > Signed-off-by: Sangmoon Kim <sangmoon.kim@xxxxxxxxxxx>
> > Reviewed-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>

[ . . . ]

> Reviewed-by: Neeraj Upadhyay <neeraju@xxxxxxxxxxxxxx>

Thank you all!

As usual, I wordsmithed the commit log as shown below.  Please let me
know if I messed anything up.

							Thanx, Paul

------------------------------------------------------------------------

commit 4ee0eb7c0cbccaae8e5e3681d852d4e7f50c4378
Author: Sangmoon Kim <sangmoon.kim@xxxxxxxxxxx>
Date:   Tue Mar 2 20:55:15 2021 +0900

    rcu/tree: Add a trace event for RCU CPU stall warnings

    This commit adds a trace event which allows tracing the beginnings of RCU
    CPU stall warnings on systems where sysctl_panic_on_rcu_stall is disabled.

    The first parameter is the name of RCU flavor like other trace events.
    The second parameter indicates whether this is a stall of an expedited
    grace period, a self-detected stall of a normal grace period, or a stall
    of a normal grace period detected by some CPU other than the one that
    is stalled.

    RCU CPU stall warnings are often caused by external-to-RCU issues,
    for example, in interrupt handling or task scheduling.  Therefore,
    this event uses TRACE_EVENT, not TRACE_EVENT_RCU, to avoid requiring
    those interested in tracing RCU CPU stalls to rebuild their kernels
    with CONFIG_RCU_TRACE=y.

    Reviewed-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
    Reviewed-by: Neeraj Upadhyay <neeraju@xxxxxxxxxxxxxx>
    Signed-off-by: Sangmoon Kim <sangmoon.kim@xxxxxxxxxxx>
    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 5fc2940..c7711e9 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -432,6 +432,34 @@ TRACE_EVENT_RCU(rcu_fqs,
 		  __entry->cpu, __entry->qsevent)
 );
 
+/*
+ * Tracepoint for RCU stall events. Takes a string identifying the RCU flavor
+ * and a string identifying which function detected the RCU stall as follows:
+ *
+ *	"StallDetected": Scheduler-tick detects other CPU's stalls.
+ *	"SelfDetected": Scheduler-tick detects a current CPU's stall.
+ *	"ExpeditedStall": Expedited grace period detects stalls.
+ */
+TRACE_EVENT(rcu_stall_warning,
+
+	TP_PROTO(const char *rcuname, const char *msg),
+
+	TP_ARGS(rcuname, msg),
+
+	TP_STRUCT__entry(
+		__field(const char *, rcuname)
+		__field(const char *, msg)
+	),
+
+	TP_fast_assign(
+		__entry->rcuname = rcuname;
+		__entry->msg = msg;
+	),
+
+	TP_printk("%s %s",
+		  __entry->rcuname, __entry->msg)
+);
+
 #endif /* #if defined(CONFIG_TREE_RCU) */
 
 /*
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 6c6ff06..2796084 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -521,6 +521,7 @@ static void synchronize_rcu_expedited_wait(void)
 		if (rcu_stall_is_suppressed())
 			continue;
 		panic_on_rcu_stall();
+		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
 		pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
 		       rcu_state.name);
 		ndetected = 0;
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 475b261..59b95cc 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -536,6 +536,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
 	 * RCU CPU stall warnings.
 	 */
+	trace_rcu_stall_warning(rcu_state.name, TPS("StallDetected"));
 	pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name);
 	rcu_for_each_leaf_node(rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -606,6 +607,7 @@ static void print_cpu_stall(unsigned long gps)
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
 	 * RCU CPU stall warnings.
 	 */
+	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
 	pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name);
 	raw_spin_lock_irqsave_rcu_node(rdp->mynode, flags);
 	print_cpu_stall_info(smp_processor_id());
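
For anyone post-processing this event in user space, here is a minimal
sketch (not part of the patch) of how the "%s %s" payload printed by
TP_printk() might be split and mapped onto the three strings listed in
the tracepoint's header comment.  The names stall_kind, classify_stall()
and parse_stall_payload() are hypothetical, as is the 31-character limit
on each word, and the sketch assumes that the caller has already stripped
whatever prefix the tracing infrastructure puts in front of the payload.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Which code path reported the stall, per the tracepoint's header comment. */
enum stall_kind {
	STALL_UNKNOWN = 0,
	STALL_OTHER_CPU,	/* "StallDetected": another CPU's stall */
	STALL_SELF,		/* "SelfDetected": the current CPU's stall */
	STALL_EXPEDITED,	/* "ExpeditedStall": expedited grace period */
};

/* Map the event's second string onto a stall kind (hypothetical helper). */
enum stall_kind classify_stall(const char *msg)
{
	if (!strcmp(msg, "StallDetected"))
		return STALL_OTHER_CPU;
	if (!strcmp(msg, "SelfDetected"))
		return STALL_SELF;
	if (!strcmp(msg, "ExpeditedStall"))
		return STALL_EXPEDITED;
	return STALL_UNKNOWN;
}

/*
 * Split a payload of the form "<rcuname> <msg>" (hypothetical helper).
 * Copies the first word into rcuname and classifies the second one.
 * Returns 0 on success, -1 if the payload does not hold two words.
 */
int parse_stall_payload(const char *payload, char rcuname[32],
			enum stall_kind *kind)
{
	char msg[32];

	assert(payload && rcuname && kind);
	/* %31s leaves room for the terminating NUL in each 32-byte buffer. */
	if (sscanf(payload, "%31s %31s", rcuname, msg) != 2)
		return -1;
	*kind = classify_stall(msg);
	return 0;
}
```

A caller would pass the text following the event name on each trace line
and switch on the resulting stall_kind.  Unknown strings map to
STALL_UNKNOWN rather than failing, so that the parser keeps working if
further stall types are added later.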