On Thu, Feb 18, 2021 at 09:17:01AM -0500, Mathieu Desnoyers wrote:
> ----- On Feb 18, 2021, at 7:52 AM, Sangmoon Kim sangmoon.kim@xxxxxxxxxxx wrote:
>
> >> -----Original Message-----
> >> From: neeraju=codeaurora.org@xxxxxxxxxxxxxxxxx
> >> <neeraju=codeaurora.org@xxxxxxxxxxxxxxxxx>
> >> Sent: Thursday, February 18, 2021 3:18 AM
> >>
> >> Hi Sangmoon,
> >>
> >> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
> >> >> -----Original Message-----
> >> >> From: Paul E. McKenney <paulmck@xxxxxxxxxx>
> >> >> Sent: Wednesday, February 17, 2021 2:50 AM
> >> >>
> >> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
> >> >>> The event allows us to trace the RCU stall when
> >> >>> sysctl_panic_on_rcu_stall is disabled.
> >> >>>
> >> >>> The first parameter is the name of RCU flavour like other trace
> >> >>> events. The second one shows us which function detected stalls.
> >> >>>
> >> >>> The RCU stall is mainly caused by external factors such as interrupt
> >> >>> handling or task scheduling or something else. Therefore, this event
> >> >>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
> >> >>> in the RCU stall can use it without CONFIG_RCU_TRACE.
> >> >>>
> >> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@xxxxxxxxxxx>
> >> >>
> >> >> The patch looks plausible, but I have to ask...  Why not instead just
> >> >> get the existing information out of the console log?
> >> >>
> >> >>                                                     Thanx, Paul
> >> >
> >> > This can provide a trigger point for the RCU stall warning.
> >> > If a module in the kernel wants to trace the stall for debugging purposes,
> >> > there is a cost of continuing to parse the console log.
> >> > This tracepoint is useful because it is hard to pay these costs
> >> > especially on mobile devices.
> >> >
> >> > Thanks,
> >> > Sangmoon
> >> >
> >>
> >> So, the idea here is to register to these trace events from kernel
> >> module and use that for debugging?
> >> Just curious what debugging action the
> >> module takes on these traces, as they carry limited information
> >> about the stall compared to the console stall warnings, which give
> >> much more detailed information about the stall.
> >>
> >>
> >> Thanks
> >> Neeraj
> >
> > Hi Neeraj,
> >
> > Yes, a module can log the stall occurrence using the trace, although
> > there is no detailed information. If a kernel panic occurs for some
> > reason, the debugging report generated by the module can note that an
> > RCU stall warning occurred earlier.
> >
> > In addition (this is just an idea for now), when the trace event fires, the
> > module can store the console log including detailed information, or may
> > also obtain CPU/task information by parsing the console log.
>
> Adding a new tracepoint is not just about what is extracted by this specific
> tracepoint, but rather how it can be analyzed when combined with all other relevant
> tracepoints.
>
> For instance, if we have this RCU stall warning tracepoint added, here is how it can be
> used with the upcoming LTTng 2.13, which implements the "event notification" triggers
> feature:
>
> 1) Set up "flight recorder" (snapshot) tracing to trace into a circular ring buffer,
>    enabling the following tracepoints:
>    - kernel activity (meaning all other RCU events, scheduling, irq, workqueues, ...),
>    - this new RCU stall warning event.
>
> 2) Add a "callstack-kernel" context to the RCU stall warning event. This will sample
>    the kernel stack when the event is hit. This will provide information similar to
>    the stack trace gathered into the console log on OOPS.
>
> 3) Enable a trigger waiting on the RCU stall warning tracepoint to be hit. On this
>    trigger, actions can be associated, such as capturing a snapshot or waking up
>    an external user-space process to perform specific actions.
>
> So you end up with a snapshot containing the sequence of events leading to the
> RCU stall warning, with a kernel stack trace of the context causing the stall
> warning to be emitted.
>
> I would argue that this information is more complete than just the stack trace
> extracted through the console log.

I am not so sure about that.  RCU CPU stall warnings dump quite a bit
more than a stack trace to the console.  Which is why I am concerned
about the proverbial camel's nose in the tent.  ;-)

So Sangmoon, what is it that you really need for this to be useful
to you?  Or am I missing your point?  (Either Mathieu's or Sangmoon's.)

                                                        Thanx, Paul