----- On Feb 18, 2021, at 11:58 AM, Uladzislau Rezki urezki@xxxxxxxxx wrote:

> On Thu, Feb 18, 2021 at 10:50:53AM -0500, Mathieu Desnoyers wrote:
>> ----- On Feb 18, 2021, at 10:06 AM, paulmck paulmck@xxxxxxxxxx wrote:
>>
>> > On Thu, Feb 18, 2021 at 09:17:01AM -0500, Mathieu Desnoyers wrote:
>> >> ----- On Feb 18, 2021, at 7:52 AM, Sangmoon Kim sangmoon.kim@xxxxxxxxxxx wrote:
>> >>
>> >> >> -----Original Message-----
>> >> >> From: neeraju=codeaurora.org@xxxxxxxxxxxxxxxxx
>> >> >> <neeraju=codeaurora.org@xxxxxxxxxxxxxxxxx>
>> >> >> Sent: Thursday, February 18, 2021 3:18 AM
>> >> >>
>> >> >> Hi Sangmoon,
>> >> >>
>> >> >> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
>> >> >> >> -----Original Message-----
>> >> >> >> From: Paul E. McKenney <paulmck@xxxxxxxxxx>
>> >> >> >> Sent: Wednesday, February 17, 2021 2:50 AM
>> >> >> >>
>> >> >> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
>> >> >> >>> The event allows us to trace the RCU stall when
>> >> >> >>> sysctl_panic_on_rcu_stall is disabled.
>> >> >> >>>
>> >> >> >>> The first parameter is the name of RCU flavour like other trace
>> >> >> >>> events. The second one shows us which function detected stalls.
>> >> >> >>>
>> >> >> >>> The RCU stall is mainly caused by external factors such as interrupt
>> >> >> >>> handling or task scheduling or something else. Therefore, this event
>> >> >> >>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
>> >> >> >>> in the RCU stall can use it without CONFIG_RCU_TRACE.
>> >> >> >>>
>> >> >> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@xxxxxxxxxxx>
>> >> >> >>
>> >> >> >> The patch looks plausible, but I have to ask... Why not instead just
>> >> >> >> get the existing information out of the console log?
>> >> >> >>
>> >> >> >> Thanx, Paul
>> >> >> >
>> >> >> > This can provide a trigger point for the RCU stall warning.
>> >> >> > If a module in the kernel wants to trace the stall for debugging purposes,
>> >> >> > there is a cost of continuing to parse the console log.
>> >> >> > This tracepoint is useful because it is hard to pay these costs
>> >> >> > especially on mobile devices.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Sangmoon
>> >> >> >
>> >> >>
>> >> >> So, the idea here is to register to these trace events from kernel
>> >> >> module and use that for debugging? Just curious what debugging action
>> >> >> module does on these traces, as they have limited information
>> >> >> about the stall, compared to console stall warnings, which gives a
>> >> >> much more detailed information about stall.
>> >> >>
>> >> >>
>> >> >> Thanks
>> >> >> Neeraj
>> >> >
>> >> > Hi Neeraj,
>> >> >
>> >> > Yes, a module can log the stall occurrence using the trace, although
>> >> > there is no detailed information. If the kernel panic occurs for some
>> >> > reasons, the debugging report generated by the module can include that
>> >> > RCU stall warning has occurred before.
>> >> >
>> >> > In addition, it's just an idea now, when a trace event happens, the
>> >> > module can store the console log including detailed information, or may
>> >> > also obtain CPU/task information by parsing the console log.
>> >>
>> >> Adding a new tracepoint is not just about what is extracted by this specific
>> >> tracepoint, but rather how it can be analyzed when combined with all other
>> >> relevant tracepoints.
>> >>
>> >> For instance, if we have this RCU stall warning event added, here is how it
>> >> can be used with the upcoming LTTng 2.13, which implements the "event
>> >> notification" triggers feature:
>> >>
>> >> 1) Set up "flight recorder" (snapshot) tracing to trace into a circular ring
>> >> buffer, enabling the following tracepoints:
>> >> - kernel activity (meaning all other RCU events, scheduling, irq,
>> >>   workqueues, ...),
>> >> - this new RCU stall warning event.
>> >>
>> >> 2) Add a "callstack-kernel" context to the RCU stall warning event. This will
>> >> sample the kernel stack when the event is hit. This will provide information
>> >> similar to the stack trace gathered into the console log on OOPS.
>> >>
>> >> 3) Enable a trigger waiting on the RCU stall warning tracepoint to be hit.
>> >> Actions can be associated with this trigger, such as capturing a snapshot or
>> >> waking up an external user-space process to perform specific actions.
>> >>
>> >> So you end up with a snapshot containing the sequence of events leading to the
>> >> RCU stall warning, with a kernel stack trace of the context causing the stall
>> >> warning to be emitted.
>> >>
>> >> I would argue that this information is more complete than just the stack trace
>> >> extracted through the console log.
>> >
>> > I am not so sure about that. RCU CPU stall warnings dump quite a bit more
>> > than a stack trace to the console. Which is why I am concerned about the
>> > proverbial camel's nose in the tent. ;-)
>> >
>> > So Sangmoon, what is it that you really need for this to be useful to you?
>> >
>> > Or am I missing your point? (Either Mathieu's or Sangmoon's.)
>>
>> Well there is a tracepoint to dump the console's content into the tracing
>> buffers as well, so technically this existing RCU stall warning information
>> could be extracted into a trace as well.
>>
>> AFAIU, what the new "rcu stall warning" tracepoint provides is an easy way to
>> hook on a specific event to trigger trace capture, without having to parse the
>> console log continuously, and a way to know when the stall warning happens in
>> time within the trace time-line.
>>
>> That being said, there may be use-cases for extracting more details about the
>> RCU stall as tracepoint event fields to make it more convenient, but it does
>> not appear to be necessary considering that the console can be saved into
>> trace buffers as well.
>>
> Could you please clarify how the kernel ring-buffer can also be routed over
> trace buffer?
> Probably I am not aware whether it is possible on the latest kernel.

This can be done by enabling the "printk console" tracepoint. See the call to
trace_console_rcuidle() in kernel/printk/printk.c. The tracepoint is defined in
include/trace/events/printk.h.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
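
As a rough illustration of the "printk console" tracepoint mentioned above, the
sketch below toggles it from user space through tracefs. It assumes tracefs is
mounted at /sys/kernel/tracing (older setups expose it under
/sys/kernel/debug/tracing) and that the caller has root privileges. The helper
name set_console_event is hypothetical and not part of any kernel or LTTng API.

```python
from pathlib import Path

# Assumption: default tracefs mount point; adjust if tracefs lives elsewhere
# (for example /sys/kernel/debug/tracing on older systems).
TRACEFS_ROOT = Path("/sys/kernel/tracing")


def set_console_event(enabled: bool, tracefs_root: Path = TRACEFS_ROOT) -> Path:
    """Enable or disable the printk:console trace event.

    Hypothetical helper for illustration. Returns the control file it wrote to.
    """
    enable_file = Path(tracefs_root) / "events" / "printk" / "console" / "enable"
    # Writing "1" turns the event on and "0" turns it off.
    enable_file.write_text("1" if enabled else "0")
    return enable_file
```

Calling set_console_event(True) as root should make subsequent console messages
show up as printk:console events in the ftrace buffer, next to whatever other
events are enabled, so a stall warning's console output can be located on the
same time-line as the rest of the trace.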