Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



----- On Feb 18, 2021, at 11:58 AM, Uladzislau Rezki urezki@xxxxxxxxx wrote:

> On Thu, Feb 18, 2021 at 10:50:53AM -0500, Mathieu Desnoyers wrote:
>> ----- On Feb 18, 2021, at 10:06 AM, paulmck paulmck@xxxxxxxxxx wrote:
>> 
>> > On Thu, Feb 18, 2021 at 09:17:01AM -0500, Mathieu Desnoyers wrote:
>> >> ----- On Feb 18, 2021, at 7:52 AM, Sangmoon Kim sangmoon.kim@xxxxxxxxxxx wrote:
>> >> 
>> >> >> -----Original Message-----
>> >> >> From: neeraju=codeaurora.org@xxxxxxxxxxxxxxxxx
>> >> >> <neeraju=codeaurora.org@xxxxxxxxxxxxxxxxx>
>> >> >> Sent: Thursday, February 18, 2021 3:18 AM
>> >> >> 
>> >> >> Hi Sangmoon,
>> >> >> 
>> >> >> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
>> >> >> >> -----Original Message-----
>> >> >> >> From: Paul E. McKenney <paulmck@xxxxxxxxxx>
>> >> >> >> Sent: Wednesday, February 17, 2021 2:50 AM
>> >> >> >>
>> >> >> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
>> >> >> >>> The event allows us to trace the RCU stall when
>> >> >> >>> sysctl_panic_on_rcu_stall is disabled.
>> >> >> >>>
>> >> >> >>> The first parameter is the name of RCU flavour like other trace
>> >> >> >>> events. The second one shows us which function detected stalls.
>> >> >> >>>
>> >> >> >>> The RCU stall is mainly caused by external factors such as interrupt
>> >> >> >>> handling or task scheduling or something else. Therefore, this event
>> >> >> >>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
>> >> >> >>> in the RCU stall can use it without CONFIG_RCU_TRACE.
>> >> >> >>>
>> >> >> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@xxxxxxxxxxx>
>> >> >> >>
>> >> >> >> The patch looks plausible, but I have to ask...  Why not instead just
>> >> >> >> get the existing information out of the console log?
>> >> >> >>
>> >> >> >> 							Thanx, Paul
>> >> >> >
>> >> >> > This can provide a trigger point for the RCU stall warning.
>> >> >> > If a module in the kernel wants to trace the stall for debugging purposes,
>> >> >> > there is a cost of continuing to parse the console log.
>> >> >> > This tracepoint is useful because it is hard to pay these costs
>> >> >> > especially on mobile devices.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Sangmoon
>> >> >> >
>> >> >> 
>> >> >> So, the idea here is to register to these trace events from kernel
>> >> >> module and use that for debugging? Just curious what debugging action
>> >> >> module does on these traces, as they have limited information
>> >> >> about the stall, compared to console stall warnings, which gives a
>> >> >> much more detailed information about stall.
>> >> >> 
>> >> >> 
>> >> >> Thanks
>> >> >> Neeraj
>> >> > 
>> >> > Hi Neeraj,
>> >> > 
>> >> > Yes, a module can log the stall occurence using the trace, although
>> >> > there is no detailed information. If the kernel panic occurs for some
>> >> > reasons, the debugging report generated by the module can include that
>> >> > RCU stall warning has occurred before.
>> >> > 
>> >> > In addition, it's just an idea now, when a trace event happens, the
>> >> > module can store the console log including detailed information, or may
>> >> > also obtain CPU/task information by parsing the console log.
>> >> 
>> >> Adding a new tracepoint is not just about what is extracted by this specific
>> >> tracepoint, but rather how it can be analyzed when combined with all other
>> >> relevant
>> >> tracepoints.
>> >> 
>> >> For instance, if we have this added RCU stall warning added, here is how it can
>> >> be
>> >> used with the upcoming LTTng 2.13, which implements the "event notification"
>> >> triggers
>> >> feature:
>> >> 
>> >> 1) Setup "flight recorder" (snapshot) tracing to trace into a circular ring
>> >> buffer,
>> >>    enabling the following tracepoints:
>> >>    - kernel activity (meaning all other RCU event, scheduling, irq, workqueues,
>> >>    ...),
>> >>    - this new RCU stall warning event.
>> >> 
>> >> 2) Add a "callstack-kernel" context to the RCU stall warning event. This will
>> >> sample
>> >>    the kernel stack when the event is hit. This will provide information similar to
>> >>    the stack trace gathered into the console log on OOPS.
>> >> 
>> >> 3) Enable a trigger waiting on the RCU stall warning tracepoint to be hit. On
>> >> this
>> >>    trigger, actions can be associated, such as capturing a snapshot or waking up
>> >>    an external user-space process to perform specific actions.
>> >> 
>> >> So you end up with a snapshot containing the sequence of events leading to the
>> >> RCU stall warning, with a kernel stack trace of the context causing the stall
>> >> warning to be emitted.
>> >> 
>> >> I would argue that this information is more complete than just the stack trace
>> >> extracted through the console log.
>> > 
>> > I am not so sure about that.  RCU CPU stall warnings dump quite a bit more
>> > than a stack trace to the console.  Which is why I am concerned about the
>> > proverbial camel's nose in the tent.  ;-)
>> > 
>> > So Sangmoon, what is it that you really need for this to be useful to you?
>> > 
>> > Or am I missing your point?  (Either Mathieu's or Sangmoon's.)
>> 
>> Well there is a tracepoint to dump the console's content into the tracing
>> buffers
>> as well, so technically this existing RCU stall warning information could be
>> extracted
>> into a trace as well.
>> 
>> AFAIU, what the new "rcu stall warning" tracepoint provides is an easy way to
>> hook on a specific
>> event to trigger trace capture, without requiring to parse the console log
>> continuously,
>> and a way to know when the stall warning happens in time within the trace
>> time-line.
>> 
>> That being said, there may be use-cases for extracting more details about the
>> RCU stall
>> as tracepoint event fields to make it more convenient, but it does not appear to
>> be
>> necessary considered that the console can be saved into trace buffers as well.
>> 
> Could you please clarify how the kernel ring-buffer can also be routed over
> trace buffer?
> Probably i am not aware of if it is possible on latest kernel.

This can be done by enabling the "printk console" tracepoint. See the call to
trace_console_rcuidle() in kernel/printk/printk.c. The tracepoint is defined in
include/trace/events/printk.h.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux