Re: [PATCH v2 10/14] tracing: Document trace_marker triggers

Tom Zanussi <tom.zanussi@xxxxxxxxxxxxxxx> · Mon, 14 May 2018 16:47:07 -0500

Hi Steve,

This is a nice new event feature - thanks for doing it!  Some minor typo
comments below..

On Mon, 2018-05-14 at 16:58 -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@xxxxxxxxxxx>
> 
> Add documentation and an example on how to use trace_marker triggers.
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
> ---
>  Documentation/trace/events.rst    |   6 +-
>  Documentation/trace/ftrace.rst    |   5 +
>  Documentation/trace/histogram.txt | 546 +++++++++++++++++++++++++++++-
>  3 files changed, 555 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
> index a5ea2cb0082b..1afae55dc55c 100644
> --- a/Documentation/trace/events.rst
> +++ b/Documentation/trace/events.rst
> @@ -338,10 +338,14 @@ used for conditionally invoking triggers.
>  
>  The syntax for event triggers is roughly based on the syntax for
>  set_ftrace_filter 'ftrace filter commands' (see the 'Filter commands'
> -section of Documentation/trace/ftrace.txt), but there are major
> +section of Documentation/trace/ftrace.rst), but there are major
>  differences and the implementation isn't currently tied to it in any
>  way, so beware about making generalizations between the two.
>  
> +Note: Writing into trace_marker (See Documentation/trace/ftrace.rst)
> +     can also enable triggers that are written into
> +     /sys/kernel/tracing/events/ftrace/print/trigger
> +
>  6.1 Expression syntax
>  ---------------------
>  
> diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
> index 67d9c38e95eb..aad4776c0f5d 100644
> --- a/Documentation/trace/ftrace.rst
> +++ b/Documentation/trace/ftrace.rst
> @@ -507,6 +507,11 @@ of ftrace. Here is a list of some of the key files:
>  
>  		trace_fd = open("trace_marker", WR_ONLY);
>  
> +	Note: Writing into the trace_marker file can also initiate triggers
> +	      that are written into /sys/kernel/tracing/events/ftrace/print/trigger
> +	      See "Event triggers" in Documentation/trace/events.rst and an
> +              example in Documentation/trace/histogram.rst (Section 3.)
> +
>    trace_marker_raw:
>  
>  	This is similar to trace_marker above, but is meant for for binary data
> diff --git a/Documentation/trace/histogram.txt b/Documentation/trace/histogram.txt
> index 6e05510afc28..8c871f0c0e33 100644
> --- a/Documentation/trace/histogram.txt
> +++ b/Documentation/trace/histogram.txt
> @@ -1604,7 +1604,6 @@
>          Entries: 7
>          Dropped: 0
>  
> -
>  2.2 Inter-event hist triggers
>  -----------------------------
>  
> @@ -1993,3 +1992,548 @@ hist trigger specification.
>            Hits: 12970
>            Entries: 2
>            Dropped: 0
> +
> +3. User space creating a trigger
> +--------------------------------
> +
> +Writing into /sys/kernel/tracing/trace_marker writes into the ftrace
> +ring buffer. This can also act like an event, by writing into the trigger
> +file located in /sys/kernel/tracing/events/ftrace/print/
> +
> +Modifying cyclictest to write into the trace_marker file before it sleeps
> +and after it wakes up, something like this:
> +
> +static void traceputs(char *str)
> +{
> +	/* tracemark_fd is the trace_marker file descripto */
> 

Should be 'descriptor'?

> 
> +	if (tracemark_fd < 0)
> +		return;
> +	/* write the tracemark message */
> +	write(tracemark_fd, str, strlen(str));
> +}
> +
> +And later add something like:
> +
> +	traceputs("start");
> +	clock_nanosleep(...);
> +	traceputs("end");
> +
> +We can make a histogram from this:
> +
> + # cd /sys/kernel/tracing
> + # echo 'latency u64 lat' > synthetic_events
> + # echo 'hist:keys=common_pid:ts0=common_timestamp.usecs if buf == "start"' > events/ftrace/print/trigger
> + # echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$ts0:onmatch(ftrace.print).latency($lat) if buf == "end"' >> events/ftrace/print/trigger
> + # echo 'hist:keys=lat,common_pid:sort=lat' > events/synthetic/latency/trigger
> +
> +The above created a synthetic event called "latency" and two histograms
> +against the trace_marker, one gets triggered when "start" is written into the
> +trace_marker file and the other when "end" is written. If the pids match, then
> +it will call the "latency" synthetic event with the calculated latency as its
> +parameter. Finally, a histogram is added to the latency synthetic event to
> +record the calculated latency along with the pid.
> +
> +Now running cyclictest with:
> +
> + # ./cyclictest -p80 -d0 -i250 -n -a -t --tracemark -b 1000
> +
> + -p80  : run threads at priority 80
> + -d0   : have all threads run at the same interval
> + -i250 : start the interval at 250 microseconds (all threads will do this)
> + -n    : sleep with nanosleep
> + -a    : affine all threads to a separate CPU
> + -t    : one thread per available CPU
> + --tracemark : enable trace mark writing
> + -b 1000 : stop if any latency is greater than 1000 microseconds
> +
> +Note, the -b 1000 is used just to make --tracemark available.
> +
> +The we can see the histogram created by this with:
> +

'Then we can..

> 
> + # cat events/synthetic/latency/hist
> +# event histogram
> +#
> +# trigger info: hist:keys=lat,common_pid:vals=hitcount:sort=lat:size=2048 [active]
> +#
> +
> +{ lat:        107, common_pid:       2039 } hitcount:          1
> 

snip

> 
> +{ lat:        289, common_pid:       2039 } hitcount:          1
> +{ lat:        300, common_pid:       2039 } hitcount:          1
> +{ lat:        384, common_pid:       2039 } hitcount:          1
> +
> +Totals:
> +    Hits: 67625
> +    Entries: 278
> +    Dropped: 0
> +
> +Note, the writes are around the sleep, so ideally they will all be of 250
> +microseconds. If you are wondering how there are several that are under
> +250 microseconds, that is because the way cyclictest works, is if one
> +iteration comes in late, the next one will set the timer to wake up less that
> +250. That is, if an iteration came in 50 microseconds late, the next wake up
> +will be at 200 microseconds.
> +
> +But this could easily be done in userspace. To make this even more
> +interesting, we can mix the histogram between events that happened in the
> +kernel with trace_marker.
> +
> + # cd /sys/kernel/tracing
> + # echo 'latency u64 lat' > synthetic_events
> + # echo 'hist:keys=pid:ts0=common_timestamp.usecs' > events/sched/sched_waking/trigger
> + # echo '!hist:keys=common_pid:vals=hitcount:lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_switch).latency($lat) if buf == "end"' > events/ftrace/print/trigger

This seems out of place here - disabling an onmatch(sched.sched_switch)
trigger that you haven't mentioned before.

> 
> 
> # echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).latency($lat) if buf == "end"' > events/ftrace/print/trigger
> + # echo 'hist:keys=lat,common_pid:sort=lat' > events/synthetic/latency/trigger
> +
> +The difference this time is that instead of using the trace_marker to start
> +the latency, the sched_waking event is used, matching the common_pid for the
> +trace_marker write with the pid that is being worken by sched_waking.
> 

s/worken/woken ?

> 
> +
> +After running cyclictest again with the same parameters, we now have:
> +
> + # cat events/synthetic/latency/hist
> +# event histogram
> +#
> +# trigger info: hist:keys=lat,common_pid:vals=hitcount:sort=lat:size=2048 [active]
> +#
> +
> +{ lat:          7, common_pid:       2302 } hitcount:        640
> 

snip

> 
> +{ lat:         61, common_pid:       2302 } hitcount:          1
> +{ lat:        110, common_pid:       2302 } hitcount:          1
> +
> +Totals:
> +    Hits: 89565
> +    Entries: 158
> +    Dropped: 0
> +
> +This doesn't tell us any information about how late cyclictest may have
> +worken up, but it does show us a nice histogram of how long it took from
> +the time that cyclictest was worken to the time it made it into user space.
> 

A couple more that don't seem to be 'worken' ;-)

Tom

--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html