Re: [patch 02/17] Kernel Tracepoints

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Just some brief statement from a patch by Mathieu - explaining the
meaning of tracepoints.   (semantically, it should be similar to
kernel markers, right?)

On Wed, Jul 16, 2008 at 6:26 AM, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxx> wrote:
> Implementation of kernel tracepoints. Inspired from the Linux Kernel Markers.
> Allows complete typing verification by declaring both tracing statement inline
> functions and probe registration/unregistration static inline functions within
> the same macro "DEFINE_TRACE". No format string is required. See the
> tracepoint Documentation and Samples patches for usage examples.
>
> Taken from the documentation patch :
>
> "A tracepoint placed in code provides a hook to call a function (probe) that you
> can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or
> "off" (no probe is attached). When a tracepoint is "off" it has no effect,
> except for adding a tiny time penalty (checking a condition for a branch) and
> space penalty (adding a few bytes for the function call at the end of the
> instrumented function and adds a data structure in a separate section).  When a
> tracepoint is "on", the function you provide is called each time the tracepoint
> is executed, in the execution context of the caller. When the function provided
> ends its execution, it returns to the caller (continuing from the tracepoint
> site).
>
> You can put tracepoints at important locations in the code. They are lightweight
> hooks that can pass an arbitrary number of parameters, which prototypes are
> described in a tracepoint declaration placed in a header file."
>
> Addition and removal of tracepoints is synchronized by RCU using the
> scheduler (and preempt_disable) as guarantees to find a quiescent state
> (this is really RCU "classic"). The update side uses rcu_barrier_sched()
> with call_rcu_sched() and the read/execute side uses
> "preempt_disable()/preempt_enable()".
>
> We make sure the previous array containing probes, which has been scheduled for
> deletion by the rcu callback, is indeed freed before we proceed to the next
> update. It therefore limits the rate of modification of a single tracepoint to
> one update per RCU period. The objective here is to permit fast batch
> add/removal of probes on _different_ tracepoints.
>
> Changelog :
> - Use #name ":" #proto as string to identify the tracepoint in the
>  tracepoint table. This will make sure not type mismatch happens due to
>  connexion of a probe with the wrong type to a tracepoint declared with
>  the same name in a different header.
> - Add tracepoint_entry_free_old.
> - Change __TO_TRACE to get rid of the 'i' iterator.
>
> Masami Hiramatsu <mhiramat@xxxxxxxxxx> :
> Tested on x86-64.
>
> Performance impact of a tracepoint : same as markers, except that it adds about
> 70 bytes of instructions in an unlikely branch of each instrumented function
> (the for loop, the stack setup and the function call). It currently adds a
> memory read, a test and a conditional branch at the instrumentation site (in the
> hot path). Immediate values will eventually change this into a load immediate,
> test and branch, which removes the memory read which will make the i-cache
> impact smaller (changing the memory read for a load immediate removes 3-4 bytes
> per site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it also
> saves the d-cache hit).
>
> About the performance impact of tracepoints (which is comparable to markers),
> even without immediate values optimizations, tests done by Hideo Aoki on ia64
> show no regression. His test case was using hackbench on a kernel where
> scheduler instrumentation (about 5 events in code scheduler code) was added.
>
>
> Quoting Hideo Aoki about Markers :
>
> I evaluated overhead of kernel marker using linux-2.6-sched-fixes
> git tree, which includes several markers for LTTng, using an ia64
> server.
>
> While the immediate trace mark feature isn't implemented on ia64,
> there is no major performance regression. So, I think that we
> don't have any issues to propose merging marker point patches
> into Linus's tree from the viewpoint of performance impact.
>
> I prepared two kernels to evaluate. The first one was compiled
> without CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
>
> I downloaded the original hackbench from the following URL:
> http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
>
> I ran hackbench 5 times in each condition and calculated the
> average and difference between the kernels.
>
>    The parameter of hackbench: every 50 from 50 to 800
>    The number of CPUs of the server: 2, 4, and 8
>
> Below is the results. As you can see, major performance
> regression wasn't found in any case. Even if number of processes
> increases, differences between marker-enabled kernel and marker-
> disabled kernel doesn't increase. Moreover, if number of CPUs
> increases, the differences doesn't increase either.
>
> Curiously, marker-enabled kernel is better than marker-disabled
> kernel in more than half cases, although I guess it comes from
> the difference of memory access pattern.
>
>
> * 2 CPUs
>
> Number of | without      | with         | diff     | diff    |
> processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
> --------------------------------------------------------------
>       50 |      4.811   |       4.872  |  +0.061  |  +1.27  |
>      100 |      9.854   |      10.309  |  +0.454  |  +4.61  |
>      150 |     15.602   |      15.040  |  -0.562  |  -3.6   |
>      200 |     20.489   |      20.380  |  -0.109  |  -0.53  |
>      250 |     25.798   |      25.652  |  -0.146  |  -0.56  |
>      300 |     31.260   |      30.797  |  -0.463  |  -1.48  |
>      350 |     36.121   |      35.770  |  -0.351  |  -0.97  |
>      400 |     42.288   |      42.102  |  -0.186  |  -0.44  |
>      450 |     47.778   |      47.253  |  -0.526  |  -1.1   |
>      500 |     51.953   |      52.278  |  +0.325  |  +0.63  |
>      550 |     58.401   |      57.700  |  -0.701  |  -1.2   |
>      600 |     63.334   |      63.222  |  -0.112  |  -0.18  |
>      650 |     68.816   |      68.511  |  -0.306  |  -0.44  |
>      700 |     74.667   |      74.088  |  -0.579  |  -0.78  |
>      750 |     78.612   |      79.582  |  +0.970  |  +1.23  |
>      800 |     85.431   |      85.263  |  -0.168  |  -0.2   |
> --------------------------------------------------------------
>
> * 4 CPUs
>
> Number of | without      | with         | diff     | diff    |
> processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
> --------------------------------------------------------------
>       50 |      2.586   |       2.584  |  -0.003  |  -0.1   |
>      100 |      5.254   |       5.283  |  +0.030  |  +0.56  |
>      150 |      8.012   |       8.074  |  +0.061  |  +0.76  |
>      200 |     11.172   |      11.000  |  -0.172  |  -1.54  |
>      250 |     13.917   |      14.036  |  +0.119  |  +0.86  |
>      300 |     16.905   |      16.543  |  -0.362  |  -2.14  |
>      350 |     19.901   |      20.036  |  +0.135  |  +0.68  |
>      400 |     22.908   |      23.094  |  +0.186  |  +0.81  |
>      450 |     26.273   |      26.101  |  -0.172  |  -0.66  |
>      500 |     29.554   |      29.092  |  -0.461  |  -1.56  |
>      550 |     32.377   |      32.274  |  -0.103  |  -0.32  |
>      600 |     35.855   |      35.322  |  -0.533  |  -1.49  |
>      650 |     39.192   |      38.388  |  -0.804  |  -2.05  |
>      700 |     41.744   |      41.719  |  -0.025  |  -0.06  |
>      750 |     45.016   |      44.496  |  -0.520  |  -1.16  |
>      800 |     48.212   |      47.603  |  -0.609  |  -1.26  |
> --------------------------------------------------------------
>
> * 8 CPUs
>
> Number of | without      | with         | diff     | diff    |
> processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
> --------------------------------------------------------------
>       50 |      2.094   |       2.072  |  -0.022  |  -1.07  |
>      100 |      4.162   |       4.273  |  +0.111  |  +2.66  |
>      150 |      6.485   |       6.540  |  +0.055  |  +0.84  |
>      200 |      8.556   |       8.478  |  -0.078  |  -0.91  |
>      250 |     10.458   |      10.258  |  -0.200  |  -1.91  |
>      300 |     12.425   |      12.750  |  +0.325  |  +2.62  |
>      350 |     14.807   |      14.839  |  +0.032  |  +0.22  |
>      400 |     16.801   |      16.959  |  +0.158  |  +0.94  |
>      450 |     19.478   |      19.009  |  -0.470  |  -2.41  |
>      500 |     21.296   |      21.504  |  +0.208  |  +0.98  |
>      550 |     23.842   |      23.979  |  +0.137  |  +0.57  |
>      600 |     26.309   |      26.111  |  -0.198  |  -0.75  |
>      650 |     28.705   |      28.446  |  -0.259  |  -0.9   |
>      700 |     31.233   |      31.394  |  +0.161  |  +0.52  |
>      750 |     34.064   |      33.720  |  -0.344  |  -1.01  |
>      800 |     36.320   |      36.114  |  -0.206  |  -0.57  |
> --------------------------------------------------------------
>
> Best regards,
> Hideo
>
>
> P.S. When I compiled the linux-2.6-sched-fixes tree on ia64, I
> had to revert the following git commit since pteval_t is defined
> on x86 only.
>
> commit 8686f2b37e7394b51dd6593678cbfd85ecd28c65
> Date:   Tue May 6 15:42:40 2008 -0700
>
>    generic, x86, PAT: fix mprotect
>
>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx>
> Acked-by: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
> CC: 'Peter Zijlstra' <peterz@xxxxxxxxxxxxx>
> CC: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
> CC: 'Ingo Molnar' <mingo@xxxxxxx>
> CC: 'Hideo AOKI' <haoki@xxxxxxxxxx>
> CC: Takashi Nishiie <t-nishiie@xxxxxxxxxxxxxxxxxx>
> CC: 'Steven Rostedt' <rostedt@xxxxxxxxxxx>
> CC: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
> CC: Eduard - Gabriel Munteanu <eduard.munteanu@xxxxxxxxxxx>
> ---
>  include/asm-generic/vmlinux.lds.h |    6
>  include/linux/module.h            |   17 +
>  include/linux/tracepoint.h        |  127 ++++++++++
>  init/Kconfig                      |    7
>  kernel/Makefile                   |    1
>  kernel/module.c                   |   66 +++++
>  kernel/tracepoint.c               |  473 ++++++++++++++++++++++++++++++++++++++
>  7 files changed, 695 insertions(+), 2 deletions(-)
>
> Index: linux-2.6-lttng/init/Kconfig
> ===================================================================
> --- linux-2.6-lttng.orig/init/Kconfig   2008-07-15 17:34:42.000000000 -0400
> +++ linux-2.6-lttng/init/Kconfig        2008-07-15 17:35:00.000000000 -0400
> @@ -782,6 +782,13 @@ config PROFILING
>          Say Y here to enable the extended profiling support mechanisms used
>          by profilers such as OProfile.
>
> +config TRACEPOINTS
> +       bool "Activate tracepoints"
> +       default y
> +       help
> +         Place an empty function call at each tracepoint site. Can be
> +         dynamically changed for a probe function.
> +
>  config MARKERS
>        bool "Activate markers"
>        help
> Index: linux-2.6-lttng/kernel/Makefile
> ===================================================================
> --- linux-2.6-lttng.orig/kernel/Makefile        2008-07-15 17:34:42.000000000 -0400
> +++ linux-2.6-lttng/kernel/Makefile     2008-07-15 17:35:00.000000000 -0400
> @@ -77,6 +77,7 @@ obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
>  obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
>  obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
>  obj-$(CONFIG_MARKERS) += marker.o
> +obj-$(CONFIG_TRACEPOINTS) += tracepoint.o
>  obj-$(CONFIG_LATENCYTOP) += latencytop.o
>  obj-$(CONFIG_FTRACE) += trace/
>  obj-$(CONFIG_TRACING) += trace/
> Index: linux-2.6-lttng/include/linux/tracepoint.h
> ===================================================================
> --- /dev/null   1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6-lttng/include/linux/tracepoint.h  2008-07-15 17:35:19.000000000 -0400
> @@ -0,0 +1,127 @@
> +#ifndef _LINUX_TRACEPOINT_H
> +#define _LINUX_TRACEPOINT_H
> +
> +/*
> + * Kernel Tracepoint API.
> + *
> + * See Documentation/tracepoint.txt.
> + *
> + * (C) Copyright 2008 Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx>
> + *
> + * Heavily inspired from the Linux Kernel Markers.
> + *
> + * This file is released under the GPLv2.
> + * See the file COPYING for more details.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/rcupdate.h>
> +
> +struct module;
> +struct tracepoint;
> +
> +struct tracepoint {
> +       const char *name;               /* Tracepoint name */
> +       int state;                      /* State. */
> +       void **funcs;
> +} __attribute__((aligned(8)));
> +
> +
> +#define TPPROTO(args...)       args
> +#define TPARGS(args...)                args
> +
> +#ifdef CONFIG_TRACEPOINTS
> +
> +/*
> + * it_func[0] is never NULL because there is at least one element in the array
> + * when the array itself is non NULL.
> + */
> +#define __DO_TRACE(tp, proto, args)                                    \
> +       do {                                                            \
> +               void **it_func;                                         \
> +                                                                       \
> +               rcu_read_lock_sched();                                  \
> +               it_func = rcu_dereference((tp)->funcs);                 \
> +               if (it_func) {                                          \
> +                       do {                                            \
> +                               ((void(*)(proto))(*it_func))(args);     \
> +                       } while (*(++it_func));                         \
> +               }                                                       \
> +               rcu_read_unlock_sched();                                \
> +       } while (0)
> +
> +/*
> + * Make sure the alignment of the structure in the __tracepoints section will
> + * not add unwanted padding between the beginning of the section and the
> + * structure. Force alignment to the same alignment as the section start.
> + */
> +#define DEFINE_TRACE(name, proto, args)                                        \
> +       static inline void trace_##name(proto)                          \
> +       {                                                               \
> +               static const char __tpstrtab_##name[]                   \
> +               __attribute__((section("__tracepoints_strings")))       \
> +               = #name ":" #proto;                                     \
> +               static struct tracepoint __tracepoint_##name            \
> +               __attribute__((section("__tracepoints"), aligned(8))) = \
> +               { __tpstrtab_##name, 0, NULL };                         \
> +               if (unlikely(__tracepoint_##name.state))                \
> +                       __DO_TRACE(&__tracepoint_##name,                \
> +                               TPPROTO(proto), TPARGS(args));          \
> +       }                                                               \
> +       static inline int register_trace_##name(void (*probe)(proto))   \
> +       {                                                               \
> +               return tracepoint_probe_register(#name ":" #proto,      \
> +                       (void *)probe);                                 \
> +       }                                                               \
> +       static inline void unregister_trace_##name(void (*probe)(proto))\
> +       {                                                               \
> +               tracepoint_probe_unregister(#name ":" #proto,           \
> +                       (void *)probe);                                 \
> +       }
> +
> +extern void tracepoint_update_probe_range(struct tracepoint *begin,
> +       struct tracepoint *end);
> +
> +#else /* !CONFIG_TRACEPOINTS */
> +#define DEFINE_TRACE(name, proto, args)                        \
> +       static inline void _do_trace_##name(struct tracepoint *tp, proto) \
> +       { }                                                             \
> +       static inline void trace_##name(proto)                          \
> +       { }                                                             \
> +       static inline int register_trace_##name(void (*probe)(proto))   \
> +       {                                                               \
> +               return -ENOSYS;                                         \
> +       }                                                               \
> +       static inline void unregister_trace_##name(void (*probe)(proto))\
> +       { }
> +
> +static inline void tracepoint_update_probe_range(struct tracepoint *begin,
> +       struct tracepoint *end)
> +{ }
> +#endif /* CONFIG_TRACEPOINTS */
> +
> +/*
> + * Connect a probe to a tracepoint.
> + * Internal API, should not be used directly.
> + */
> +extern int tracepoint_probe_register(const char *name, void *probe);
> +
> +/*
> + * Disconnect a probe from a tracepoint.
> + * Internal API, should not be used directly.
> + */
> +extern int tracepoint_probe_unregister(const char *name, void *probe);
> +
> +struct tracepoint_iter {
> +       struct module *module;
> +       struct tracepoint *tracepoint;
> +};
> +
> +extern void tracepoint_iter_start(struct tracepoint_iter *iter);
> +extern void tracepoint_iter_next(struct tracepoint_iter *iter);
> +extern void tracepoint_iter_stop(struct tracepoint_iter *iter);
> +extern void tracepoint_iter_reset(struct tracepoint_iter *iter);
> +extern int tracepoint_get_iter_range(struct tracepoint **tracepoint,
> +       struct tracepoint *begin, struct tracepoint *end);
> +



-- 
Regards,
Peter Teoh

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ


[Index of Archives]     [Newbies FAQ]     [Linux Kernel Mentors]     [Linux Kernel Development]     [IETF Annouce]     [Git]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux SCSI]     [Linux ACPI]
  Powered by Linux