Just some brief statement from a patch by Mathieu - explaining the meaning of tracepoints. (semantically, it should be similar to kernel markers, right?) On Wed, Jul 16, 2008 at 6:26 AM, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx> wrote: > Implementation of kernel tracepoints. Inspired from the Linux Kernel Markers. > Allows complete typing verification by declaring both tracing statement inline > functions and probe registration/unregistration static inline functions within > the same macro "DEFINE_TRACE". No format string is required. See the > tracepoint Documentation and Samples patches for usage examples. > > Taken from the documentation patch : > > "A tracepoint placed in code provides a hook to call a function (probe) that you > can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or > "off" (no probe is attached). When a tracepoint is "off" it has no effect, > except for adding a tiny time penalty (checking a condition for a branch) and > space penalty (adding a few bytes for the function call at the end of the > instrumented function and adds a data structure in a separate section). When a > tracepoint is "on", the function you provide is called each time the tracepoint > is executed, in the execution context of the caller. When the function provided > ends its execution, it returns to the caller (continuing from the tracepoint > site). > > You can put tracepoints at important locations in the code. They are lightweight > hooks that can pass an arbitrary number of parameters, which prototypes are > described in a tracepoint declaration placed in a header file." > > Addition and removal of tracepoints is synchronized by RCU using the > scheduler (and preempt_disable) as guarantees to find a quiescent state > (this is really RCU "classic"). The update side uses rcu_barrier_sched() > with call_rcu_sched() and the read/execute side uses > "preempt_disable()/preempt_enable()". > > We make sure the previous array containing probes, which has been scheduled for > deletion by the rcu callback, is indeed freed before we proceed to the next > update. It therefore limits the rate of modification of a single tracepoint to > one update per RCU period. The objective here is to permit fast batch > add/removal of probes on _different_ tracepoints. > > Changelog : > - Use #name ":" #proto as string to identify the tracepoint in the > tracepoint table. This will make sure not type mismatch happens due to > connexion of a probe with the wrong type to a tracepoint declared with > the same name in a different header. > - Add tracepoint_entry_free_old. > - Change __TO_TRACE to get rid of the 'i' iterator. > > Masami Hiramatsu <mhiramat@xxxxxxxxxx> : > Tested on x86-64. > > Performance impact of a tracepoint : same as markers, except that it adds about > 70 bytes of instructions in an unlikely branch of each instrumented function > (the for loop, the stack setup and the function call). It currently adds a > memory read, a test and a conditional branch at the instrumentation site (in the > hot path). Immediate values will eventually change this into a load immediate, > test and branch, which removes the memory read which will make the i-cache > impact smaller (changing the memory read for a load immediate removes 3-4 bytes > per site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it also > saves the d-cache hit). > > About the performance impact of tracepoints (which is comparable to markers), > even without immediate values optimizations, tests done by Hideo Aoki on ia64 > show no regression. His test case was using hackbench on a kernel where > scheduler instrumentation (about 5 events in code scheduler code) was added. > > > Quoting Hideo Aoki about Markers : > > I evaluated overhead of kernel marker using linux-2.6-sched-fixes > git tree, which includes several markers for LTTng, using an ia64 > server. > > While the immediate trace mark feature isn't implemented on ia64, > there is no major performance regression. So, I think that we > don't have any issues to propose merging marker point patches > into Linus's tree from the viewpoint of performance impact. > > I prepared two kernels to evaluate. The first one was compiled > without CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS. > > I downloaded the original hackbench from the following URL: > http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c > > I ran hackbench 5 times in each condition and calculated the > average and difference between the kernels. > > The parameter of hackbench: every 50 from 50 to 800 > The number of CPUs of the server: 2, 4, and 8 > > Below is the results. As you can see, major performance > regression wasn't found in any case. Even if number of processes > increases, differences between marker-enabled kernel and marker- > disabled kernel doesn't increase. Moreover, if number of CPUs > increases, the differences doesn't increase either. > > Curiously, marker-enabled kernel is better than marker-disabled > kernel in more than half cases, although I guess it comes from > the difference of memory access pattern. > > > * 2 CPUs > > Number of | without | with | diff | diff | > processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | > -------------------------------------------------------------- > 50 | 4.811 | 4.872 | +0.061 | +1.27 | > 100 | 9.854 | 10.309 | +0.454 | +4.61 | > 150 | 15.602 | 15.040 | -0.562 | -3.6 | > 200 | 20.489 | 20.380 | -0.109 | -0.53 | > 250 | 25.798 | 25.652 | -0.146 | -0.56 | > 300 | 31.260 | 30.797 | -0.463 | -1.48 | > 350 | 36.121 | 35.770 | -0.351 | -0.97 | > 400 | 42.288 | 42.102 | -0.186 | -0.44 | > 450 | 47.778 | 47.253 | -0.526 | -1.1 | > 500 | 51.953 | 52.278 | +0.325 | +0.63 | > 550 | 58.401 | 57.700 | -0.701 | -1.2 | > 600 | 63.334 | 63.222 | -0.112 | -0.18 | > 650 | 68.816 | 68.511 | -0.306 | -0.44 | > 700 | 74.667 | 74.088 | -0.579 | -0.78 | > 750 | 78.612 | 79.582 | +0.970 | +1.23 | > 800 | 85.431 | 85.263 | -0.168 | -0.2 | > -------------------------------------------------------------- > > * 4 CPUs > > Number of | without | with | diff | diff | > processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | > -------------------------------------------------------------- > 50 | 2.586 | 2.584 | -0.003 | -0.1 | > 100 | 5.254 | 5.283 | +0.030 | +0.56 | > 150 | 8.012 | 8.074 | +0.061 | +0.76 | > 200 | 11.172 | 11.000 | -0.172 | -1.54 | > 250 | 13.917 | 14.036 | +0.119 | +0.86 | > 300 | 16.905 | 16.543 | -0.362 | -2.14 | > 350 | 19.901 | 20.036 | +0.135 | +0.68 | > 400 | 22.908 | 23.094 | +0.186 | +0.81 | > 450 | 26.273 | 26.101 | -0.172 | -0.66 | > 500 | 29.554 | 29.092 | -0.461 | -1.56 | > 550 | 32.377 | 32.274 | -0.103 | -0.32 | > 600 | 35.855 | 35.322 | -0.533 | -1.49 | > 650 | 39.192 | 38.388 | -0.804 | -2.05 | > 700 | 41.744 | 41.719 | -0.025 | -0.06 | > 750 | 45.016 | 44.496 | -0.520 | -1.16 | > 800 | 48.212 | 47.603 | -0.609 | -1.26 | > -------------------------------------------------------------- > > * 8 CPUs > > Number of | without | with | diff | diff | > processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | > -------------------------------------------------------------- > 50 | 2.094 | 2.072 | -0.022 | -1.07 | > 100 | 4.162 | 4.273 | +0.111 | +2.66 | > 150 | 6.485 | 6.540 | +0.055 | +0.84 | > 200 | 8.556 | 8.478 | -0.078 | -0.91 | > 250 | 10.458 | 10.258 | -0.200 | -1.91 | > 300 | 12.425 | 12.750 | +0.325 | +2.62 | > 350 | 14.807 | 14.839 | +0.032 | +0.22 | > 400 | 16.801 | 16.959 | +0.158 | +0.94 | > 450 | 19.478 | 19.009 | -0.470 | -2.41 | > 500 | 21.296 | 21.504 | +0.208 | +0.98 | > 550 | 23.842 | 23.979 | +0.137 | +0.57 | > 600 | 26.309 | 26.111 | -0.198 | -0.75 | > 650 | 28.705 | 28.446 | -0.259 | -0.9 | > 700 | 31.233 | 31.394 | +0.161 | +0.52 | > 750 | 34.064 | 33.720 | -0.344 | -1.01 | > 800 | 36.320 | 36.114 | -0.206 | -0.57 | > -------------------------------------------------------------- > > Best regards, > Hideo > > > P.S. When I compiled the linux-2.6-sched-fixes tree on ia64, I > had to revert the following git commit since pteval_t is defined > on x86 only. > > commit 8686f2b37e7394b51dd6593678cbfd85ecd28c65 > Date: Tue May 6 15:42:40 2008 -0700 > > generic, x86, PAT: fix mprotect > > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx> > Acked-by: Masami Hiramatsu <mhiramat@xxxxxxxxxx> > CC: 'Peter Zijlstra' <peterz@xxxxxxxxxxxxx> > CC: "Frank Ch. Eigler" <fche@xxxxxxxxxx> > CC: 'Ingo Molnar' <mingo@xxxxxxx> > CC: 'Hideo AOKI' <haoki@xxxxxxxxxx> > CC: Takashi Nishiie <t-nishiie@xxxxxxxxxxxxxxxxxx> > CC: 'Steven Rostedt' <rostedt@xxxxxxxxxxx> > CC: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx> > CC: Eduard - Gabriel Munteanu <eduard.munteanu@xxxxxxxxxxx> > --- > include/asm-generic/vmlinux.lds.h | 6 > include/linux/module.h | 17 + > include/linux/tracepoint.h | 127 ++++++++++ > init/Kconfig | 7 > kernel/Makefile | 1 > kernel/module.c | 66 +++++ > kernel/tracepoint.c | 473 ++++++++++++++++++++++++++++++++++++++ > 7 files changed, 695 insertions(+), 2 deletions(-) > > Index: linux-2.6-lttng/init/Kconfig > =================================================================== > --- linux-2.6-lttng.orig/init/Kconfig 2008-07-15 17:34:42.000000000 -0400 > +++ linux-2.6-lttng/init/Kconfig 2008-07-15 17:35:00.000000000 -0400 > @@ -782,6 +782,13 @@ config PROFILING > Say Y here to enable the extended profiling support mechanisms used > by profilers such as OProfile. > > +config TRACEPOINTS > + bool "Activate tracepoints" > + default y > + help > + Place an empty function call at each tracepoint site. Can be > + dynamically changed for a probe function. > + > config MARKERS > bool "Activate markers" > help > Index: linux-2.6-lttng/kernel/Makefile > =================================================================== > --- linux-2.6-lttng.orig/kernel/Makefile 2008-07-15 17:34:42.000000000 -0400 > +++ linux-2.6-lttng/kernel/Makefile 2008-07-15 17:35:00.000000000 -0400 > @@ -77,6 +77,7 @@ obj-$(CONFIG_SYSCTL) += utsname_sysctl.o > obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o > obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o > obj-$(CONFIG_MARKERS) += marker.o > +obj-$(CONFIG_TRACEPOINTS) += tracepoint.o > obj-$(CONFIG_LATENCYTOP) += latencytop.o > obj-$(CONFIG_FTRACE) += trace/ > obj-$(CONFIG_TRACING) += trace/ > Index: linux-2.6-lttng/include/linux/tracepoint.h > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ linux-2.6-lttng/include/linux/tracepoint.h 2008-07-15 17:35:19.000000000 -0400 > @@ -0,0 +1,127 @@ > +#ifndef _LINUX_TRACEPOINT_H > +#define _LINUX_TRACEPOINT_H > + > +/* > + * Kernel Tracepoint API. > + * > + * See Documentation/tracepoint.txt. > + * > + * (C) Copyright 2008 Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx> > + * > + * Heavily inspired from the Linux Kernel Markers. > + * > + * This file is released under the GPLv2. > + * See the file COPYING for more details. > + */ > + > +#include <linux/types.h> > +#include <linux/rcupdate.h> > + > +struct module; > +struct tracepoint; > + > +struct tracepoint { > + const char *name; /* Tracepoint name */ > + int state; /* State. */ > + void **funcs; > +} __attribute__((aligned(8))); > + > + > +#define TPPROTO(args...) args > +#define TPARGS(args...) args > + > +#ifdef CONFIG_TRACEPOINTS > + > +/* > + * it_func[0] is never NULL because there is at least one element in the array > + * when the array itself is non NULL. > + */ > +#define __DO_TRACE(tp, proto, args) \ > + do { \ > + void **it_func; \ > + \ > + rcu_read_lock_sched(); \ > + it_func = rcu_dereference((tp)->funcs); \ > + if (it_func) { \ > + do { \ > + ((void(*)(proto))(*it_func))(args); \ > + } while (*(++it_func)); \ > + } \ > + rcu_read_unlock_sched(); \ > + } while (0) > + > +/* > + * Make sure the alignment of the structure in the __tracepoints section will > + * not add unwanted padding between the beginning of the section and the > + * structure. Force alignment to the same alignment as the section start. > + */ > +#define DEFINE_TRACE(name, proto, args) \ > + static inline void trace_##name(proto) \ > + { \ > + static const char __tpstrtab_##name[] \ > + __attribute__((section("__tracepoints_strings"))) \ > + = #name ":" #proto; \ > + static struct tracepoint __tracepoint_##name \ > + __attribute__((section("__tracepoints"), aligned(8))) = \ > + { __tpstrtab_##name, 0, NULL }; \ > + if (unlikely(__tracepoint_##name.state)) \ > + __DO_TRACE(&__tracepoint_##name, \ > + TPPROTO(proto), TPARGS(args)); \ > + } \ > + static inline int register_trace_##name(void (*probe)(proto)) \ > + { \ > + return tracepoint_probe_register(#name ":" #proto, \ > + (void *)probe); \ > + } \ > + static inline void unregister_trace_##name(void (*probe)(proto))\ > + { \ > + tracepoint_probe_unregister(#name ":" #proto, \ > + (void *)probe); \ > + } > + > +extern void tracepoint_update_probe_range(struct tracepoint *begin, > + struct tracepoint *end); > + > +#else /* !CONFIG_TRACEPOINTS */ > +#define DEFINE_TRACE(name, proto, args) \ > + static inline void _do_trace_##name(struct tracepoint *tp, proto) \ > + { } \ > + static inline void trace_##name(proto) \ > + { } \ > + static inline int register_trace_##name(void (*probe)(proto)) \ > + { \ > + return -ENOSYS; \ > + } \ > + static inline void unregister_trace_##name(void (*probe)(proto))\ > + { } > + > +static inline void tracepoint_update_probe_range(struct tracepoint *begin, > + struct tracepoint *end) > +{ } > +#endif /* CONFIG_TRACEPOINTS */ > + > +/* > + * Connect a probe to a tracepoint. > + * Internal API, should not be used directly. > + */ > +extern int tracepoint_probe_register(const char *name, void *probe); > + > +/* > + * Disconnect a probe from a tracepoint. > + * Internal API, should not be used directly. > + */ > +extern int tracepoint_probe_unregister(const char *name, void *probe); > + > +struct tracepoint_iter { > + struct module *module; > + struct tracepoint *tracepoint; > +}; > + > +extern void tracepoint_iter_start(struct tracepoint_iter *iter); > +extern void tracepoint_iter_next(struct tracepoint_iter *iter); > +extern void tracepoint_iter_stop(struct tracepoint_iter *iter); > +extern void tracepoint_iter_reset(struct tracepoint_iter *iter); > +extern int tracepoint_get_iter_range(struct tracepoint **tracepoint, > + struct tracepoint *begin, struct tracepoint *end); > + -- Regards, Peter Teoh -- To unsubscribe from this list: send an email with "unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx Please read the FAQ at http://kernelnewbies.org/FAQ