Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


I need to add a "Quick guide" and "Tips and tricks" section to the
document. To explain the arguments better...

Simple args are:

 "int val", "unsigned int val", "char x", "unsigned long addr",

Also the types:

 "s32 val", "u32 val", "s8 x", "u64 addr"

The above are all printed in decimal "%d" or "%u", if you want hex...

 "x32 val", "x8 x", "x64 addr"

If you want to have it use "%pS" to print (symbols)

 "symbol addr" is like: "%pS", (void *)addr

Arrays can be expressed after the type:

 "x8[6] mac" is like: "%x,%x,%x,%x,%x,%x", mac[0], mac[1], mac[2],
                                           mac[3], mac[4], mac[5]

Where mac would be: unsigned char mac[6] type.

Note, arrays of type "char" and "unsigned char" turn into a static

 "char[10] str" is like: "%s", str

Where str is defined as char str[10];

If the argument is a pointer to a structure, you can index into the

 "x64 ip[16]" is like: "%llx", ((u64 *)ip)[16]

Finally, if an argument is a pointer to a structure, and you want to
get to another structure that it points to, for example

 struct sk_buff *skb;

and you want to get to:


when the parameter is a pointer to skb.

  (gdb) printf "%d\n", &((struct sk_buff *)0)->dev
  (gdb) printf "%d\n", &((struct net_device *)0)->perm_addr

The net_device *dev is 16 bytes into sk_buff, and the char array
perm_addr, is 558 bytes into the net_device structure.

Where perm_addr is an array of the mac address.

 "x8[6] perm_addr+16[0]+558"

Which would turn into:

 char *dev = (char **)(((void *)skb)+16)[0];
 char *perm_addr = (char *)(dev+558);

 "%x,%x,%x,%x,%x,%x", perm_addr[0], perm_addr[1], perm_addr[2],
                      perm_addr[3], perm_addr[4], perm_addr[5]

OK, the above is a bit complex ;-) But works nicely.

-- Steve

On Fri, 02 Feb 2018 18:04:58 -0500
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> At Kernel Summit back in October, we tried to bring up trace markers, which
> would be nops within the kernel proper, that would allow modules to hook
> arbitrary trace events to them. The reaction to this proposal was less than
> favorable. We were told that we were trying to make a work around for a
> problem, and not solving it. The problem in our minds is the notion of a
> "stable trace event".
> There are maintainers that do not want trace events, or more trace events in
> their subsystems. This is due to the fact that trace events post an
> interface to user space, and this interface could become required by some
> tool. This may cause the trace event to become stable where it must not
> break the tool, and thus prevent the code from changing.
> Or, the trace event may just have to add padding for fields that tools
> may require. The "success" field of the sched_wakeup trace event is one such
> instance. There is no more "success" variable, but tools may fail if it were
> to go away, so a "1" is simply added to the trace event wasting ring buffer
> real estate.
> I talked with Linus about this, and he told me that we already have these
> markers in the kernel. They are from the mcount/__fentry__ used by function
> tracing. Have the trace events be created by these, and see if this will
> satisfy most areas that want trace events.
> I decided to implement this idea, and here's the patch set.
> Introducing "function based events". These are created dynamically by a
> tracefs file called "function_events". By writing a pseudo prototype into
> this file, you create an event.
>  # mount -t tracefs nodev /sys/kernel/tracing
>  # cd /sys/kernel/tracing
>  # echo 'do_IRQ(symbol ip[16] | x64[6] irq_stack[16])' > function_events
>  # cat events/functions/do_IRQ/format
> name: do_IRQ
> ID: 1399
> format:
> 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
> 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
> 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
> 	field:int common_pid;	offset:4;	size:4;	signed:1;
> 	field:unsigned long __parent_ip;	offset:8;	size:8;	signed:0;
> 	field:unsigned long __ip;	offset:16;	size:8;	signed:0;
> 	field:symbol ip;	offset:24;	size:8;	signed:0;
> 	field:x64 irq_stack[6];	offset:32;	size:48;	signed:0;
> print fmt: "%pS->%pS(ip=%pS, irq_stack=%llx:%llx:%llx:%llx:%llx:%llx)", REC->__ip, REC->__parent_ip,
> REC->ip, REC->irq_stack[0], REC->irq_stack[1], REC->irq_stack[2], REC->irq_stack[3], REC->irq_stack[4],
> REC->irq_stack[5]
>  # echo 1 > events/functions/do_IRQ/enable
>  # cat trace
>           <idle>-0     [003] d..3  3647.049344: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.049433: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.049672: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.325709: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.325929: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.325993: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.387571: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.387791: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.387874: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
> And this is much more powerful than just this. We can show strings, and
> index off of structures into other structures.
>   # echo '__vfs_read(symbol read+40[0]+16)' > function_events
>   # echo 1 > events/functions/__vfs_read/enable
>   # cat trace
>          sshd-1343  [005] ...2   199.734752: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
>          bash-1344  [003] ...2   199.734822: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
>          sshd-1343  [005] ...2   199.734835: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
>  avahi-daemon-910   [003] ...2   200.136740: vfs_read->__vfs_read(read=          (null))
>  avahi-daemon-910   [003] ...2   200.136750: vfs_read->__vfs_read(read=          (null))
> And even read user space:
>   # echo 'SyS_openat(int dfd, string path, x32 flags, x16 mode)' > function_events
>   # echo 1 > events/functions/enable
>   # grep task_fork /proc/kallsyms 
> ffffffff810d5a60 t task_fork_fair
> ffffffff810dfc30 t task_fork_dl
>   # cat trace
>             grep-1820  [000] ...2  3926.107603: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, path=/proc/kallsyms, flags=100, mode=0)
> These are fully functional events. That is, they work with ftrace,
> trace-cmd, perf, histograms, triggers, and eBPF.
> What's next? I need to rewrite the function graph tracer, and be able to add
> dynamic events on function return.
> I made this work with x86 with a simple function that only returns
> 6 function parameters for x86_64 and 3 for x86_32. But this could easily
> be extended.
> Cheers!
>   git://
> ftrace/dynamic-ftrace-events
> Head SHA1: 30fbdffd5d38bd27b04fb911f7158f10a99be8c4
> Steven Rostedt (VMware) (18):
>       tracing: Add function based events
>       tracing: Add documentation for function based events
>       tracing: Add simple arguments to function based events
>       tracing/x86: Add arch_get_func_args() function
>       tracing: Add hex print for dynamic ftrace based events
>       tracing: Add indirect offset to args of ftrace based events
>       tracing: Add dereferencing multiple fields per arg
>       tracing: Add "unsigned" to function based events
>       tracing: Add indexing of arguments for function based events
>       tracing: Make func_type enums for easier comparing of arg types
>       tracing: Add symbol type to function based events
>       tracing: Add accessing direct address from function based events
>       tracing: Add array type to function based events
>       tracing: Have char arrays be strings for function based events
>       tracing: Add string type for dynamic strings in function based events
>       tracing: Add NULL to skip args for function based events
>       tracing: Add indirect to indirect access for function based events
>       tracing/perf: Allow perf to use function based events
> ----
>  Documentation/trace/function-based-events.rst |  426 ++++++++
>  arch/x86/kernel/ftrace.c                      |   28 +
>  include/linux/trace_events.h                  |    2 +
>  kernel/trace/Kconfig                          |   12 +
>  kernel/trace/Makefile                         |    1 +
>  kernel/trace/trace.h                          |   11 +
>  kernel/trace/trace_event_ftrace.c             | 1440 +++++++++++++++++++++++++
>  kernel/trace/trace_probe.h                    |   11 -
>  8 files changed, 1920 insertions(+), 11 deletions(-)
>  create mode 100644 Documentation/trace/function-based-events.rst
>  create mode 100644 kernel/trace/trace_event_ftrace.c
> --
> To unsubscribe from this list: send the line "unsubscribe linux-trace-users" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at

To unsubscribe from this list: send the line "unsubscribe linux-trace-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at

[Index of Archives]     [Linux USB Development]     [Linux USB Development]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux