Hello, This is the 3rd version of RFC series for the boot-time tracing. Previous thread is here. https://lkml.kernel.org/r/156316746861.23477.5815110570539190650.stgit@devnote2 On that thread and offline talk with Frank Rowand at OSS Japan2019 last month, we agreed that the devicetree is only for hardware description and should not expand it for software settings. Thus, in this version, I introduced a new extended command line feature called Supplemental Kernel Cmdline (SKC) instead of devicetree, and rewrote the boot-time tracing part based on that. Supplemental Kernel Cmdline =========================== Supplemental kernel command line (SKC) allows admin to pass a tree-structured supplemental kernel commandline file (SKC file) when boot up kernel. This expands the kernel command line in efficient way. Each key is described as a dot-jointed-words. And user can write the key-words in tree stlye. For example, feature.option.foo = 1; feature.option.bar = 2; can be also written in feature.option { foo = 1; bar = 2; } (Note that in both style, the same words are merged automatically and make a single tree) All values are treated as a string, or array of strings, e.g. feature.options = "foo", "bar"; User can see the loaded SKC key-value list via /proc/skc. The size of SKC is limited upto 32KB and 512 key-words and values in total. SKC and Bootloader ================== To play with SKC, you need to use patched qemu or patched grub. Please checkout below for x86 support. (for PoC, I implemented it only for x86, but it is not so hard to do same on other archs) https://github.com/mhiramat/qemu.git skc https://github.com/mhiramat/grub.git grub-x86-skc User can pass an skc file when boot the qemu machine with "-skc PATH" option. Or specify the skc file on grub console by "skc PATH". Boot-time Tracing ================= Boot-time tracing side has been updated for SKC, but the difference is small, because SKC has similar tree-based interfaces to OF APIs. Currently, it supports following SKC options. Please read Documentation/trace/boottime-trace.rst for details. - ftrace.options = OPT1[,OPT2...]; - ftrace.trace_clock = CLOCK; - ftrace.dump_on_oops [= MODE]; - ftrace.traceoff_on_warning; - ftrace.tp_printk; - ftrace.buffer_size = SIZE; - ftrace.alloc_snapshot; - ftrace.events = EVENT[, EVENT2...]; - ftrace.tracer = TRACER; - ftrace.event.GROUP.EVENT.filter = FILTER; - ftrace.event.GROUP.EVENT.actions = ACTION[, ACTION2...]; - ftrace.event.GROUP.EVENT.enable; - ftrace.event.kprobes.EVENT.probes = PROBE[, PROBE2...]; - ftrace.event.synthetic.EVENT.fields = FIELD[, FIELD2...]; - ftrace.[instance.INSTANCE.]cpumask = CPUMASK; - ftrace.[instance.INSTANCE.]ftrace.filters - ftrace.[instance.INSTANCE.]ftrace.notraces - ftrace.fgraph.filters = FILTER[, FILTER2...]; - ftrace.fgraph.notraces = FILTER[, FILTER2...]; - ftrace.fgraph.max_depth = MAX_DEPTH; This series can be applied on Steve's tracing tree (ftrace/core) or available on below https://github.com/mhiramat/linux.git ftrace-boottrace-v3 Usage ===== With this series, we can setup new kprobe and synthetic events, more complicated event filters and trigger actions including histogram via supplemental kernel cmdline. We can add filter and actions for each event, define kprobe events, and synthetic events with histogram like below. ftrace.event { task.task_newtask { filter = "pid < 128"; enable; } kprobes.vfs_read { probes = "vfs_read $arg1 $arg2"; filter = "common_pid < 200"; enable; } synthetic.initcall_latency { fields = "unsigned long func", "u64 lat"; actions = "hist:keys=func.sym,lat:vals=lat:sort=lat"; } initcall.initcall_start { actions = "hist:keys=func:ts0=common_timestamp.usecs"; } initcall.initcall_finish { actions = "hist:keys=func:lat=common_timestamp.usecs-$ts0:onmatch(initcall.initcall_start).initcall_latency(func,$lat)"; } } Also, this supports "instance" node, which allows us to run several tracers for different purpose at once. For example, one tracer is for tracing functions in module alpha, and others tracing module beta, you can write followings. ftrace.instance { foo { tracer = "function"; ftrace-filters = "*:mod:alpha"; } bar { tracer = "function"; ftrace-filters = "*:mod:beta"; } } The instance node also accepts event nodes so that each instance can customize its event tracing. This boot-time trace also supports ftrace kernel parameters. For example, following kernel parameters trace_options=sym-addr trace_event=initcall:* tp_printk trace_buf_size=1M ftrace=function ftrace_filter="vfs*" it can be written in SKC like below. ftrace { options = sym-addr; events = "initcall:*"; tp-printk; buffer-size = 1MB; ftrace-filters = "vfs*"; } However, since the initialization timing is different, if you need to trace very early boot, please use normal kernel parameters. Some Notes ========== - For specifying the skc data address, there are 2 ways to pass it to kernel. One is expanding arch-specific entries (e.g. setup_data on x86), another is passing it via kernel cmdline. The former may need to expand devicetree anyway (we need /chosen/skc_addr property for some arch.) The latter is more generic, but it may not work if the space on cmdline shorts. According to Frank's suggestion, I introduced "skc=PADDR,SIZE" option to kernel cmdline, boot loaders must support it. This is generic, and I still have some concerns, like memory reservation. In this version, I used early_param() to reserve SKC memory. If it is too late, we'd better considering to use arch specific data passing. - Currently, supplemental kernel cmdline doesn't support __setup() routine to avoid confusion, but I think it is possible to support it. - For saving memory consuming after boot, all variables and APIs are __init and __initdata. Thus runtime memory footprint becomes minimum. But if some loadable modules wants to use it. We have to keep at least query interface and tree nodes on memory. - As you can see, the EVENT.actions value is a bit ugly. Maybe we can introduce sub-nodes under action node, something like below. ftrace.event.initcall.initcall_finish.action.hist { keys = "func"; lat = "common_timestamp.usecs-$ts0"; onmatch = "initcall.initcall_start"; call = "initcall_latency(func,$lat)"; } Any suggestions, thoughts? Thank you, --- Masami Hiramatsu (19): skc: Add supplemental kernel cmdline support skc: Add /proc/sup_cmdline to show SKC key-value list skc: Add a boot setup routine from cmdline Documentation: skc: Add a doc for supplemental kernel cmdline tracing: Apply soft-disabled and filter to tracepoints printk tracing: kprobes: Output kprobe event to printk buffer tracing: Expose EXPORT_SYMBOL_GPL symbol tracing: kprobes: Register to dynevent earlier stage tracing: Accept different type for synthetic event fields tracing: Add NULL trace-array check in print_synth_event() tracing/boot: Add boot-time tracing by supplemental kernel cmdline tracing/boot: Add per-event settings tracing/boot Add kprobe event support tracing/boot: Add synthetic event support tracing/boot: Add instance node support tracing/boot: Add cpu_mask option support tracing/boot: Add function tracer filter options tracing/boot: Add function-graph tracer options Documentation: tracing: Add boot-time tracing document Documentation/admin-guide/index.rst | 1 Documentation/admin-guide/kernel-parameters.txt | 6 Documentation/admin-guide/skc.rst | 123 ++++ Documentation/trace/boottime-trace.rst | 185 ++++++ MAINTAINERS | 8 arch/Kconfig | 9 fs/proc/Makefile | 1 fs/proc/sup_cmdline.c | 106 ++++ include/linux/skc.h | 205 +++++++ include/linux/trace_events.h | 1 init/main.c | 54 ++ kernel/trace/Kconfig | 9 kernel/trace/Makefile | 1 kernel/trace/ftrace.c | 85 ++- kernel/trace/trace.c | 90 ++- kernel/trace/trace_boot.c | 457 +++++++++++++++ kernel/trace/trace_events.c | 3 kernel/trace/trace_events_hist.c | 14 kernel/trace/trace_events_trigger.c | 2 kernel/trace/trace_kprobe.c | 81 ++- lib/Kconfig | 3 lib/Makefile | 2 lib/skc.c | 694 +++++++++++++++++++++++ 23 files changed, 2044 insertions(+), 96 deletions(-) create mode 100644 Documentation/admin-guide/skc.rst create mode 100644 Documentation/trace/boottime-trace.rst create mode 100644 fs/proc/sup_cmdline.c create mode 100644 include/linux/skc.h create mode 100644 kernel/trace/trace_boot.c create mode 100644 lib/skc.c -- Masami Hiramatsu (Linaro) <mhiramat@xxxxxxxxxx>