On Mon, Apr 08, 2024 at 11:01:54AM +0200, Marco Elver wrote:
> Add "new_exec" tracepoint, which is run right after the point of no
> return but before the current task assumes its new exec identity.
>
> Unlike the tracepoint "sched_process_exec", the "new_exec" tracepoint
> runs before flushing the old exec, i.e. while the task still has the
> original state (such as original MM), but when the new exec either
> succeeds or crashes (but never returns to the original exec).
>
> Being able to trace this event can be helpful in a number of use cases:
>
>   * allowing tracing eBPF programs access to the original MM on exec,
>     before current->mm is replaced;
>   * counting exec in the original task (via perf event);
>   * profiling flush time ("new_exec" to "sched_process_exec").
>
> Example of tracing output ("new_exec" and "sched_process_exec"):
>
>   $ cat /sys/kernel/debug/tracing/trace_pipe
>       <...>-379     [003] .....   179.626921: new_exec: filename=/usr/bin/sshd pid=379 comm=sshd
>       <...>-379     [003] .....   179.629131: sched_process_exec: filename=/usr/bin/sshd pid=379 old_pid=379
>       <...>-381     [002] .....   180.048580: new_exec: filename=/bin/bash pid=381 comm=sshd
>       <...>-381     [002] .....   180.053122: sched_process_exec: filename=/bin/bash pid=381 old_pid=381
>       <...>-385     [001] .....   180.068277: new_exec: filename=/usr/bin/tty pid=385 comm=bash
>       <...>-385     [001] .....   180.069485: sched_process_exec: filename=/usr/bin/tty pid=385 old_pid=385
>       <...>-389     [006] .....   192.020147: new_exec: filename=/usr/bin/dmesg pid=389 comm=bash
>        bash-389     [006] .....   192.021377: sched_process_exec: filename=/usr/bin/dmesg pid=389 old_pid=389
>
> Signed-off-by: Marco Elver <elver@xxxxxxxxxx>
> ---
>  fs/exec.c                   |  2 ++
>  include/trace/events/task.h | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 38bf71cbdf5e..ab778ae1fc06 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1268,6 +1268,8 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	if (retval)
>  		return retval;
>  
> +	trace_new_exec(current, bprm);
> +

All other steps in this function have explicit comments about
what/why/etc. Please add some kind of comment describing why the
tracepoint is where it is, etc. For example, maybe something like:

	/*
	 * Before any changes to 'current', report that the exec is about to
	 * happen (since we made it to the point of no return). On a successful
	 * exec, the 'sched_process_exec' tracepoint will also fire. On failure,
	 * ... [something else]
	 */

> +TRACE_EVENT(new_exec,
> +
> +	TP_PROTO(struct task_struct *task, struct linux_binprm *bprm),
> +
> +	TP_ARGS(task, bprm),
> +
> +	TP_STRUCT__entry(
> +		__string(	filename,	bprm->filename	)
> +		__field(	pid_t,		pid		)
> +		__string(	comm,		task->comm	)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(filename, bprm->filename);

What about binfmt_misc and binfmt_script? You may want bprm->interp too?

-Kees

> +		__entry->pid = task->pid;
> +		__assign_str(comm, task->comm);
> +	),
> +
> +	TP_printk("filename=%s pid=%d comm=%s",
> +		  __get_str(filename), __entry->pid, __get_str(comm))
> +);
> +
>  #endif
>  
>  /* This part must be outside protection */
> -- 
> 2.44.0.478.gd926399ef9-goog
>

-- 
Kees Cook