On Thu, 11 Apr 2024 12:20:57 +0200
Marco Elver <elver@xxxxxxxxxx> wrote:

> Add "sched_prepare_exec" tracepoint, which is run right after the point
> of no return but before the current task assumes its new exec identity.
>
> Unlike the tracepoint "sched_process_exec", the "sched_prepare_exec"
> tracepoint runs before flushing the old exec, i.e. while the task still
> has the original state (such as original MM), but when the new exec
> either succeeds or crashes (but never returns to the original exec).
>
> Being able to trace this event can be helpful in a number of use cases:
>
>   * allowing tracing eBPF programs access to the original MM on exec,
>     before current->mm is replaced;
>   * counting exec in the original task (via perf event);
>   * profiling flush time ("sched_prepare_exec" to "sched_process_exec").
>
> Example of tracing output:
>
>  $ cat /sys/kernel/debug/tracing/trace_pipe
>     <...>-379  [003] .....  179.626921: sched_prepare_exec: interp=/usr/bin/sshd filename=/usr/bin/sshd pid=379 comm=sshd
>     <...>-381  [002] .....  180.048580: sched_prepare_exec: interp=/bin/bash filename=/bin/bash pid=381 comm=sshd
>     <...>-385  [001] .....  180.068277: sched_prepare_exec: interp=/usr/bin/tty filename=/usr/bin/tty pid=385 comm=bash
>     <...>-389  [006] .....  192.020147: sched_prepare_exec: interp=/usr/bin/dmesg filename=/usr/bin/dmesg pid=389 comm=bash
>
> Signed-off-by: Marco Elver <elver@xxxxxxxxxx>

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>

Thanks,

> ---
> v2:
> * Add more documentation.
> * Also show bprm->interp in trace.
> * Rename to sched_prepare_exec.
> ---
>  fs/exec.c                    |  8 ++++++++
>  include/trace/events/sched.h | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 38bf71cbdf5e..57fee729dd92 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1268,6 +1268,14 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	if (retval)
>  		return retval;
>
> +	/*
> +	 * This tracepoint marks the point before flushing the old exec where
> +	 * the current task is still unchanged, but errors are fatal (point of
> +	 * no return). The later "sched_process_exec" tracepoint is called after
> +	 * the current task has successfully switched to the new exec.
> +	 */
> +	trace_sched_prepare_exec(current, bprm);
> +
>  	/*
>  	 * Ensure all future errors are fatal.
>  	 */
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index dbb01b4b7451..226f47c6939c 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -420,6 +420,41 @@ TRACE_EVENT(sched_process_exec,
>  		  __entry->pid, __entry->old_pid)
>  );
>
> +/**
> + * sched_prepare_exec - called before setting up new exec
> + * @task:	pointer to the current task
> + * @bprm:	pointer to linux_binprm used for new exec
> + *
> + * Called before flushing the old exec, where @task is still unchanged, but at
> + * the point of no return during switching to the new exec. At the point it is
> + * called the exec will either succeed, or on failure terminate the task. Also
> + * see the "sched_process_exec" tracepoint, which is called right after @task
> + * has successfully switched to the new exec.
> + */
> +TRACE_EVENT(sched_prepare_exec,
> +
> +	TP_PROTO(struct task_struct *task, struct linux_binprm *bprm),
> +
> +	TP_ARGS(task, bprm),
> +
> +	TP_STRUCT__entry(
> +		__string( interp, bprm->interp )
> +		__string( filename, bprm->filename )
> +		__field( pid_t, pid )
> +		__string( comm, task->comm )
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(interp, bprm->interp);
> +		__assign_str(filename, bprm->filename);
> +		__entry->pid = task->pid;
> +		__assign_str(comm, task->comm);
> +	),
> +
> +	TP_printk("interp=%s filename=%s pid=%d comm=%s",
> +		  __get_str(interp), __get_str(filename),
> +		  __entry->pid, __get_str(comm))
> +);
>
>  #ifdef CONFIG_SCHEDSTATS
>  #define DEFINE_EVENT_SCHEDSTAT DEFINE_EVENT
> --
> 2.44.0.478.gd926399ef9-goog
>

--
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>
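
For anyone consuming this event from userspace, here is a minimal sketch of
parsing one trace_pipe line in the TP_printk format quoted above
("interp=%s filename=%s pid=%d comm=%s"). It is not part of the patch. The
struct prepare_exec_event type, the parse_prepare_exec() helper and the buffer
sizes are hypothetical names and values chosen for illustration. It also
assumes that paths and comm contain no whitespace, which the kernel does not
guarantee.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields printed by sched_prepare_exec. */
struct prepare_exec_event {
	double timestamp;	/* seconds, as shown in trace_pipe */
	char interp[256];
	char filename[256];
	int pid;
	char comm[32];
};

/*
 * Parse one trace_pipe line. Returns 0 on success, or -1 if the line is
 * not a well-formed sched_prepare_exec record.
 * Assumption: no field value contains whitespace.
 */
static int parse_prepare_exec(const char *line, struct prepare_exec_event *ev)
{
	static const char tag[] = ": sched_prepare_exec: ";
	const char *pos = strstr(line, tag);
	const char *ts;

	if (!pos)
		return -1;

	/* The timestamp is the token right before the tag; walk back to its start. */
	ts = pos;
	while (ts > line && ts[-1] != ' ')
		ts--;
	if (sscanf(ts, "%lf", &ev->timestamp) != 1)
		return -1;

	/* The field widths keep sscanf within the buffer sizes above. */
	if (sscanf(pos + strlen(tag),
		   "interp=%255s filename=%255s pid=%d comm=%31s",
		   ev->interp, ev->filename, &ev->pid, ev->comm) != 4)
		return -1;

	return 0;
}
```

Feeding it the lines from the example output in the commit message should
fill in interp, filename, pid and comm. Lines from other events, such as
sched_process_exec, are rejected with -1.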