On Fri, Nov 29, 2024 at 08:54:38PM -0800, Kees Cook wrote:
> Zbigniew mentioned at Linux Plumber's that systemd is interested in
> switching to execveat() for service execution, but can't, because the
> contents of /proc/pid/comm are the file descriptor which was used,
> instead of the path to the binary. This makes the output of tools like
> top and ps useless, especially in a world where most fds are opened
> CLOEXEC so the number is truly meaningless.
>
> When the filename passed in is empty (e.g. with AT_EMPTY_PATH), use the
> dentry's filename for "comm" instead of using the useless numeral from
> the synthetic fdpath construction. This way the actual exec machinery
> is unchanged, but cosmetically the comm looks reasonable to admins
> investigating things.
>
> Instead of adding TASK_COMM_LEN more bytes to bprm, use one of the unused
> flag bits to indicate that we need to set "comm" from the dentry.
>
> Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek@xxxxxxxxx>
> Suggested-by: Tycho Andersen <tandersen@xxxxxxxxxxx>
> Suggested-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> CC: Aleksa Sarai <cyphar@xxxxxxxxxx>
> Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
> Signed-off-by: Kees Cook <kees@xxxxxxxxxx>
> ---
> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Eric Biederman <ebiederm@xxxxxxxxxxxx>
> Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Christian Brauner <brauner@xxxxxxxxxx>
> Cc: Jan Kara <jack@xxxxxxx>
> Cc: linux-mm@xxxxxxxxx
> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
>
> Here's what I've put together from the various suggestions. I didn't
> want to needlessly grow bprm, so I just added a flag instead. Otherwise,
> this is very similar to what Linus and Al suggested.
> ---
>  fs/exec.c               | 22 +++++++++++++++++++---
>  include/linux/binfmts.h |  4 +++-
>  2 files changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 5f16500ac325..d897d60ca5c2 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1347,7 +1347,21 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	set_dumpable(current->mm, SUID_DUMP_USER);
>
>  	perf_event_exec();
> -	__set_task_comm(me, kbasename(bprm->filename), true);
> +
> +	/*
> +	 * If the original filename was empty, alloc_bprm() made up a path
> +	 * that will probably not be useful to admins running ps or similar.
> +	 * Let's fix it up to be something reasonable.
> +	 */
> +	if (bprm->comm_from_dentry) {
> +		rcu_read_lock();
> +		/* The dentry name won't change while we hold the rcu read lock. */
> +		__set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
> +				true);

This does not sound legit whatsoever, as it would indicate all renames
wait for rcu grace periods to end, which would be pretty weird.

Even the commentary above dentry_cmp states:
	 * Be careful about RCU walk racing with rename:
	 * use 'READ_ONCE' to fetch the name pointer.
	 *
	 * NOTE! Even if a rename will mean that the length
	 * was not loaded atomically, we don't care.

It may be that this is considered tolerable, but there should be no
difficulty getting a real name there?

Regardless, the comment looks bogus.
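
To make the concern concrete, here is a minimal userspace sketch (not
kernel code) of taking a stable copy of a name that a concurrent rename
may swap. Everything in it is a hypothetical stand-in made up for
illustration: struct fake_dentry, fake_rename(), snapshot_comm() and
COMM_LEN, with the mutex playing the role of d_lock and COMM_LEN
mirroring TASK_COMM_LEN. The point is only that the reader copies the
name while the writer is excluded, instead of relying on the pointer
staying put.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define COMM_LEN 16	/* assumption: mirrors TASK_COMM_LEN */

/* Hypothetical stand-in for a dentry whose name a rename may replace. */
struct fake_dentry {
	pthread_mutex_t lock;	/* plays the role of d_lock in this model */
	const char *name;	/* swapped by fake_rename() */
};

/* Writer side: replace the name pointer while holding the lock. */
static void fake_rename(struct fake_dentry *d, const char *new_name)
{
	pthread_mutex_lock(&d->lock);
	d->name = new_name;
	pthread_mutex_unlock(&d->lock);
}

/*
 * Reader side: copy the current name into a fixed-size buffer while the
 * lock is held, truncating to COMM_LEN - 1 bytes and always terminating.
 * The caller ends up with a consistent name even if a rename follows.
 */
static void snapshot_comm(struct fake_dentry *d, char comm[COMM_LEN])
{
	size_t n;

	pthread_mutex_lock(&d->lock);
	n = strlen(d->name);
	if (n > COMM_LEN - 1)
		n = COMM_LEN - 1;
	memcpy(comm, d->name, n);
	comm[n] = '\0';
	pthread_mutex_unlock(&d->lock);
}
```

In the kernel, take_dentry_name_snapshot() exists for this kind of job;
whether it is the right tool at this point in exec is a separate
question.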