On Wed, Sep 25, 2024 at 05:50:10PM +0200, Aleksa Sarai wrote:
> On 2024-09-24, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> > Tycho Andersen <tycho@tycho.pizza> writes:
> >
> > > From: Tycho Andersen <tandersen@xxxxxxxxxxx>
> > >
> > > Zbigniew mentioned at Linux Plumber's that systemd is interested in
> > > switching to execveat() for service execution, but can't, because the
> > > contents of /proc/pid/comm are the file descriptor which was used,
> > > instead of the path to the binary. This makes the output of tools like
> > > top and ps useless, especially in a world where most fds are opened
> > > CLOEXEC so the number is truly meaningless.
> > >
> > > This patch adds an AT_ flag to fix up /proc/pid/comm to instead be the
> > > contents of argv[0], instead of the fdno.
> >
> > The kernel allows prctl(PR_SET_NAME, ...) without any permission
> > checks so adding an AT_ flag to use argv[0] instead of the execed
> > filename seems reasonable.
> >
> > Maybe the flag should be called AT_NAME_ARGV0.
> >
> >
> > That said I am trying to remember why we picked /dev/fd/N, as the
> > filename.
> >
> > My memory is that we couldn't think of anything more reasonable to use.
> > Looking at commit 51f39a1f0cea ("syscalls: implement execveat() system
> > call") unfortunately doesn't clarify anything for me, except that
> > /dev/fd/N was a reasonable choice.
> >
> > I am thinking the code could reasonably try:
> > 	get_fs_root_rcu(current->fs, &root);
> > 	path = __d_path(file->f_path, root, buf, buflen);
> >
> > To see if a path to the file from the current root directory can be
> > found. For files that are not reachable from the current root the code
> > still needs to fall back to /dev/fd/N.
> >
> > Do you think you can investigate that and see if that would generate
> > a reasonable task->comm?
>
> The problem mentioned during the discussion after the talk was that
> busybox symlinks everything to the same program, so using d_path will
> give somewhat confusing results and so separate behaviour is still
> needed (though to be fair, the current results are also confusing).

I also remember that busybox used to do symlinks, but I just looked at the
latest version on the docker hub (perhaps not representative...) and
it's all hard links, which works fine with the __d_path() trick.

> > It looks like a reasonable case can be made that while /dev/fd/N is
> > a good path for interpreters, it is never a good choice for comm,
> > so perhaps we could always use argv[0] if the fdpath is of the
> > form /dev/fd/N.
> >
> > All of that said I am not a fan of the implementation below as it has
> > the side effect of replacing /dev/fd/N with a filename that is not
> > usable by #! interpreters. So I suggest an implementation that affects
> > task->comm and not bprm->filename.
>
> I think only affecting task->comm would be ideal.

Yep, I did this for the test above, and it worked fine:

	if (bprm->fdpath) {
		/*
		 * If fdpath was set, execveat() made up a path that will
		 * probably not be useful to admins running ps or similar.
		 * Let's fix it up to be something reasonable.
		 */
		struct path root;
		char *path, buf[1024];

		get_fs_root(current->fs, &root);
		path = __d_path(&bprm->file->f_path, &root, buf, sizeof(buf));

		__set_task_comm(me, kbasename(path), true);
	} else {
		__set_task_comm(me, kbasename(bprm->filename), true);
	}

Obviously we don't want a stack allocated buffer, but triggering on
->fdpath != NULL seems like the right thing, so we won't need a flag
either.

The question is: argv[0] or __d_path()?

Tycho
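
For anyone following along, here is a rough userspace sketch of why comm
ends up as a bare fd number today: comm is the basename of the exec
filename, cut to fit a TASK_COMM_LEN (16 byte) buffer, and for an fd-based
execveat() that filename is /dev/fd/N. basename_of() is only a stand-in
for the kernel's kbasename(), and comm_from_filename() is a hypothetical
helper made up for illustration, not kernel code.

```c
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16	/* size of the kernel's comm buffer, NUL included */

/* Userspace stand-in for kbasename(): the text after the last '/'. */
static const char *basename_of(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}

/*
 * Hypothetical helper: derive a comm-style name from an exec filename,
 * truncating to at most TASK_COMM_LEN - 1 bytes plus the terminating NUL.
 */
static void comm_from_filename(const char *filename, char comm[TASK_COMM_LEN])
{
	snprintf(comm, TASK_COMM_LEN, "%s", basename_of(filename));
}
```

Feeding it "/dev/fd/3" gives "3", which is what top and ps end up showing,
while "/usr/bin/sleep" gives "sleep". Anything longer than 15 bytes is
truncated.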