Re: [PATCH v5 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process

On Fri, Oct 25, 2024 at 10:41 AM Lorenzo Stoakes
<lorenzo.stoakes@xxxxxxxxxx> wrote:
>
> It is useful to be able to utilise the pidfd mechanism to reference the
> current thread or process (from a userland point of view - thread group
> leader from the kernel's point of view).
>
> Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and
> PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader.
>
> For convenience and to avoid confusion from userland's perspective we alias
> these:
>
> * PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what
>   the user will want to use, as they would find it surprising if for
>   instance fd's were unshared()'d and they wanted to invoke pidfd_getfd()
>   and that failed.
>
> * PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users
>   have no concept of thread groups or what a thread group leader is, and
>   from userland's perspective and nomenclature this is what userland
>   considers to be a process.
>
> Due to the refactoring of the central __pidfd_get_pid() function we can
> implement this functionality centrally, providing the use of this sentinel
> in most functionality which utilises pidfd's.
>
> We need to explicitly adjust kernel_waitid_prepare() to permit this (though
> it wouldn't really make sense to use this there, we provide the ability for
> consistency).
>
> We explicitly disallow use of this in setns(), which would otherwise have
> required explicit custom handling, as it doesn't make sense to set the
> current calling thread to join the namespace of itself.
>
> As the callers of pidfd_get_pid() expect an increased reference count on
> the pid we do so in the self case, reducing churn and avoiding any breakage
> from existing logic which decrements this reference count.
>
> This change implicitly provides PIDFD_SELF_* support in the waitid(P_PIDFS,
> ...), process_madvise(), process_mrelease(), pidfd_send_signal(), and
> pidfd_getfd() system calls.
>
> Things such as polling a pidfs and general fd operations are not supported,
> this strictly provides the sentinel for APIs which explicitly accept a
> pidfd.
>
> Reviewed-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> ---
>  include/linux/pid.h        |  8 ++++--
>  include/uapi/linux/pidfd.h | 15 +++++++++++
>  kernel/exit.c              |  3 ++-
>  kernel/nsproxy.c           |  1 +
>  kernel/pid.c               | 51 ++++++++++++++++++++++++--------------
>  5 files changed, 57 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index d466890e1b35..3b2ac7567a88 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -78,11 +78,15 @@ struct file;
>   * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd.
>   *
>   * @pidfd:      The pidfd whose pid we want, or the fd of a /proc/<pid> file if
> - *              @alloc_proc is also set.
> + *              @alloc_proc is also set, or PIDFD_SELF_* to refer to the current
> + *              thread or thread group leader.
>   * @allow_proc: If set, then an fd of a /proc/<pid> file can be passed instead
>   *              of a pidfd, and this will be used to determine the pid.
> +
>   * @flags:      Output variable, if non-NULL, then the file->f_flags of the
> - *              pidfd will be set here.
> + *              pidfd will be set here or If PIDFD_SELF_THREAD is set, this is
> + *              set to PIDFD_THREAD, otherwise if PIDFD_SELF_THREAD_GROUP then
> + *              this is set to zero.
>   *
>   * Returns: If successful, the pid associated with the pidfd, otherwise an
>   *          error.
> diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
> index 565fc0629fff..0ca2ebf906fd 100644
> --- a/include/uapi/linux/pidfd.h
> +++ b/include/uapi/linux/pidfd.h
> @@ -29,4 +29,19 @@
>  #define PIDFD_GET_USER_NAMESPACE              _IO(PIDFS_IOCTL_MAGIC, 9)
>  #define PIDFD_GET_UTS_NAMESPACE               _IO(PIDFS_IOCTL_MAGIC, 10)
>
> +/*
> + * Special sentinel values which can be used to refer to the current thread or
> + * thread group leader (which from a userland perspective is the process).
> + */
> +#define PIDFD_SELF             PIDFD_SELF_THREAD
> +#define PIDFD_SELF_PROCESS     PIDFD_SELF_THREAD_GROUP
> +
> +#define PIDFD_SELF_THREAD      -100 /* Current thread. */

This conflicts with AT_FDCWD (also -100); might be worth changing?

> +#define PIDFD_SELF_THREAD_GROUP        -200 /* Current thread group leader. */

We might want to pick some range outside of the negative errno space
(-1 through -4095, i.e. -MAX_ERRNO), since we have plenty of values to
pick from (2^31 at least).
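
To make the two concerns above concrete, here is a rough userland sketch
(not kernel code). The ALT_* values and all helper names are made up for
illustration only; the SKETCH_* constants copy the Linux value of
AT_FDCWD and the kernel's MAX_ERRNO so the snippet builds anywhere.

```c
#include <assert.h>

/* Value of AT_FDCWD on Linux (include/uapi/linux/fcntl.h). */
#define SKETCH_AT_FDCWD		(-100)
/* Mirrors the kernel's MAX_ERRNO (include/linux/err.h). */
#define SKETCH_MAX_ERRNO	4095

/* Values from the patch under review. */
#define V5_SELF_THREAD		(-100)
#define V5_SELF_THREAD_GROUP	(-200)

/* Hypothetical replacements, picked only for illustration. */
#define ALT_SELF_THREAD		(-10000)
#define ALT_SELF_THREAD_GROUP	(-20000)

/* Returns 1 if v could be mistaken for a negative errno value. */
static int in_errno_range(int v)
{
	return v < 0 && v >= -SKETCH_MAX_ERRNO;
}

/* Returns 1 if v is negative, outside errno space, and not AT_FDCWD. */
static int is_safe_sentinel(int v)
{
	return v < 0 && !in_errno_range(v) && v != SKETCH_AT_FDCWD;
}

/* Sanity checks for the values above. */
static void check_sentinels(void)
{
	assert(!is_safe_sentinel(V5_SELF_THREAD));	 /* equals AT_FDCWD */
	assert(!is_safe_sentinel(V5_SELF_THREAD_GROUP)); /* inside errno space */
	assert(is_safe_sentinel(ALT_SELF_THREAD));
	assert(is_safe_sentinel(ALT_SELF_THREAD_GROUP));
}
```

Calling check_sentinels() from a test main() should pass silently; any
pair of distinct negative values below -4095 would pass the same checks.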

> +static inline int pidfd_is_self_sentinel(pid_t pid)
> +{
> +       return pid == PIDFD_SELF_THREAD || pid == PIDFD_SELF_THREAD_GROUP;
> +}

Do we want this in the uapi header? Even if it is useful, it comes with
drawbacks, such as breaking scripts that parse kernel headers (a quick
git grep suggests we do have static inlines in headers, but only in
rather obscure ones) and breaking C89:

<source>:8:8: error: unknown type name 'inline'
    8 | static inline int pidfd_is_self_sentinel(pid_t pid)

:)
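
If a helper is wanted for userland at all, one C89-friendly option would
be a plain macro. A minimal sketch, assuming the values from this patch;
the macro name PIDFD_IS_SELF_SENTINEL is hypothetical and not part of
the series:

```c
#include <assert.h>

/* Values taken from the patch under review. */
#define PIDFD_SELF_THREAD	(-100)
#define PIDFD_SELF_THREAD_GROUP	(-200)

/*
 * Hypothetical macro form of pidfd_is_self_sentinel(). It needs no
 * 'inline' keyword, so it builds with -std=c89, but note that it
 * evaluates its argument twice.
 */
#define PIDFD_IS_SELF_SENTINEL(fd) \
	((fd) == PIDFD_SELF_THREAD || (fd) == PIDFD_SELF_THREAD_GROUP)

/* Sanity checks for the macro above. */
static void check_macro(void)
{
	assert(PIDFD_IS_SELF_SENTINEL(PIDFD_SELF_THREAD));
	assert(PIDFD_IS_SELF_SENTINEL(PIDFD_SELF_THREAD_GROUP));
	assert(!PIDFD_IS_SELF_SENTINEL(3));
}
```

Another option is GCC's __inline__ spelling, which is accepted in C89
mode, though that does nothing for the header-parsing scripts.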

> +
>  #endif /* _UAPI_LINUX_PIDFD_H */
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 619f0014c33b..3eb20f8252ee 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -71,6 +71,7 @@
>  #include <linux/user_events.h>
>  #include <linux/uaccess.h>
>
> +#include <uapi/linux/pidfd.h>
>  #include <uapi/linux/wait.h>
>
>  #include <asm/unistd.h>
> @@ -1739,7 +1740,7 @@ int kernel_waitid_prepare(struct wait_opts *wo, int which, pid_t upid,
>                 break;
>         case P_PIDFD:
>                 type = PIDTYPE_PID;
> -               if (upid < 0)
> +               if (upid < 0 && !pidfd_is_self_sentinel(upid))
>                         return -EINVAL;
>
>                 pid = pidfd_get_pid(upid, &f_flags);
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index dc952c3b05af..d239f7eeaa1f 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -550,6 +550,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
>         struct nsset nsset = {};
>         int err = 0;
>
> +       /* If fd is PIDFD_SELF_*, implicitly fail here, as invalid. */
>         if (!fd_file(f))
>                 return -EBADF;
>
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 94c97559e5c5..8742157b36f8 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -535,33 +535,48 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
>  }
>  EXPORT_SYMBOL_GPL(find_ge_pid);
>
> +static struct pid *pidfd_get_pid_self(unsigned int pidfd, unsigned int *flags)
> +{
> +       bool is_thread = pidfd == PIDFD_SELF_THREAD;
> +       enum pid_type type = is_thread ? PIDTYPE_PID : PIDTYPE_TGID;
> +       struct pid *pid = *task_pid_ptr(current, type);
> +
> +       /* The caller expects an elevated reference count. */
> +       get_pid(pid);

It would be really nice to avoid the get here, but I imagine that
would take some refactoring around the put_pid() callers?

> +       return pid;
> +}
> +
>  struct pid *__pidfd_get_pid(unsigned int pidfd, bool allow_proc,
>                             unsigned int *flags)
>  {
> -       struct pid *pid;
> -       struct fd f = fdget(pidfd);
> -       struct file *file = fd_file(f);
> +       if (pidfd_is_self_sentinel(pidfd)) {
> +               return pidfd_get_pid_self(pidfd, flags);
> +       } else {

Skipping the else here might make the rest of the code more legible
(since the sentinel branch returns anyway...).
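
Roughly what I have in mind, as a standalone userland sketch rather than
the real function. struct sketch_pid and the lookup_*() helpers are
stand-ins invented for illustration; only the control flow matters:

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from the patch under review. */
#define PIDFD_SELF_THREAD	(-100)
#define PIDFD_SELF_THREAD_GROUP	(-200)

/* Stand-in for struct pid; not the kernel type. */
struct sketch_pid { int nr; };

static struct sketch_pid self_pid = { 1 };
static struct sketch_pid other_pid = { 2 };

static int is_self_sentinel(int pidfd)
{
	return pidfd == PIDFD_SELF_THREAD || pidfd == PIDFD_SELF_THREAD_GROUP;
}

/* Stand-in for pidfd_get_pid_self(). */
static struct sketch_pid *lookup_self(int pidfd)
{
	(void)pidfd;
	return &self_pid;
}

/* Stand-in for the regular fdget()-based lookup. */
static struct sketch_pid *lookup_fd(int pidfd)
{
	return pidfd >= 0 ? &other_pid : NULL;
}

static struct sketch_pid *sketch_get_pid(int pidfd)
{
	/* The sentinel case returns early, so no else is needed. */
	if (is_self_sentinel(pidfd))
		return lookup_self(pidfd);

	/* The regular path stays at the outer indentation level. */
	return lookup_fd(pidfd);
}
```

With that shape, the existing fdget() path would not need to be
re-indented, which should also keep the diff smaller.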

-- 
Pedro




