Re: [PATCH v9] pidfd: add ioctl to retrieve pid info

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024-10-08, luca.boccassi@xxxxxxxxx <luca.boccassi@xxxxxxxxx> wrote:
> From: Luca Boccassi <luca.boccassi@xxxxxxxxx>
> 
> A common pattern when using pid fds is having to get information
> about the process, which currently requires /proc being mounted,
> resolving the fd to a pid, and then do manual string parsing of
> /proc/N/status and friends. This needs to be reimplemented over
> and over in all userspace projects (e.g.: I have reimplemented
> resolving in systemd, dbus, dbus-daemon, polkit so far), and
> requires additional care in checking that the fd is still valid
> after having parsed the data, to avoid races.
> 
> Having a programmatic API that can be used directly removes all
> these requirements, including having /proc mounted.
> 
> As discussed at LPC24, add an ioctl with an extensible struct
> so that more parameters can be added later if needed. Start with
> returning pid/tgid/ppid and creds unconditionally, and cgroupid
> optionally.
> 
> Signed-off-by: Luca Boccassi <luca.boccassi@xxxxxxxxx>
> ---
> v9: drop result_mask and reuse request_mask instead
> v8: use RAII guard for rcu, call put_cred()
> v7: fix RCU issue and style issue introduced by v6 found by reviewer
> v6: use rcu_read_lock() when fetching cgroupid, use task_ppid_nr_ns() to
>     get the ppid, return ESCHR if any of pid/tgid/ppid are 0 at the end
>     of the call to avoid providing incomplete data, document what the
>     callers should expect
> v5: check again that the task hasn't exited immediately before copying
>     the result out to userspace, to ensure we are not returning stale data
>     add an ifdef around the cgroup structs usage to fix build errors when
>     the feature is disabled
> v4: fix arg check in pidfd_ioctl() by moving it after the new call
> v3: switch from pid_vnr() to task_pid_vnr()
> v2: Apply comments from Christian, apart from the one about pid namespaces
>     as I need additional hints on how to implement it.
>     Drop the security_context string as it is not the appropriate
>     metadata to give userspace these days.
> 
>  fs/pidfs.c                                    | 88 ++++++++++++++++++-
>  include/uapi/linux/pidfd.h                    | 30 +++++++
>  .../testing/selftests/pidfd/pidfd_open_test.c | 80 ++++++++++++++++-
>  3 files changed, 194 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/pidfs.c b/fs/pidfs.c
> index 80675b6bf884..15cdc7fe4968 100644
> --- a/fs/pidfs.c
> +++ b/fs/pidfs.c
> @@ -2,6 +2,7 @@
>  #include <linux/anon_inodes.h>
>  #include <linux/file.h>
>  #include <linux/fs.h>
> +#include <linux/cgroup.h>
>  #include <linux/magic.h>
>  #include <linux/mount.h>
>  #include <linux/pid.h>
> @@ -114,6 +115,83 @@ static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
>  	return poll_flags;
>  }
>  
> +static long pidfd_info(struct task_struct *task, unsigned int cmd, unsigned long arg)
> +{
> +	struct pidfd_info __user *uinfo = (struct pidfd_info __user *)arg;
> +	size_t usize = _IOC_SIZE(cmd);
> +	struct pidfd_info kinfo = {};
> +	struct user_namespace *user_ns;
> +	const struct cred *c;
> +	__u64 request_mask;
> +
> +	if (!uinfo)
> +		return -EINVAL;
> +	if (usize < sizeof(struct pidfd_info))
> +		return -EINVAL; /* First version, no smaller struct possible */
> +
> +	if (copy_from_user(&request_mask, &uinfo->request_mask, sizeof(request_mask)))
> +		return -EFAULT;
> +
> +	c = get_task_cred(task);
> +	if (!c)
> +		return -ESRCH;
> +
> +	/* Unconditionally return identifiers and credentials, the rest only on request */
> +
> +	user_ns = current_user_ns();
> +	kinfo.ruid = from_kuid_munged(user_ns, c->uid);
> +	kinfo.rgid = from_kgid_munged(user_ns, c->gid);
> +	kinfo.euid = from_kuid_munged(user_ns, c->euid);
> +	kinfo.egid = from_kgid_munged(user_ns, c->egid);
> +	kinfo.suid = from_kuid_munged(user_ns, c->suid);
> +	kinfo.sgid = from_kgid_munged(user_ns, c->sgid);
> +	kinfo.fsuid = from_kuid_munged(user_ns, c->fsuid);
> +	kinfo.fsgid = from_kgid_munged(user_ns, c->fsgid);
> +	put_cred(c);
> +
> +#ifdef CONFIG_CGROUPS
> +	if (request_mask & PIDFD_INFO_CGROUPID) {
> +		struct cgroup *cgrp;
> +
> +		guard(rcu)();
> +		cgrp = task_cgroup(task, pids_cgrp_id);
> +		if (!cgrp)
> +			return -ENODEV;
> +		kinfo.cgroupid = cgroup_id(cgrp);
> +
> +		kinfo.request_mask |= PIDFD_INFO_CGROUPID;
> +	}
> +#endif
> +
> +	/*
> +	 * Copy pid/tgid last, to reduce the chances the information might be
> +	 * stale. Note that it is not possible to ensure it will be valid as the
> +	 * task might return as soon as the copy_to_user finishes, but that's ok
> +	 * and userspace expects that might happen and can act accordingly, so
> +	 * this is just best-effort. What we can do however is checking that all
> +	 * the fields are set correctly, or return ESRCH to avoid providing
> +	 * incomplete information. */
> +
> +	kinfo.ppid = task_ppid_nr_ns(task, NULL);
> +	kinfo.tgid = task_tgid_vnr(task);
> +	kinfo.pid = task_pid_vnr(task);
> +
> +	if (kinfo.pid == 0 || kinfo.tgid == 0 || (kinfo.ppid == 0 && kinfo.pid != 1))
> +		return -ESRCH;
> +
> +	/*
> +	 * If userspace and the kernel have the same struct size it can just
> +	 * be copied. If userspace provides an older struct, only the bits that
> +	 * userspace knows about will be copied. If userspace provides a new
> +	 * struct, only the bits that the kernel knows about will be copied and
> +	 * the size value will be set to the size the kernel knows about.
> +	 */
> +	if (copy_to_user(uinfo, &kinfo, min(usize, sizeof(kinfo))))
> +		return -EFAULT;

If usize > ksize, we also want to clear_user() the trailing bytes to
avoid userspace thinking that any garbage bytes they had are valid.

Also, you mention "the size value" but there is no size in pidfd_info. I
don't think it's actually necessary to include such a field (especially
when you have a statx-like request_mask), but it means you really should
clear the trailing bytes to avoid userspace bugs.

I implemented all of these semantics as copy_struct_to_user() in the
CHECK_FIELDS patch I sent a few weeks ago (I just sent v3[1]). Maybe you
can cherry-pick this patch and use it? The semantics when we extend this
pidfd_info to accept new request_mask values with larger structures is
going to get a little ugly and copy_struct_to_user() makes this a little
easier to deal with.

[1]: https://lore.kernel.org/all/20241010-extensible-structs-check_fields-v3-1-d2833dfe6edd@xxxxxxxxxx/

> +
> +	return 0;
> +}
> +
>  static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>  {
>  	struct task_struct *task __free(put_task) = NULL;
> @@ -122,13 +200,17 @@ static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>  	struct ns_common *ns_common = NULL;
>  	struct pid_namespace *pid_ns;
>  
> -	if (arg)
> -		return -EINVAL;
> -
>  	task = get_pid_task(pid, PIDTYPE_PID);
>  	if (!task)
>  		return -ESRCH;
>  
> +	/* Extensible IOCTL that does not open namespace FDs, take a shortcut */
> +	if (_IOC_NR(cmd) == _IOC_NR(PIDFD_GET_INFO))
> +		return pidfd_info(task, cmd, arg);
> +
> +	if (arg)
> +		return -EINVAL;
> +
>  	scoped_guard(task_lock, task) {
>  		nsp = task->nsproxy;
>  		if (nsp)
> diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
> index 565fc0629fff..d685eeeedc51 100644
> --- a/include/uapi/linux/pidfd.h
> +++ b/include/uapi/linux/pidfd.h
> @@ -16,6 +16,35 @@
>  #define PIDFD_SIGNAL_THREAD_GROUP	(1UL << 1)
>  #define PIDFD_SIGNAL_PROCESS_GROUP	(1UL << 2)
>  
> +/* Flags for pidfd_info. */
> +#define PIDFD_INFO_CGROUPID		(1UL << 0)

While it isn't strictly necessary, maybe we should provide some
always-set bits like statx does? While they would always be set, it
might incentivise programs to write code that checks if the request_mask
bits are set after the ioctl(2) returns from the outset. Then again,
PIDFD_INFO_CGROUPID is probably enough to justify writing code correctly
from the outset.

> +
> +struct pidfd_info {
> +	/* Let userspace request expensive stuff explictly, and let the kernel
> +	 * indicate whether it knows about it. */

I would prefer a slightly more informative comment (which mentions that
this will also be used for extensibility), something like:


/*
 * This mask is similar to the request_mask in statx(2).
 *
 * Userspace indicates what extensions or expensive-to-calculate fields
 * they want by setting the corresponding bits in request_mask.
 *
 * When filling the structure, the kernel will only set bits
 * corresponding to the fields that were actually filled by the kernel.
 * This also includes any future extensions that might be automatically
 * filled. If the structure size is too small to contain a field
 * (requested or not), to avoid confusion the request_mask will not
 * contain a bit for that field.
 *
 * As such, userspace MUST verify that request_mask contains the
 * corresponding flags after the ioctl(2) returns to ensure that it is
 * using valid data.
 */


The bit about request_mask not containing a bit if the structure size is
too small is one of the things that copy_struct_to_user() helps with
implementing (see the patch for some more docs).

> +	__u64 request_mask;
> +	/*
> +	 * The information contained in the following fields might be stale at the
> +	 * time it is received, as the target process might have exited as soon as
> +	 * the IOCTL was processed, and there is no way to avoid that. However, it
> +	 * is guaranteed that if the call was successful, then the information was
> +	 * correct and referred to the intended process at the time the work was
> +	 * performed. */
> +	__u64 cgroupid;
> +	__u32 pid;
> +	__u32 tgid;
> +	__u32 ppid;
> +	__u32 ruid;
> +	__u32 rgid;
> +	__u32 euid;
> +	__u32 egid;
> +	__u32 suid;
> +	__u32 sgid;
> +	__u32 fsuid;
> +	__u32 fsgid;
> +	__u32 spare0[1];
> +};
> +
>  #define PIDFS_IOCTL_MAGIC 0xFF
>  
>  #define PIDFD_GET_CGROUP_NAMESPACE            _IO(PIDFS_IOCTL_MAGIC, 1)
> @@ -28,5 +57,6 @@
>  #define PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 8)
>  #define PIDFD_GET_USER_NAMESPACE              _IO(PIDFS_IOCTL_MAGIC, 9)
>  #define PIDFD_GET_UTS_NAMESPACE               _IO(PIDFS_IOCTL_MAGIC, 10)
> +#define PIDFD_GET_INFO                        _IOWR(PIDFS_IOCTL_MAGIC, 11, struct pidfd_info)
>  
>  #endif /* _UAPI_LINUX_PIDFD_H */
> diff --git a/tools/testing/selftests/pidfd/pidfd_open_test.c b/tools/testing/selftests/pidfd/pidfd_open_test.c
> index c62564c264b1..b2a8cfb19a74 100644
> --- a/tools/testing/selftests/pidfd/pidfd_open_test.c
> +++ b/tools/testing/selftests/pidfd/pidfd_open_test.c
> @@ -13,6 +13,7 @@
>  #include <stdlib.h>
>  #include <string.h>
>  #include <syscall.h>
> +#include <sys/ioctl.h>
>  #include <sys/mount.h>
>  #include <sys/prctl.h>
>  #include <sys/wait.h>
> @@ -21,6 +22,34 @@
>  #include "pidfd.h"
>  #include "../kselftest.h"
>  
> +#ifndef PIDFS_IOCTL_MAGIC
> +#define PIDFS_IOCTL_MAGIC 0xFF
> +#endif
> +
> +#ifndef PIDFD_GET_INFO
> +#define PIDFD_GET_INFO _IOWR(PIDFS_IOCTL_MAGIC, 11, struct pidfd_info)
> +#define PIDFD_INFO_CGROUPID		(1UL << 0)
> +
> +struct pidfd_info {
> +	/* Let userspace request expensive stuff explictly, and let the kernel
> +	 * indicate whether it knows about it. */
> +	__u64 request_mask;
> +	__u64 cgroupid;
> +	__u32 pid;
> +	__u32 tgid;
> +	__u32 ppid;
> +	__u32 ruid;
> +	__u32 rgid;
> +	__u32 euid;
> +	__u32 egid;
> +	__u32 suid;
> +	__u32 sgid;
> +	__u32 fsuid;
> +	__u32 fsgid;
> +	__u32 spare0[1];
> +};
> +#endif
> +
>  static int safe_int(const char *numstr, int *converted)
>  {
>  	char *err = NULL;
> @@ -120,10 +149,13 @@ static pid_t get_pid_from_fdinfo_file(int pidfd, const char *key, size_t keylen)
>  
>  int main(int argc, char **argv)
>  {
> +	struct pidfd_info info = {
> +		.request_mask = PIDFD_INFO_CGROUPID,
> +	};
>  	int pidfd = -1, ret = 1;
>  	pid_t pid;
>  
> -	ksft_set_plan(3);
> +	ksft_set_plan(4);
>  
>  	pidfd = sys_pidfd_open(-1, 0);
>  	if (pidfd >= 0) {
> @@ -153,6 +185,52 @@ int main(int argc, char **argv)
>  	pid = get_pid_from_fdinfo_file(pidfd, "Pid:", sizeof("Pid:") - 1);
>  	ksft_print_msg("pidfd %d refers to process with pid %d\n", pidfd, pid);
>  
> +	if (ioctl(pidfd, PIDFD_GET_INFO, &info) < 0) {
> +		ksft_print_msg("%s - failed to get info from pidfd\n", strerror(errno));
> +		goto on_error;
> +	}
> +	if (info.pid != pid) {
> +		ksft_print_msg("pid from fdinfo file %d does not match pid from ioctl %d\n",
> +			       pid, info.pid);
> +		goto on_error;
> +	}
> +	if (info.ppid != getppid()) {
> +		ksft_print_msg("ppid %d does not match ppid from ioctl %d\n",
> +			       pid, info.pid);
> +		goto on_error;
> +	}
> +	if (info.ruid != getuid()) {
> +		ksft_print_msg("uid %d does not match uid from ioctl %d\n",
> +			       getuid(), info.ruid);
> +		goto on_error;
> +	}
> +	if (info.rgid != getgid()) {
> +		ksft_print_msg("gid %d does not match gid from ioctl %d\n",
> +			       getgid(), info.rgid);
> +		goto on_error;
> +	}
> +	if (info.euid != geteuid()) {
> +		ksft_print_msg("euid %d does not match euid from ioctl %d\n",
> +			       geteuid(), info.euid);
> +		goto on_error;
> +	}
> +	if (info.egid != getegid()) {
> +		ksft_print_msg("egid %d does not match egid from ioctl %d\n",
> +			       getegid(), info.egid);
> +		goto on_error;
> +	}
> +	if (info.suid != geteuid()) {
> +		ksft_print_msg("suid %d does not match suid from ioctl %d\n",
> +			       geteuid(), info.suid);
> +		goto on_error;
> +	}
> +	if (info.sgid != getegid()) {
> +		ksft_print_msg("sgid %d does not match sgid from ioctl %d\n",
> +			       getegid(), info.sgid);
> +		goto on_error;
> +	}
> +	ksft_test_result_pass("get info from pidfd test: passed\n");
> +
>  	ret = 0;
>  
>  on_error:
> 
> base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
> -- 
> 2.45.2
> 
> 

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux