Re: [PATCH] nsfs: add pid translation ioctls

On Wed, Jun 19, 2024 at 15:50, Christian Brauner
<brauner@xxxxxxxxxx> wrote:
>
> Add ioctl()s to translate pids between pid namespaces.
>
> LXCFS is a tiny fuse filesystem used to virtualize various aspects of
> procfs. LXCFS is run on the host. The files and directories it creates
> can be bind-mounted by e.g. a container at startup and mounted over the
> various procfs files the container wishes to have virtualized. When e.g.
> a read request for uptime is received, LXCFS will receive the pid of the
> reader. In order to virtualize the corresponding read, LXCFS needs to
> know the pid of the init process of the reader's pid namespace. In order
> to do this, LXCFS first needs to fork() two helper processes. The first
> helper process setns() to the reader's pid namespace. The second helper
> process is needed to create a process that is a proper member of the pid
> namespace. The second helper process then creates a ucred message with
> ucred.pid set to 1 and sends it back to LXCFS. The kernel will translate
> the ucred.pid field to the corresponding pid number in LXCFS's pid
> namespace. This way LXCFS can learn the init pid number of the reader's
> pid namespace and can go on to virtualize. Since these two forks() are
> costly LXCFS maintains an init pid cache that caches a given pid for a
> fixed amount of time. The cache is pruned during new read requests.
> However, even with the cache the hit of the two forks() is significant
> when a very large number of containers are running. With this simple
> patch we add an ns ioctl that lets a caller retrieve the init pid nr of
> a pid namespace through its pid namespace fd. This significantly
> improves performance with a very simple change.
>
> Support translation of pids and tgids. Other concepts can be added but
> there are no obvious users for this right now.
>
> To protect against races pidfds can be used to check whether the process
> is still valid. If needed, this can also be extended to work on pidfds
> directly.
>
> Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>

Dear Christian,

This is an amazing idea! Thanks for implementing and posting this!

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@xxxxxxxxxxxxx>
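
For anyone following along, here is a rough userspace sketch of how the
LXCFS case from the commit message could look with the new ioctl instead
of the two fork()s. It is only an illustration under assumptions: the
helper name init_pid_of_reader() is made up, the fallback ioctl number is
the one proposed in this patch (the installed <linux/nsfs.h> is preferred
when it already defines it, and the final number may differ), and, as in
the quoted ns_ioctl() hunk, the pid is passed by value and the translated
pid comes back as the ioctl return value.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Prefer the installed uapi header if it exists. */
#if defined(__has_include)
# if __has_include(<linux/nsfs.h>)
#  include <linux/nsfs.h>
# endif
#endif

/* Fallbacks: values as proposed in this patch (assumption, may change). */
#ifndef NSIO
#define NSIO 0xb7
#endif
#ifndef NS_GET_PID_FROM_PIDNS
#define NS_GET_PID_FROM_PIDNS _IOR(NSIO, 0x5, int)
#endif

/*
 * Hypothetical helper: return the pid, as seen from the caller's pid
 * namespace, of the init process (pid 1) of reader_pid's pid namespace.
 * Returns -1 with errno set on failure, e.g. if the kernel does not
 * know the ioctl.
 */
static pid_t init_pid_of_reader(pid_t reader_pid)
{
	char path[64];
	int nsfd, ret, saved_errno;

	snprintf(path, sizeof(path), "/proc/%d/ns/pid", (int)reader_pid);
	nsfd = open(path, O_RDONLY | O_CLOEXEC);
	if (nsfd < 0)
		return -1;

	/* Translate pid 1 of the target namespace into our own namespace. */
	ret = ioctl(nsfd, NS_GET_PID_FROM_PIDNS, (unsigned long)1);

	saved_errno = errno;
	close(nsfd);
	errno = saved_errno;
	return ret;
}
```

A caller would pass the pid it received with the read request, e.g.
init_pid_of_reader(reader_pid), and fall back to the existing fork()
path if the call fails on an older kernel.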

> ---
> ---
>  fs/nsfs.c                 | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/nsfs.h |  8 ++++++++
>  2 files changed, 55 insertions(+)
>
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index 07e22a15ef02..4a4d7b1eb38c 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -8,9 +8,11 @@
>  #include <linux/magic.h>
>  #include <linux/ktime.h>
>  #include <linux/seq_file.h>
> +#include <linux/pid_namespace.h>
>  #include <linux/user_namespace.h>
>  #include <linux/nsfs.h>
>  #include <linux/uaccess.h>
> +#include <linux/cleanup.h>
>
>  #include "internal.h"
>
> @@ -123,9 +125,12 @@ static long ns_ioctl(struct file *filp, unsigned int ioctl,
>                         unsigned long arg)
>  {
>         struct user_namespace *user_ns;
> +       struct pid_namespace *pid_ns;
> +       struct task_struct *tsk;
>         struct ns_common *ns = get_proc_ns(file_inode(filp));
>         uid_t __user *argp;
>         uid_t uid;
> +       pid_t pid_nr;
>
>         switch (ioctl) {
>         case NS_GET_USERNS:
> @@ -143,6 +148,48 @@ static long ns_ioctl(struct file *filp, unsigned int ioctl,
>                 argp = (uid_t __user *) arg;
>                 uid = from_kuid_munged(current_user_ns(), user_ns->owner);
>                 return put_user(uid, argp);
> +       case NS_GET_PID_FROM_PIDNS:
> +               fallthrough;
> +       case NS_GET_TGID_FROM_PIDNS:
> +               fallthrough;
> +       case NS_GET_PID_IN_PIDNS:
> +               fallthrough;
> +       case NS_GET_TGID_IN_PIDNS:
> +               if (ns->ops->type != CLONE_NEWPID)
> +                       return -EINVAL;
> +
> +               pid_ns = container_of(ns, struct pid_namespace, ns);
> +
> +               guard(rcu)();
> +               if (ioctl == NS_GET_PID_IN_PIDNS ||
> +                   ioctl == NS_GET_TGID_IN_PIDNS)
> +                       tsk = find_task_by_vpid(arg);
> +               else
> +                       tsk = find_task_by_pid_ns(arg, pid_ns);
> +               if (!tsk)
> +                       return -ESRCH;
> +
> +               switch (ioctl) {
> +               case NS_GET_PID_FROM_PIDNS:
> +                       pid_nr = task_pid_vnr(tsk);
> +                       break;
> +               case NS_GET_TGID_FROM_PIDNS:
> +                       pid_nr = task_tgid_vnr(tsk);
> +                       break;
> +               case NS_GET_PID_IN_PIDNS:
> +                       pid_nr = task_pid_nr_ns(tsk, pid_ns);
> +                       break;
> +               case NS_GET_TGID_IN_PIDNS:
> +                       pid_nr = task_tgid_nr_ns(tsk, pid_ns);
> +                       break;
> +               default:
> +                       pid_nr = 0;
> +                       break;
> +               }
> +               if (!pid_nr)
> +                       return -ESRCH;
> +
> +               return pid_nr;
>         default:
>                 return -ENOTTY;
>         }
> diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h
> index a0c8552b64ee..faeb9195da08 100644
> --- a/include/uapi/linux/nsfs.h
> +++ b/include/uapi/linux/nsfs.h
> @@ -15,5 +15,13 @@
>  #define NS_GET_NSTYPE          _IO(NSIO, 0x3)
>  /* Get owner UID (in the caller's user namespace) for a user namespace */
>  #define NS_GET_OWNER_UID       _IO(NSIO, 0x4)
> +/* Translate pid from target pid namespace into the caller's pid namespace. */
> +#define NS_GET_PID_FROM_PIDNS  _IOR(NSIO, 0x5, int)
> +/* Return thread-group leader id of pid in the caller's pid namespace. */
> +#define NS_GET_TGID_FROM_PIDNS _IOR(NSIO, 0x7, int)
> +/* Translate pid from caller's pid namespace into a target pid namespace. */
> +#define NS_GET_PID_IN_PIDNS    _IOR(NSIO, 0x6, int)
> +/* Return thread-group leader id of pid in the target pid namespace. */
> +#define NS_GET_TGID_IN_PIDNS   _IOR(NSIO, 0x8, int)
>
>  #endif /* __LINUX_NSFS_H */
>
> ---
> base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
> change-id: 20240619-work-ns_ioctl-447979cf0820
>



