Re: [PATCH v3 2/3] fs: introduce uid/gid shifting bind mount

Amir Goldstein <amir73il@xxxxxxxxx> · Tue, 18 Feb 2020 09:38:07 +0200

On Mon, Feb 17, 2020 at 10:58 PM James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> This implementation reverse shifts according to the user_ns belonging
> to the mnt_ns.  So if the vfsmount has the newly introduced flag
> MNT_SHIFT and the current user_ns is the same as the mount_ns->user_ns
> then we shift back using the user_ns and an optional mnt_userns (which
> belongs to the struct mount) before committing to the underlying
> filesystem.
>
> For example, if a user_ns is created where interior (fake root, uid 0)
> is mapped to kernel uid 100000 then writes from interior root normally
> go to the filesystem at the kernel uid.  However, if MNT_SHIFT is set,
> they will be shifted back to write at uid 0, meaning we can bind mount
> real image filesystems to user_ns protected faker root.
>
> In essence there are several things which have to be done for this to
> occur safely.  Firstly for all operations on the filesystem, new
> credentials have to be installed where fsuid and fsgid are set to the
> *interior* values. Next all inodes used from the filesystem have to
> have i_uid and i_gid shifted back to the kernel values and attributes
> set from user space have to have ia_uid and ia_gid shifted from the
> kernel values to the interior values.  The capability checks have to
> be done using ns_capable against the kernel values, but the inode
> capability checks have to be done against the shifted ids.
>
> Since creating a new credential is a reasonably expensive proposition
> and we have to shift and unshift many times during path walking, a
> cached copy of the shifted credential is saved to a newly created
> place in the task structure.  This serves the dual purpose of allowing
> us to use a pre-prepared copy of the shifted credentials and also
> allows us to recognise whenever the shift is actually in effect (the
> cached shifted credential pointer being equal to the current_cred()
> pointer).
>
> To get this all to work, we have a check for the vfsmount flag and the
> user_ns gating a shifting of the credentials over all user space
> entries to filesystem functions.  In theory the path has to be present
> everywhere we do this, so we can check the vfsmount flags.  However,
> for lower level functions we can cheat this path check of vfsmount
> simply to check whether a shifted credential is in effect or not to
> gate things like the inode permission check, which means the path
> doesn't have to be threaded all the way through the permission
> checking functions.  if the credential is shifted check passes, we can
> also be sure that the current user_ns is the same as the mnt->user_ns,
> so we can use it and thus have no need of the struct mount at the
> point of the shift.
>
> Although the shift can be effected simply by executing
> do_reconfigure_mnt with MNT_SHIFT in the flags, this patch only
> contains the shifting mechanisms.  The follow on patch wires up the
> user visible API for turning the flag on.
>
> Signed-off-by: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
>
> ---
[...]

> @@ -3828,6 +3884,7 @@ long do_mknodat(int dfd, const char __user *filename, umode_t mode,
>         if (IS_ERR(dentry))
>                 return PTR_ERR(dentry);
>
> +       cred = change_userns_creds(&path);
>         if (!IS_POSIXACL(path.dentry->d_inode))
>                 mode &= ~current_umask();
>         error = security_path_mknod(&path, dentry, mode, dev);
[...]

> +       cred = change_userns_creds(&path);
>         if (!IS_POSIXACL(path.dentry->d_inode))
>                 mode &= ~current_umask();
>         error = security_path_mkdir(&path, dentry, mode);
[...]

> +       cred = change_userns_creds(&path);
>         error = security_path_symlink(&path, dentry, from->name);

I see a pattern above.

Perhaps change_userns_creds() should be inside security_path_XXX hooks?
Perhaps auto-shifting bind mount should be implemented by an LSM?
After, all "gating" access to filesystem, is part of what LSMs do and
uid (or fsid)
shifting is a sort of "gating".
Heck, there should already be a way to attach a security context to a mount,
right? So you don't even need a new UAPI in order to configure the auto-shifting
LSM. And you could use standard security.* xattr for persistent configuration
of the auto-shifting filesystem sections, which is something that you wanted
to solve anyway, right?

Apologies if my suggestions are flawed with misunderstanding of the feature.

Thanks,
Amir.