On Wed, Sep 16, 2015 at 1:02 PM, Seth Forshee
<seth.forshee@xxxxxxxxxxxxx> wrote:
> From: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
>
> If a process gets access to a mount from a different user
> namespace, that process should not be able to take advantage of
> setuid files or selinux entrypoints from that filesystem. Prevent
> this by treating mounts from other mount namespaces and those not
> owned by current_user_ns() or an ancestor as nosuid.
>
> This will make it safer to allow more complex filesystems to be
> mounted in non-root user namespaces.
>
> This does not remove the need for MNT_LOCK_NOSUID. The setuid,
> setgid, and file capability bits can no longer be abused if code in
> a user namespace were to clear nosuid on an untrusted filesystem,
> but this patch, by itself, is insufficient to protect the system
> from abuse of files that, when execed, would increase MAC privilege.
>
> As a more concrete explanation, any task that can manipulate a
> vfsmount associated with a given user namespace already has
> capabilities in that namespace and all of its descendants. If they
> can cause a malicious setuid, setgid, or file-caps executable to
> appear in that mount, then that executable will only allow them to
> elevate privileges in exactly the set of namespaces in which they
> are already privileged.
>
> On the other hand, if they can cause a malicious executable to
> appear with a dangerous MAC label, running it could change the
> caller's security context in a way that should not have been
> possible, even inside the namespace in which the task is confined.
>
> As a hardening measure, this would have made CVE-2014-5207 much
> more difficult to exploit.
>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> Signed-off-by: Seth Forshee <seth.forshee@xxxxxxxxxxxxx>
> ---
>  fs/exec.c                | 2 +-
>  fs/namespace.c           | 13 +++++++++++++
>  include/linux/mount.h    | 1 +
>  security/commoncap.c     | 2 +-
>  security/selinux/hooks.c | 2 +-
>  5 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index b06623a9347f..ea7311d72cc3 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1295,7 +1295,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
>         bprm->cred->euid = current_euid();
>         bprm->cred->egid = current_egid();
>
> -       if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
> +       if (!mnt_may_suid(bprm->file->f_path.mnt))
>                 return;
>
>         if (task_no_new_privs(current))
> diff --git a/fs/namespace.c b/fs/namespace.c
> index da70f7c4ece1..2101ce7b96ab 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3276,6 +3276,19 @@ found:
>         return visible;
>  }
>
> +bool mnt_may_suid(struct vfsmount *mnt)
> +{
> +       /*
> +        * Foreign mounts (accessed via fchdir or through /proc
> +        * symlinks) are always treated as if they are nosuid. This
> +        * prevents namespaces from trusting potentially unsafe
> +        * suid/sgid bits, file caps, or security labels that originate
> +        * in other namespaces.
> +        */
> +       return !(mnt->mnt_flags & MNT_NOSUID) && check_mnt(real_mount(mnt)) &&
> +              in_userns(current_user_ns(), mnt->mnt_sb->s_user_ns);

Is check_mnt correct here? If I read it correctly, this means that,
if I just unshare my userns and do nothing else (and, in particular,
don't unshare my mount namespace), then everything will have
mnt_may_suid return false.

--Andy
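
For reference, the three conditions combined in mnt_may_suid() can be
modeled in userspace. This is only a rough sketch, not kernel code:
every model_* name is hypothetical, check_mnt() is assumed to compare
the mount's namespace with the caller's mount namespace, and
in_userns(a, b) is assumed to be true when a is b or a descendant of b
(which is how the commit message's "current_user_ns() or an ancestor"
reads).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; every model_* name is hypothetical. */
struct model_user_ns {
    const struct model_user_ns *parent;   /* NULL for the initial namespace */
};

struct model_mount {
    bool nosuid;                          /* models MNT_NOSUID */
    int mnt_ns_id;                        /* mount namespace holding this mount */
    const struct model_user_ns *sb_owner; /* models mnt->mnt_sb->s_user_ns */
};

struct model_task {
    int mnt_ns_id;                        /* the caller's mount namespace */
    const struct model_user_ns *user_ns;  /* models current_user_ns() */
};

/* Assumed semantics: true if ns is target or a descendant of target. */
static bool model_in_userns(const struct model_user_ns *ns,
                            const struct model_user_ns *target)
{
    for (; ns != NULL; ns = ns->parent)
        if (ns == target)
            return true;
    return false;
}

/* Assumed semantics of check_mnt(): same mount namespace as the caller. */
static bool model_check_mnt(const struct model_task *t,
                            const struct model_mount *m)
{
    return m->mnt_ns_id == t->mnt_ns_id;
}

/* Mirrors the shape of the return expression in the patch. */
static bool model_mnt_may_suid(const struct model_task *t,
                               const struct model_mount *m)
{
    assert(t != NULL && m != NULL);
    return !m->nosuid && model_check_mnt(t, m) &&
           model_in_userns(t->user_ns, m->sb_owner);
}
```

In this model, a task that only unshares its user namespace keeps its
mount namespace id, so the check_mnt term stays true and the result
depends on the ownership test alone. Whether the real helpers behave
that way is the question being asked, so treat the model as a way to
reason about the cases rather than as an answer.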