So far, unless the filesystem type explicitly sets FS_USERNS_DEV_MOUNT
(and most don't), processes running in user namespaces are not allowed
to access any devices. Although this makes sense, it is quite a
restrictive rule, since a lot of those accesses would be perfectly safe:
aside from the simple char devices in /dev/ like null, zero, etc., it is
perfectly possible to assign a device for use inside a namespace if we
can establish trust in that operation.

We will do that by marking the mount as MNT_NODEV_NS instead of
MNT_NODEV. A separate flag is needed because if the mount operation
explicitly asked for nodev, we ought to respect it.

MNT_NODEV_NS will forbid accesses if the task is not in a device cgroup.
If it is, we will rely on the control rules in devcg to mediate the
access and tell us what those tasks can or cannot do.

There is precedent for that with memcg: although we don't explicitly
test for it as I am doing here, we allow tmpfs mounts to happen in user
namespaces because memcg will contain them.
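
For illustration only, the resulting decision for device nodes can be
modeled in userspace roughly as below. This is a sketch under
assumptions, not the kernel code: model_may_open_dev() and its
in_child_devcgroup parameter are made-up names standing in for the
check in may_open() and for task_in_child_devcgroup(current). The
MNT_NODEV value is the one from include/linux/mount.h, and MNT_NODEV_NS
is the value proposed in this patch.

```c
#include <errno.h>
#include <stdbool.h>

#define MNT_NODEV	0x02	/* nodev explicitly requested at mount time */
#define MNT_NODEV_NS	0x400	/* userns mount, and nodev not explicit */

/*
 * Hypothetical model of the device-node check. Returns -EACCES when the
 * open must be refused at the mount level, and 0 when it may proceed to
 * the later checks (including the devcg rules themselves).
 */
static int model_may_open_dev(int mnt_flags, bool in_child_devcgroup)
{
	/* An explicit nodev is always respected. */
	if (mnt_flags & MNT_NODEV)
		return -EACCES;

	/* Implicit nodev from a userns mount: only devcg tasks get through. */
	if ((mnt_flags & MNT_NODEV_NS) && !in_child_devcgroup)
		return -EACCES;

	return 0;
}
```

A return of 0 here only means the mount flags do not forbid the open;
whether a given device is actually usable is still up to the devcg
rules.
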
Signed-off-by: Glauber Costa <glommer@xxxxxxxxxxxxx>
Cc: Aristeu Rozanski <aris@xxxxxxxxxx>
Cc: Eric Biederman <ebiederm@xxxxxxxxxxxx>
Cc: Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx>
---
 fs/namei.c            | 4 ++++
 fs/namespace.c        | 2 +-
 include/linux/mount.h | 2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 57ae9c8..8a34d79 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2356,6 +2356,10 @@ static int may_open(struct path *path, int acc_mode, int flag)
 	case S_IFCHR:
 		if (path->mnt->mnt_flags & MNT_NODEV)
 			return -EACCES;
+
+		if ((path->mnt->mnt_flags & MNT_NODEV_NS) &&
+		    !task_in_child_devcgroup(current))
+			return -EACCES;
 		/*FALLTHRU*/
 	case S_IFIFO:
 	case S_IFSOCK:
diff --git a/fs/namespace.c b/fs/namespace.c
index 50ca17d..fe8127e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1935,7 +1935,7 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 		 */
 		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
 			flags |= MS_NODEV;
-			mnt_flags |= MNT_NODEV;
+			mnt_flags |= MNT_NODEV_NS;
 		}
 	}
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index d7029f4..8d190e4 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -32,6 +32,8 @@ struct mnt_namespace;
 #define MNT_SHRINKABLE	0x100
 #define MNT_WRITE_HOLD	0x200
 
+#define MNT_NODEV_NS	0x400	/* userns mount, and nodev not explicit */
+
 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */
 /*
-- 
1.8.1.2