[PATCH 2/4] fs: allow dev accesses in userns in controlled situations

So far, unless the filesystem explicitly opts in with FS_USERNS_DEV_MOUNT
(and most don't), processes running in user namespaces are not allowed to
access any devices. Although this makes sense, it is quite a restrictive
rule, since many of those accesses would be perfectly safe: aside from the
simple char devices in /dev/ like null, zero, etc., it is perfectly
possible to assign a device for use inside a namespace if we can
establish trust in that operation.

We will do that by marking such mounts as MNT_NODEV_NS instead of
MNT_NODEV. A separate flag is needed because if the mount operation
explicitly asked for nodev, we ought to respect it. MNT_NODEV_NS will
forbid accesses if the task is not in a device cgroup. If it is, we will
rely on the control rules in devcg to mediate the access and tell us what
those tasks can or cannot do.
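
To illustrate the rule above, here is a minimal userspace sketch of the
decision made in may_open() for device nodes. It is only a model, not
kernel code: model_may_open_dev() and its in_child_devcgroup parameter are
hypothetical stand-ins for the real check and for
task_in_child_devcgroup(current), and the MNT_NODEV value is restated here
just so the sketch compiles on its own. Only MNT_NODEV_NS (0x400) is taken
from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Restated for the model; MNT_NODEV_NS is the value added by this patch. */
#define MNT_NODEV	0x02
#define MNT_NODEV_NS	0x400

/*
 * Hypothetical model of the device-node branch in may_open().
 * in_child_devcgroup stands in for task_in_child_devcgroup(current).
 * Returns 0 if the mount flags allow the open, -EACCES otherwise.
 */
int model_may_open_dev(int mnt_flags, bool in_child_devcgroup)
{
	/* An explicit nodev is always respected. */
	if (mnt_flags & MNT_NODEV)
		return -EACCES;

	/* Implicit nodev from a userns mount: allowed only under devcg. */
	if ((mnt_flags & MNT_NODEV_NS) && !in_child_devcgroup)
		return -EACCES;

	/* Not blocked here; devcg rules would still decide the final access. */
	return 0;
}

/* Walks the four interesting cases of the rule. */
void model_self_check(void)
{
	assert(model_may_open_dev(MNT_NODEV, true) == -EACCES);
	assert(model_may_open_dev(MNT_NODEV_NS, false) == -EACCES);
	assert(model_may_open_dev(MNT_NODEV_NS, true) == 0);
	assert(model_may_open_dev(0, false) == 0);
}
```

Note that a return of 0 in this model only means the mount flags do not
block the open; in the real kernel the device cgroup rules are still
expected to allow or deny the specific device.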

There is precedent for that with memcg: although we don't explicitly
test for it as I am doing here, we allow tmpfs mounts in user namespaces
because memcg will contain them.

Signed-off-by: Glauber Costa <glommer@xxxxxxxxxxxxx>
Cc: Aristeu Rozanski <aris@xxxxxxxxxx>
Cc: Eric Biederman <ebiederm@xxxxxxxxxxxx>
Cc: Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx>
---
 fs/namei.c            | 4 ++++
 fs/namespace.c        | 2 +-
 include/linux/mount.h | 2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 57ae9c8..8a34d79 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2356,6 +2356,10 @@ static int may_open(struct path *path, int acc_mode, int flag)
 	case S_IFCHR:
 		if (path->mnt->mnt_flags & MNT_NODEV)
 			return -EACCES;
+
+		if ((path->mnt->mnt_flags & MNT_NODEV_NS) &&
+			!task_in_child_devcgroup(current))
+			return -EACCES;
 		/*FALLTHRU*/
 	case S_IFIFO:
 	case S_IFSOCK:
diff --git a/fs/namespace.c b/fs/namespace.c
index 50ca17d..fe8127e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1935,7 +1935,7 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 		 */
 		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
 			flags |= MS_NODEV;
-			mnt_flags |= MNT_NODEV;
+			mnt_flags |= MNT_NODEV_NS;
 		}
 	}
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index d7029f4..8d190e4 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -32,6 +32,8 @@ struct mnt_namespace;
 #define MNT_SHRINKABLE	0x100
 #define MNT_WRITE_HOLD	0x200
 
+#define MNT_NODEV_NS	0x400   /* userns mount, and nodev not explicit */
+
 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */
 /*
-- 
1.8.1.2
