On Fri, Oct 31, 2014 at 12:19 PM, Aditya Kali <adityakali@xxxxxxxxxx> wrote: > This patch enables cgroup mounting inside userns when a process > as appropriate privileges. The cgroup filesystem mounted is > rooted at the cgroupns-root. Thus, in a container-setup, only > the hierarchy under the cgroupns-root is exposed inside the container. > This allows container management tools to run inside the containers > without depending on any global state. > In order to support this, a new kernfs api is added to lookup the > dentry for the cgroupns-root. > > Signed-off-by: Aditya Kali <adityakali@xxxxxxxxxx> > --- > fs/kernfs/mount.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ > include/linux/kernfs.h | 2 ++ > kernel/cgroup.c | 47 +++++++++++++++++++++++++++++++++++++++++++++-- > 3 files changed, 95 insertions(+), 2 deletions(-) > > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c > index f973ae9..e334f45 100644 > --- a/fs/kernfs/mount.c > +++ b/fs/kernfs/mount.c > @@ -62,6 +62,54 @@ struct kernfs_root *kernfs_root_from_sb(struct super_block *sb) > return NULL; > } > > +/** > + * kernfs_make_root - create new root dentry for the given kernfs_node. > + * @sb: the kernfs super_block > + * @kn: kernfs_node for which a dentry is needed > + * > + * This can used used by callers which want to mount only a part of the kernfs > + * as root of the filesystem. > + */ > +struct dentry *kernfs_obtain_root(struct super_block *sb, > + struct kernfs_node *kn) > +{ I can't usefully review this, but kernfs_make_root and kernfs_obtain_root aren't the same string... > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 7e5d597..250aaec 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -1302,6 +1302,13 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts) > > memset(opts, 0, sizeof(*opts)); > > + /* Implicitly add CGRP_ROOT_SANE_BEHAVIOR if inside a non-init cgroup > + * namespace. > + */ > + if (current->nsproxy->cgroup_ns != &init_cgroup_ns) { > + opts->flags |= CGRP_ROOT_SANE_BEHAVIOR; > + } > + I don't like this implicit stuff. Can you just return -EINVAL if sane behavior isn't requested? > while ((token = strsep(&o, ",")) != NULL) { > nr_opts++; > > @@ -1391,7 +1398,7 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts) > > if (opts->flags & CGRP_ROOT_SANE_BEHAVIOR) { > pr_warn("sane_behavior: this is still under development and its behaviors will change, proceed at your own risk\n"); > - if (nr_opts != 1) { > + if (nr_opts > 1) { > pr_err("sane_behavior: no other mount options allowed\n"); > return -EINVAL; This looks wrong. But, if you make the change above, then it'll be right. > @@ -1685,6 +1701,14 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, > int ret; > int i; > bool new_sb; > + struct cgroup_namespace *ns = > + get_cgroup_ns(current->nsproxy->cgroup_ns); > + > + /* Check if the caller has permission to mount. */ > + if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN)) { > + put_cgroup_ns(ns); > + return ERR_PTR(-EPERM); > + } Why is this necessary? > @@ -1862,6 +1904,7 @@ static struct file_system_type cgroup_fs_type = { > .name = "cgroup", > .mount = cgroup_mount, > .kill_sb = cgroup_kill_sb, > + .fs_flags = FS_USERNS_MOUNT, Aargh, another one! Eric, can you either ack or nack my patch? Because if my patch goes in, then this line may need to change. Or not, but if a stable release with cgroupfs and without my patch happens, then we'll have an ABI break. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html