Use the new custom ->permission hook to allow unprivileged processes to
mkdir new sub-cgroup directories of the root_cset of their current
cgroup namespace. No process outside of the cgroup namespace (or in a
sub-namespace) has this ability, and thus a process must have sufficient
privileges to setns to a cgroup namespace in order to create cgroups in
a cgroup they are not currently residing in.

Only processes that hold CAP_SYS_ADMIN in the user namespace pinned to
the cgroup namespace have this new ability. This guards against a
process creating many cgroups which it cannot effectively join.

This change only applies to the default hierarchy, as cgroupv1 cgroups
are not necessarily hierarchical (thus allowing the creation of new
sub-cgroups would allow for circumvention of cgroup limits). However,
since cgroupv2 cgroups are strictly hierarchical as a design constraint,
the operation is safe there.

It should be noted that cgroupv2 also has attach restrictions that
prevent two complicit processes from migrating a process to the less
restrictive cgroup of the two.

Cc: dev@xxxxxxxxxxxxxxxxxx
Signed-off-by: Aleksa Sarai <asarai@xxxxxxx>
---
 kernel/cgroup.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 8647f3112f5c..4559baa7eabd 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5490,6 +5490,67 @@ static int cgroup_rmdir(struct kernfs_node *kn)
 	return ret;
 }
 
+/*
+ * We have specific rules when deciding if a process can write to a cgroup
+ * directory, based on their current state inside cgroupns.
+ */
+static int cgroup_permission(struct inode *inode, struct kernfs_node *kn,
+			     int mask)
+{
+	int ret;
+	struct cgroup *cgroup;
+	struct cgroup_namespace *cgroupns;
+
+	/*
+	 * First, compute the generic_permission return value. In most cases
+	 * this will succeed and we can also avoid duplicating this code.
+	 */
+
+	cgroup = kn->priv;
+	cgroup_get(cgroup);
+
+	/* First, try the generic method which should work in most cases. */
+	ret = generic_permission(inode, mask);
+
+	/* If the generic check succeeded, then we're all good. */
+	if (!ret)
+		goto out_put_cgroup;
+
+	/* We're only interested in cgroup directories. */
+	if (kernfs_type(kn) != KERNFS_DIR)
+		goto out_put_cgroup;
+
+	/* ... and in may_create() operations only. */
+	if ((mask & (MAY_WRITE | MAY_EXEC)) != (MAY_WRITE | MAY_EXEC))
+		goto out_put_cgroup;
+
+	/*
+	 * This only applies for cgroups on the default hierarchy, as cgroupv1
+	 * was not truly hierarchical this operation was not safe.
+	 */
+	if (!cgroup_on_dfl(cgroup))
+		goto out_put_cgroup;
+
+	cgroupns = current->nsproxy->cgroup_ns;
+	get_cgroup_ns(cgroupns);
+
+	ret = -EPERM;
+	if (cgroupns->root_cset->dfl_cgrp == cgroup) {
+		/*
+		 * Check CAP_SYS_ADMIN, to make sure that unprivileged
+		 * processes inside a cgroup namespace they don't "own" don't
+		 * get any special treatment.
+		 */
+		if (ns_capable(cgroupns->user_ns, CAP_SYS_ADMIN))
+			ret = 0;
+	}
+
+	put_cgroup_ns(cgroupns);
+out_put_cgroup:
+	cgroup_put(cgroup);
+	return ret;
+}
+
 static struct kernfs_syscall_ops cgroup_kf_syscall_ops = {
 	.remount_fs = cgroup_remount,
 	.show_options = cgroup_show_options,
@@ -5497,6 +5558,7 @@ static struct kernfs_syscall_ops cgroup_kf_syscall_ops = {
 	.rmdir = cgroup_rmdir,
 	.rename = cgroup_rename,
 	.show_path = cgroup_show_path,
+	.permission = cgroup_permission,
 };
 
 static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
-- 
2.9.0
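
For anyone following the control flow, the order of checks in
cgroup_permission() can be modelled in user space. This is only a
sketch and not kernel code: struct sk_request, sk_cgroup_permission(),
sk_examples() and the SK_MAY_* constants are hypothetical names, and
each field is an assumed stand-in for the result of one kernel-side
check in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's MAY_EXEC / MAY_WRITE bits. */
#define SK_MAY_EXEC  0x1
#define SK_MAY_WRITE 0x2

/* Each field models the outcome of one check made in the patch. */
struct sk_request {
	int generic_ret;  /* what generic_permission() would return */
	bool is_dir;      /* kernfs_type(kn) == KERNFS_DIR */
	bool on_dfl;      /* cgroup_on_dfl(cgroup) */
	bool is_ns_root;  /* cgroupns->root_cset->dfl_cgrp == cgroup */
	bool ns_admin;    /* ns_capable(cgroupns->user_ns, CAP_SYS_ADMIN) */
	int mask;         /* requested access bits */
};

/* Mirrors the order of the checks in the patch, without refcounting. */
int sk_cgroup_permission(const struct sk_request *rq)
{
	const int create = SK_MAY_WRITE | SK_MAY_EXEC;

	/* The generic check already allowed the access. */
	if (rq->generic_ret == 0)
		return 0;

	/* Not a directory: keep the generic error. */
	if (!rq->is_dir)
		return rq->generic_ret;

	/* Not a create-style access (write + exec): keep the generic error. */
	if ((rq->mask & create) != create)
		return rq->generic_ret;

	/* Not on the default (v2) hierarchy: keep the generic error. */
	if (!rq->on_dfl)
		return rq->generic_ret;

	/* Only the namespace root, and only with the capability. */
	if (rq->is_ns_root && rq->ns_admin)
		return 0;

	return -EPERM;
}

/* A few illustrative cases. */
void sk_examples(void)
{
	struct sk_request rq = {
		.generic_ret = -EACCES, .is_dir = true, .on_dfl = true,
		.is_ns_root = true, .ns_admin = true,
		.mask = SK_MAY_WRITE | SK_MAY_EXEC,
	};

	assert(sk_cgroup_permission(&rq) == 0);      /* namespace owner */

	rq.ns_admin = false;
	assert(sk_cgroup_permission(&rq) == -EPERM); /* lacks capability */

	rq.ns_admin = true;
	rq.on_dfl = false;
	assert(sk_cgroup_permission(&rq) == -EACCES); /* v1 hierarchy */
}
```

In this model the override only ever turns a generic denial into
success; every early exit returns the generic error unchanged, as the
goto paths in the patch do.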