Hi Tejun,

>> > Unfortunately, cgroup hierarchy isn't designed to support this sort
>> > of automatic delegation. Unpriv processes would be able to escape
>> > constraints on v1 with some controllers and on v2 controllers have to
>> > be explicitly enabled by root for delegated scope to have access to
>> > them.
>>
>> Not necessarily. We also talked about pinning the cgroup tree so that
>> once you enter the cgroup namespace, your current cgroup directory
>> becomes your root, meaning you can't cd back into the ancestors and
>> thus can't write their tasks file, meaning, I think, that it should be
>> impossible to escape ancestor constraints.
>
> I wish it were that clean. Unfortunately, on v1, some controllers
> (memory and blkio depending on settings, netcls and netprio always)
> simply aren't properly hierarchical and if you have write perm to
> subdirectory you can escape the constraints of your ancestors.
> Whether you can cd back up or not doesn't matter at all, so we can't
> allow delegation by default.

I see two options here, which would allow us to have subtree delegation
without losing hierarchical structure:

1. Don't enable subtree delegation on v1 hierarchies. This would be the
   simplest solution, and would cut out most people from using this
   feature today -- but it would mean less work around trying to figure
   out which hierarchies are safe to delegate (we make it explicit that
   when you enable a cgroup on v2, it must be safe to delegate to an
   unprivileged user). We also get the benefit of having the stricter
   cgroup.procs write rules.

2. Don't do subtree delegation on hierarchies that aren't hierarchical.
   This would have to be done in collaboration with the controllers
   (since cgroup core doesn't know which is hierarchical), and would
   allow all users of cgroups today to get subtree delegation.

>> > Why does an unpriv NS need to have cgroup delegated to it without
>> > cooperation from cgroup manager?
>>
>> There's actually many answers to this.
>> The one I'm interested in is
>> the ability for applications to make use of container features without
>> having to ask permission from some orchestration engine. The problem
>
> What's "container features"? Do you mean resource control by that?

Yes. Also the device cgroup. And ignoring the container usecase, it
would be useful to regular programs if they could use cgroup resource
accounting as part of their regular operation. Regular processes can use
rlimits -- why can't they use cgroups without needing cooperation from
an admin process (which makes for security and administration issues)?

>> most people are looking at is how do I prevent the cgroup manager from
>> running as root, because that's a security problem waiting to happen.
>
> It's distributing system wide resources so the top of the tree will
> always be owned by root and delegating subtrees is a fairly minimal
> operation. I don't see how that would necessarily lead to security
> problems.

If I understand correctly, the security issue James is referring to is
that the cgroup manager could have a bug in it (and because the cgroup
interface is the filesystem, it would probably be some kind of
write-to-any-path bug). This is an intrinsic part of the model of "you
need to have cooperation with an admin process in order to use resource
limiting for your own processes".

--
Aleksa Sarai (cyphar)
www.cyphar.com