On Wed, Sep 14, 2016 at 1:00 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello,
>

With regard to no-internal-tasks, I see (at least) three options:

1. Keep the cgroup2 status quo.  Lots of distros and such are likely
to have their cgroup management fail if run in a container.  I really,
really dislike this option.

2. Enforce no-internal-tasks for the root cgroup.  Un-cgroupable
things will still get accounted to the root cgroup even if subtree
control is on, but no tasks can be in the root cgroup if the root
cgroup has subtree control on.  (If some controllers removed the
no-internal-tasks restriction, this would apply to the root as well.)
I think this may annoy certain users.  If so, and if those users are
doing something valid, then I think that either those users should be
strongly encouraged or even forced to change so namespacing works for
them or that we should do (3) instead.

3. Remove the no-internal-tasks restriction entirely.  I can see this
resulting in a lot of configuration awkwardness, but I think it will
*work*, especially since all of the controllers already need to do
something vaguely intelligent when subtree control is on in the root
and there are tasks in the root.

What I'm trying to say is that I think that option (1) is sufficiently
bad that cgroup2 should do (2) or (3) instead.  If option (2) is
preferred and if it would break userspace, then I think we can work
around it by entirely deprecating cgroup2, renaming it to cgroup3, and
doing option (2) there.  You've given reasons you don't like options
(2) and (3).  I mostly agree with those reasons, but I don't think
they're strong enough to overcome the problems with (1).

BTW, Mike keeps mentioning exclusive cgroups as problematic with the
no-internal-tasks constraints.  Do exclusive cgroups still exist in
cgroup2?  Could we perhaps just remove that capability entirely?
I've never understood what problem exclusive cpusets and such solve
that can't be more comprehensibly solved by just assigning the cpusets
the normal inclusive way.

>> > After a migration, the cgroup and its interface knobs are a different
>> > directory and files.  Semantically, during migration, we aren't moving
>> > the directory or files and it'd be bizarre to overlay the semantics
>> > you're describing on top of the existing cgroupfs.  We will have to
>> > break away from the very basic vfs rules such as a fd, once opened,
>> > always corresponding to the same file.
>>
>> What kind of migration do you mean?  Having fds follow rename(2) around is
>> the normal vfs behavior, so I don't really know what you mean.
>
> Process or task migration by writing pid to cgroup.procs or tasks
> file.  cgroup never supported directory / cgroup level migrations.
>

Ugh.  Perhaps cgroup2 should start supporting this.  I think that
making rename(2) work is simpler than adding a whole new API for
rgroups, and I think it could solve a lot of the same problems that
rgroups are trying to solve.

--Andy
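
P.S. To make the three no-internal-tasks options above concrete, here
is a toy model, not kernel code.  The names Policy, may_hold_tasks and
same_in_container are made up for illustration, and the model ignores
per-controller exemptions.  It only asks one question: may a cgroup
with subtree control enabled hold tasks directly, and is the answer
the same for the real root as for a container's apparent root (which
is really a non-root cgroup)?

```python
from enum import Enum


class Policy(Enum):
    STATUS_QUO = 1       # option 1: rule applies everywhere except the real root
    ENFORCE_ON_ROOT = 2  # option 2: rule applies to the real root as well
    NO_RESTRICTION = 3   # option 3: rule removed entirely


def may_hold_tasks(policy, is_real_root, subtree_control_on):
    """Return True if the cgroup may contain tasks directly (toy model)."""
    if not subtree_control_on:
        # Without subtree control the constraint never applies.
        return True
    if policy is Policy.NO_RESTRICTION:
        return True
    if policy is Policy.STATUS_QUO:
        # Only the real root is exempt from the constraint.
        return is_real_root
    # ENFORCE_ON_ROOT: nobody is exempt.
    return False


def same_in_container(policy):
    """Does a manager see the same rule at a container root as on the host?"""
    host_root = may_hold_tasks(policy, True, True)
    container_root = may_hold_tasks(policy, False, True)
    return host_root == container_root


if __name__ == "__main__":
    for policy in Policy:
        print(policy.name, "consistent in container:", same_in_container(policy))
```

In this model only option (1) gives a different answer inside a
container than on the host, which is the failure mode I'm worried
about for distro cgroup managers.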