On Tue, Mar 13, 2012 at 9:10 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> On Mon, Mar 12, 2012 at 04:04:16PM -0700, Tejun Heo wrote:
>> On Mon, Mar 12, 2012 at 11:44:01PM +0100, Peter Zijlstra wrote:
>> > On Mon, 2012-03-12 at 15:39 -0700, Tejun Heo wrote:
>> > > If we can get to the point where nesting is fully
>> > > supported by every controller first, that would be awesome too.
>> >
>> > As long as that is the goal.. otherwise, I'd be overjoyed if I can rip
>> > nesting support out of the cpu-controller.. that stuff is such a pain.
>> > Then again, I don't think the container people like this proposal --
>> > they were the ones pushing for full hierarchy back when.
>>
>> Yeah, the great pain of full hierarchy support is one of the reasons
>> why I keep thinking about supporting mapping to flat hierarchy. Full
>> hierarchy could be too painful and not useful enough for some
>> controllers. Then again, cpu and memcg already have it and according
>> to Vivek blkcg also had a proposed implementation, so maybe it's okay.
>> Let's see.
>
> Implementing hierarchy is a pain and is expensive at run time. Supporting
> flat structure will provide path for smooth transition.
>
> We had some RFC patches for blkcg hierarchy and that made things even more
> complicated and we might not gain much. So why to complicate the code
> until and unless we have a good use case.

how about ditching the idea of an FS altogether?

the `mkdir` creates and nests has always felt awkward to me. maybe instead
we flatten everything out, and bind to the process tree, but enable a
tag-like system to "mark" processes, and attach meaning to them. akin
to marking+processing packets (netfilter), or maybe like sysfs
tags(?).

maybe a trivial example, but bear with me here ... other controllers
are bound to a `name` controller ...

    # my pid?
    $ echo $$
    123

    # what controllers are available for this process?
    $ cat /proc/self/tags/TYPE

    # create a new `name` base controller
    $ touch /proc/self/tags/admin

    # create a new `name` base controller
    $ touch /proc/self/tags/users

    # begin tracking cpu shares at some default level
    $ touch /proc/self/tags/admin.cpuacct.cpu.shares

    # explicit assign `admin` 150 shares
    $ echo 150 > /proc/self/tags/admin.cpuacct.cpu.shares

    # explicit assign `users` 50 shares
    $ echo 50 > /proc/self/tags/users.cpuacct.cpu.shares

    # tag will propagate to children
    $ echo 1 > /proc/self/tags/admin.cpuacct.cpu.PERSISTENT

    # `name`'s priority relative to sibling `name` groups (like shares)
    $ echo 100 > /proc/self/tags/admin.cpuacct.cpu.PRIORITY

[... system ...]

    # what controllers are available system-wide?
    $ cat /sys/fs/cgroup/TYPE
    cpuacct = monitor resources
    memory = monitor memory
    blkio = io stuffs
    [...]

    # what knobs are available?
    $ cat /sys/fs/cgroup/cpuacct.TYPE
    shares = relative assignment of resources
    stat = some stats
    [...]

    # how many total shares requested (system)
    $ cat /sys/fs/cgroup/cpuacct.cpu.shares
    200

    # how many total shares requested (admin)
    $ cat /sys/fs/cgroup/admin.cpuacct.cpu.shares
    150

    # how many total shares requested (users)
    $ cat /sys/fs/cgroup/users.cpuacct.cpu.shares
    50

    # *all* processes
    $ cat /sys/fs/cgroup/TASKS
    1
    123
    [...]

    # which processes have `admin` tag?
    $ cat /sys/fs/cgroup/cpuacct/admin.TASKS
    123

    # which processes have `users` tag?
    $ cat /sys/fs/cgroup/cpuacct/users.TASKS
    123

    # link to pid
    $ readlink -f /sys/fs/cgroup/cpuacct/users.TASKS.123
    /proc/123

    # which user owns `users` tag?
    $ cat /sys/fs/cgroup/cpuacct/users.UID
    1000

    # default mode for `users` controls?
    $ cat /sys/fs/cgroup/users.MODE
    0664

    # default mode for `users` cpuacct controls?
    $ cat /sys/fs/cgroup/users.cpuacct.MODE
    0600

    # mask some controllers to `users` tag?
    $ echo -e "cpuacct\nmemory" > /sys/fs/cgroup/users.MASK

    # ... did the above work?
    # (look at last call to TYPE above)
    $ cat /sys/fs/cgroup/users.TYPE
    blkio
    [...]

    # assign a whitelist instead
    $ echo -e "cpu\nmemory" > /sys/fs/cgroup/users.TYPE

    # mask some knobs to `users` tag
    $ echo -e "shares" > /sys/fs/cgroup/users.cpuacct.MASK

    # ... did the above work?
    $ cat /sys/fs/cgroup/users.cpuacct.TYPE
    stat = some stats
    [...]

... in this way there is still a sort of hierarchy, but each
controller is free to choose:

    ) if there is any meaning to multiple `names` per process
    ) ... or if only one should be allowed
    ) how to combine laterally
    ) how to combine descendants
    ) ... maybe even assignable strategies!
    ) controller semantics independent of other controllers

when a new pid namespace is created, the `tags` dir is "cleared out"
and that person can assign new values (or maybe a directory is created
in `tags`?). the effective value is the union of both, and identical
to whatever the process would have had *without* a namespace (no
difference, on visibility).

thus, cgroupfs becomes a simple mount that has aggregate stats and
system-wide settings.

recap:

    ) bound to process hierarchy
    ) ... but control space is flat
    ) does not force every controller to use same paradigm (eg, "you
      must behave like a directory tree")
    ) ... but orthogonal multiplexing of a controller is possible if
      the controller allows it
    ) allows same permission-based ACL
    ) easy to see all controls affecting a process or `name` group
      with a simple `ls -l`
    ) additional possibilities that didn't exist with
      directory/arbitrary mounts paradigm

does this make sense? makes much more sense to me at least, and i
think it allows greater flexibility with less complexity (if my
experience with FUSE is any indication) ...

... or is this the same wolf in sheep's skin?

-- 

C Anthony
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers
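
a rough userspace sketch of the flat `name.controller.knob` layout proposed in the message, for anyone who wants to poke at the idea. it does not touch the kernel: plain files in a temp dir stand in for the hypothetical `/proc/self/tags` and `/sys/fs/cgroup` entries, and the helper names (`set_knob`, `total_knob`, `share_percent`) are made up for illustration. treating shares as a percentage of the system-wide total is an assumption, based only on the "relative assignment of resources" description.

```shell
#!/bin/sh
# Mock of the flat tag namespace: every control is one file named
# "<name>.<controller>.<knob>" in a single directory (no nesting).
# Assumption: this directory merely imitates the proposed interface.

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT

# set_knob NAME KNOB VALUE: write VALUE to the flat file "NAME.KNOB"
set_knob() {
    printf '%s\n' "$3" > "$root/$1.$2"
}

# total_knob KNOB: sum KNOB across every `name` group (system-wide view)
total_knob() {
    total=0
    for f in "$root"/*."$1"; do
        [ -f "$f" ] || continue      # glob matched nothing
        read -r value < "$f"
        total=$((total + value))
    done
    printf '%s\n' "$total"
}

# share_percent NAME KNOB: NAME's integer percentage of the total
share_percent() {
    read -r mine < "$root/$1.$2"
    printf '%s\n' $(( mine * 100 / $(total_knob "$2") ))
}

set_knob admin cpuacct.cpu.shares 150
set_knob users cpuacct.cpu.shares 50

total_knob cpuacct.cpu.shares            # prints 200
share_percent admin cpuacct.cpu.shares   # prints 75
```

the point of the sketch is only that a system-wide aggregate falls out of a glob over a flat namespace, with no directory tree to walk. how a real controller would combine tags laterally or across descendants is left open, as in the message.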