Hello, Li.

On Sun, Jun 09, 2013 at 05:14:02PM +0800, Li Zefan wrote:
> v2 -> v3:
> Currently some cpuset behaviors are not friendly when cpuset is
> co-mounted with other cgroup controllers.
>
> Now with this patchset if cpuset is mounted with sane_behavior option,
> it behaves differently:
>
> - Tasks will be kept in empty cpusets when hotplug happens and take
>   masks of ancestors with non-empty cpus/mems, instead of being moved
>   to an ancestor.
>
> - A task can be moved into an empty cpuset, and again it takes masks
>   of ancestors, so the user can drop a task into a newly created
>   cgroup without having to do anything for it.

I applied 1-2, and the rest of the series also looks correct to me and
seems like a step in the right direction; however, I'm not quite sure
this is the final interface we want.

* cpus/mems_allowed changing as CPUs go up and down is nasty.  There
  should be a separation between the configured CPUs and the currently
  available CPUs.  The current behavior makes sense when coupled with
  the irreversible task migration and all; if we're allowing tasks to
  remain in empty cpusets, it only makes sense to retain and re-apply
  the configuration as CPUs come back online.  I find the original
  behavior of changing configurations as system state changes pretty
  weird, especially because it happens without any notification, which
  makes it difficult to use in any sort of automated way - anything
  that wants to wrap cpuset would have to track the configuration and
  the CPU/node up/down states separately on its own, which is a very
  easy way to introduce inconsistencies.

* validate_change() rejecting updates to a config if any of its
  descendants are using some of it is weird.  The config change should
  be enforced in a hierarchical manner too: if the parent drops some
  CPUs, it should simply drop those CPUs from the children.  The same
  goes in the other direction - children having configs which aren't
  fully contained inside their parents' is fine as long as the
  effective masks are correct.  IOW, validate_change() doesn't really
  make sense if we're keeping tasks in empty cgroups: as CPUs go down
  and up, we'd keep the organization but lose the configuration, which
  is just weird.

I think what we want is to expand on this patchset so that we have
separate "configured" and "effective" masks, preferably both exposed
to userland, and just let config propagation deal with computing the
effective masks as CPUs/nodes go down/up and configs change (a rough
sketch follows at the end of this mail).  The code could actually be
simpler that way, although there'll be complications due to the old
behaviors.

What do you think?  If you agree, how should we proceed?  We can apply
these patches and build on top if you prefer.

Thanks.

--
tejun
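
P.S. To make the configured/effective split concrete, here's a minimal
user-space sketch of the propagation I have in mind.  Everything in it
is made up for illustration (toy tree, single-word cpumasks) - it's not
the actual cpuset code, just the rule "effective = configured & the
parent's effective, falling back to the parent's effective mask when
that comes out empty".

  /*
   * Toy model of "configured" vs "effective" cpumasks.  Not kernel
   * code; all names are invented for this sketch.
   *
   * effective = configured & parent's effective.  If that comes out
   * empty (e.g. all configured CPUs went offline), fall back to the
   * parent's effective mask so tasks always have CPUs to run on.
   * Hotplug never touches the configured mask, so bringing the CPUs
   * back online restores exactly what the user asked for.
   */
  #include <stdio.h>

  typedef unsigned long cpumask_t;	/* one bit per CPU */

  struct cpuset {
  	const char *name;
  	cpumask_t cpus_configured;	/* what the user wrote */
  	cpumask_t cpus_effective;	/* what tasks actually get */
  	struct cpuset *children[4];
  	int nr_children;
  };

  /* Recompute effective masks top-down after config change or hotplug. */
  static void propagate(struct cpuset *cs, cpumask_t parent_effective)
  {
  	int i;

  	cs->cpus_effective = cs->cpus_configured & parent_effective;
  	if (!cs->cpus_effective)	/* empty: inherit ancestor's mask */
  		cs->cpus_effective = parent_effective;

  	for (i = 0; i < cs->nr_children; i++)
  		propagate(cs->children[i], cs->cpus_effective);
  }

  int main(void)
  {
  	struct cpuset root = { .name = "root", .cpus_configured = 0xf };
  	struct cpuset child = { .name = "child", .cpus_configured = 0xc };

  	root.children[root.nr_children++] = &child;

  	/* all four CPUs online */
  	propagate(&root, 0xf);
  	printf("%s: effective=%#lx\n", child.name, child.cpus_effective);

  	/* CPUs 2-3 go offline: child's mask would be empty, so it
  	 * falls back to the root's effective mask... */
  	propagate(&root, 0x3);
  	printf("%s: effective=%#lx\n", child.name, child.cpus_effective);

  	/* ...and bringing them back restores the configuration. */
  	propagate(&root, 0xf);
  	printf("%s: effective=%#lx\n", child.name, child.cpus_effective);
  	return 0;
  }

Note there's no validate_change()-style rejection anywhere in this
model: a parent dropping CPUs simply narrows its descendants' effective
masks on the next propagation pass, and a child's configured mask is
allowed to stick out of its parent's.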